Linq-To-XML Style of Node Creation for C++

Introduction

This article discusses a new feature of the C++ Elmax XML Library: writing XML files with Linq-To-XML style node creation. There are currently no plans to implement this feature for C# Elmax, because C# users can already use .NET Linq-To-XML to write XML the same way. Readers who want to learn more about the Elmax XML library can read "The XML parsing Article That Should (Not) Be Written!" and the documentation, although neither is required to understand this article. The intended audience is XML library authors who may be interested in implementing Linq-To-XML style node creation in their own libraries.

Programmers who work primarily in native C++ may not be familiar with the Linq-To-XML node creation syntax, what it does, or how it does it. Simply put, Linq-To-XML node creation is a way to create nodes with code whose structure mirrors the resulting XML. To illustrate, here is a .NET C# Linq-To-XML snippet that adds movie information to a Movies element.

using System.Xml.Linq;

XElement movies = new XElement("Movies");

movies.Add(
    new XElement("Movie",
        new XAttribute("Name", "Transformers: Dark of the Moon"),
        new XAttribute("Year", "2011"),
        new XAttribute("RunningTime", 157.ToString()),
        new XElement("Director", "Michael Bay"),
        new XElement("Stars",
            new XElement("Actor", "Shia LaBeouf"),
            new XElement("Actress", "Rosie Huntington-Whiteley")
        ),
        new XElement("DVD",
            new XElement("Price", "25.00"),
            new XElement("Discount", (0.1).ToString())
        ),
        new XElement("BluRay",
            new XElement("Price", "36.00"),
            new XElement("Discount", (0.1).ToString())
        )
    )
);

XDocument doc = new XDocument(
    new XDeclaration("1.0", "utf-8", ""),
    movies);

doc.Save(@"C:\Temp\Movies1.xml");

Note that the Visual Studio IDE indents Linq-To-XML node creation code automatically when you press the Enter key. The Movies1.xml output looks similar to the listing below.

<?xml version="1.0" encoding="utf-8"?>
<Movies>
    <Movie Name="Transformers: Dark of the Moon" Year="2011"
        RunningTime="157">
        <Director>Michael Bay</Director>
        <Stars>
            <Actor>Shia LaBeouf</Actor>
            <Actress>Rosie Huntington-Whiteley</Actress>
        </Stars>
        <DVD>
            <Price>25.00</Price>
            <Discount>0.1</Discount>
        </DVD>
        <BluRay>
            <Price>36.00</Price>
            <Discount>0.1</Discount>
        </BluRay>
    </Movie>
</Movies>

It is not difficult to visualize how the XML would look from the C# code. In the next section, we shall compare the new Linq-To-XML and the original Elmax node creation.

Comparison of the New Linq-To-XML and the Old Elmax Node Creation

I guess by now, readers are eager to see the Linq-To-XML syntax for C++. Without further delay, the code is displayed below.

using namespace Elmax;

NewElement movies(L"Movies");

movies.Add(
    NewElement(L"Movie",
        NewAttribute(L"Name", L"Transformers: Dark of the Moon"),
        NewAttribute(L"Year", L"2011"),
        NewAttribute(L"RunningTime", ToStr(157)),
        NewElement(L"Director", L"Michael Bay"),
        NewElement(L"Stars",
            NewElement(L"Actor", L"Shia LaBeouf"),
            NewElement(L"Actress", L"Rosie Huntington-Whiteley")
        ),
        NewElement(L"DVD",
            NewElement(L"Price", L"25.00"),
            NewElement(L"Discount", ToStr(0.1))
        ),
        NewElement(L"BluRay",
            NewElement(L"Price", L"36.00"),
            NewElement(L"Discount", ToStr(0.1))
        )
    )
);

movies.Save(L"C:\\Temp\\Movies2.xml", L"1.0", true);

As the reader may notice, the C++ syntax does not allocate the elements on the heap with the new keyword, unlike the C# version; the element objects are temporaries that live on the stack. C# Linq-To-XML allocates every element on the managed heap, where it must later be reclaimed by the garbage collector, which can cost both performance and memory. Stack-allocated elements avoid that overhead because they are destroyed as soon as they go out of scope.

Underneath the surface, memory is still allocated on the heap to construct the internal tree structure. The Save method converts this tree recursively into MS XML DOM elements and, by default, destroys the tree just before it returns. A user who wants to keep the tree, either for another Save call or to append it to a larger tree, can pass false for the discard argument (its default value is true) of the Save method.

bool Save(MSXML2::IXMLDOMDocumentPtr& ptrDoc,
    const std::wstring& file,
    const std::wstring& xmlVersion,
    bool utf8,
    bool discard = true);

bool PrettySave(
    const std::wstring& file,
    const std::wstring& xmlVersion,
    bool utf8,
    const std::wstring& indent = L"    ",
    bool discard = true);
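To make the effect of the discard flag concrete, here is a minimal sketch. It is not Elmax code: SketchNode, DestroyTree, Serialize and SaveSketch are hypothetical names invented for illustration, and the sketch writes to a string instead of a file. It only assumes the behaviour described above, namely that the tree is freed after saving unless discard is false.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the library's internal tree node.
struct SketchNode
{
    std::wstring name;
    std::vector<SketchNode*> children;
};

// Recursively frees a node and all of its descendants.
inline void DestroyTree(SketchNode* node)
{
    if (node == nullptr) return;
    for (SketchNode* child : node->children)
        DestroyTree(child);
    delete node;
}

// Serializes a subtree as nested elements without attributes or text.
inline void Serialize(const SketchNode* node, std::wstring& out)
{
    out += L"<" + node->name + L">";
    for (const SketchNode* child : node->children)
        Serialize(child, out);
    out += L"</" + node->name + L">";
}

// Writes the tree into 'out'. When discard is true the tree is freed and
// 'root' is reset to nullptr, so a later save has nothing left to write.
inline bool SaveSketch(SketchNode*& root, std::wstring& out, bool discard = true)
{
    if (root == nullptr) return false;
    out.clear();
    Serialize(root, out);
    if (discard)
    {
        DestroyTree(root);
        root = nullptr;
    }
    return true;
}
```

Calling SaveSketch(root, xml, false) leaves root usable for a second save or for attaching to a larger tree; calling it with the default argument releases the tree.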

By now, the reader may be curious to know how the original Elmax node creation stacks up against the new Linq-To-XML node creation syntax. The example below writes the same XML content, this time to Movies3.xml, using the original Elmax code.

MSXML2::IXMLDOMDocumentPtr pDoc;
HRESULT hr = CreateAndInitDom(pDoc);
if (SUCCEEDED(hr))
{
    using namespace Elmax;
    Element root;
    root.SetConverter(NORMAL_CONV);
    root.SetDomDoc(pDoc);

    Element movies = root[L"Movies"];
    Element movie = movies[L"Movie"].CreateNew();
    movie.Attribute(L"Name") = L"Transformers: Dark of the Moon";
    movie.Attribute(L"Year") = L"2011";
    movie.Attribute(L"RunningTime") = 157;
    movie[L"Director"] = L"Michael Bay";
    movie[L"Stars|Actor"] = L"Shia LaBeouf";
    movie[L"Stars|Actress"] = L"Rosie Huntington-Whiteley";
    movie[L"DVD|Price"] = L"25.00";
    movie[L"DVD|Discount"] = 0.1;
    movie[L"BluRay|Price"] = L"36.00";
    movie[L"BluRay|Discount"] = 0.1;

    SaveXml(pDoc, L"C:\\Temp\\Movies3.xml");
}

As the reader can see, it can be hard to discern the structure of the XML just by casually glancing at the original Elmax code of node creation.

How the Library is Written

Surprisingly, the Linq-To-XML node creation library code is very simple and can be written in under a couple of hours. To create nodes using the new syntax, we use the NewElement, NewAttribute, NewCData and NewComment classes. These classes derive from the NewNode class and do most of their useful work in their constructors.

This is the code listing for the declaration of the NewElement class.

class NewElement : public NewNode
{
public:
    // Destructor
    ~NewElement(void);

    NewElement operator[](LPCWSTR name);
    NewElement operator[](LPCSTR name);

    bool Exists() { return GetPtr()!=NULL; }

    //! Copy constructor
    NewElement(const NewElement& other);
    //! Assignment operator
    NewElement& operator=(const NewElement& other);

    // Constructors
    NewElement();
    NewElement(const std::wstring& name);
    NewElement(const std::wstring& name,
        const std::wstring& sValue);
    NewElement(const std::wstring& name, NewNode& node1);
    NewElement(const std::wstring& name, NewNode& node1,
        NewNode& node2);
    NewElement(const std::wstring& name, NewNode& node1,
        NewNode& node2, NewNode& node3);
    NewElement(const std::wstring& name, NewNode& node1,
        NewNode& node2, NewNode& node3,
        NewNode& node4);
    NewElement(const std::wstring& name, NewNode& node1,
        NewNode& node2, NewNode& node3,
        NewNode& node4, NewNode& node5);
    NewElement(const std::wstring& name, NewNode& node1,
        NewNode& node2, NewNode& node3,
        NewNode& node4, NewNode& node5,
        NewNode& node6);
    NewElement(const std::wstring& name, NewNode& node1,
        NewNode& node2, NewNode& node3,
        NewNode& node4, NewNode& node5,
        NewNode& node6, NewNode& node7);
    NewElement(const std::wstring& name, NewNode& node1,
        NewNode& node2, NewNode& node3,
        NewNode& node4, NewNode& node5,
        NewNode& node6, NewNode& node7,
        NewNode& node8);

    // ... other overloaded constructors up to 16 NewNode parameters
    // are not shown for simplicity

    NewElement Add(NewNode& node1);
    NewElement Add(NewNode& node1, NewNode& node2);
    NewElement Add(NewNode& node1, NewNode& node2,
        NewNode& node3);
    NewElement Add(NewNode& node1, NewNode& node2,
        NewNode& node3, NewNode& node4);
    NewElement Add(NewNode& node1, NewNode& node2,
        NewNode& node3, NewNode& node4,
        NewNode& node5);
    NewElement Add(NewNode& node1, NewNode& node2,
        NewNode& node3, NewNode& node4,
        NewNode& node5, NewNode& node6);
    NewElement Add(NewNode& node1, NewNode& node2,
        NewNode& node3, NewNode& node4,
        NewNode& node5, NewNode& node6,
        NewNode& node7);
    NewElement Add(NewNode& node1, NewNode& node2,
        NewNode& node3, NewNode& node4,
        NewNode& node5, NewNode& node6,
        NewNode& node7, NewNode& node8);

    // ... other overloaded Add methods up to 16 NewNode parameters
    // are not shown for simplicity

    bool Save(MSXML2::IXMLDOMDocumentPtr& ptrDoc,
        const std::wstring& file, bool discard = true);
    bool PrettySave(MSXML2::IXMLDOMDocumentPtr& ptrDoc,
        const std::wstring& file, bool discard = true);
    bool Append(NewTreeNode* child);

private:

    NewElement Find(const std::wstring& names);
    NewElement FindFirstChild(const std::wstring& name);
};

The overloaded constructor that takes 8 NewNode parameters is listed here.

NewElement::NewElement(const std::wstring& name,
    NewNode& node1, NewNode& node2,
    NewNode& node3, NewNode& node4,
    NewNode& node5, NewNode& node6,
    NewNode& node7, NewNode& node8)
{
    Init();
    NewTreeNode* ptr = GetPtr();
    if(ptr)
    {
        ptr->xmltype = XML_ELEMENT;
        ptr->pName = name;

        NewTreeNode* tmpPtr = node1.GetPtr();
        if(tmpPtr!=NULL)
            Append(tmpPtr);
        tmpPtr = node2.GetPtr();
        if(tmpPtr!=NULL)
            Append(tmpPtr);
        tmpPtr = node3.GetPtr();
        if(tmpPtr!=NULL)
            Append(tmpPtr);
        tmpPtr = node4.GetPtr();
        if(tmpPtr!=NULL)
            Append(tmpPtr);
        tmpPtr = node5.GetPtr();
        if(tmpPtr!=NULL)
            Append(tmpPtr);
        tmpPtr = node6.GetPtr();
        if(tmpPtr!=NULL)
            Append(tmpPtr);
        tmpPtr = node7.GetPtr();
        if(tmpPtr!=NULL)
            Append(tmpPtr);
        tmpPtr = node8.GetPtr();
        if(tmpPtr!=NULL)
            Append(tmpPtr);
    }
}

The overloaded Add method that takes 8 NewNode parameters is listed here.

NewElement NewElement::Add(
    NewNode& node1, NewNode& node2,
    NewNode& node3, NewNode& node4,
    NewNode& node5, NewNode& node6,
    NewNode& node7, NewNode& node8)
{
    NewTreeNode* ptr = GetPtr();
    if(ptr)
    {
        NewTreeNode* tmpPtr = node1.GetPtr();
        if(tmpPtr!=NULL)
            Append(tmpPtr);
        tmpPtr = node2.GetPtr();
        if(tmpPtr!=NULL)
            Append(tmpPtr);
        tmpPtr = node3.GetPtr();
        if(tmpPtr!=NULL)
            Append(tmpPtr);
        tmpPtr = node4.GetPtr();
        if(tmpPtr!=NULL)
            Append(tmpPtr);
        tmpPtr = node5.GetPtr();
        if(tmpPtr!=NULL)
            Append(tmpPtr);
        tmpPtr = node6.GetPtr();
        if(tmpPtr!=NULL)
            Append(tmpPtr);
        tmpPtr = node7.GetPtr();
        if(tmpPtr!=NULL)
            Append(tmpPtr);
        tmpPtr = node8.GetPtr();
        if(tmpPtr!=NULL)
            Append(tmpPtr);
    }
    return *this;
}
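The fixed-arity overloads above repeat the same three lines for every argument. On a compiler that supports C++11 variadic templates, one constructor and one Add template could cover any number of children. The following is a sketch under stated assumptions, not the Elmax implementation: VTreeNode, VNode, VAttribute and VElement are hypothetical names, std::shared_ptr replaces the raw pointer plus explicit Delete purely to keep the example leak-free, and the parameters are const references because standard C++ does not bind temporaries to non-const lvalue references.

```cpp
#include <memory>
#include <string>
#include <vector>

enum SketchType { SK_ELEMENT, SK_ATTRIBUTE };

// Hypothetical heap-allocated tree node (stand-in for NewTreeNode).
struct VTreeNode
{
    SketchType type;
    std::wstring name;
    std::wstring value;
    std::vector<std::shared_ptr<VTreeNode>> children;
};

// Hypothetical lightweight handle (stand-in for NewNode).
class VNode
{
public:
    std::shared_ptr<VTreeNode> GetPtr() const { return ptr; }
protected:
    VNode(SketchType type, const std::wstring& name, const std::wstring& value)
        : ptr(std::make_shared<VTreeNode>())
    {
        ptr->type = type;
        ptr->name = name;
        ptr->value = value;
    }
    std::shared_ptr<VTreeNode> ptr;
};

class VAttribute : public VNode
{
public:
    VAttribute(const std::wstring& name, const std::wstring& value)
        : VNode(SK_ATTRIBUTE, name, value) {}
};

class VElement : public VNode
{
public:
    explicit VElement(const std::wstring& name)
        : VNode(SK_ELEMENT, name, L"") {}

    VElement(const std::wstring& name, const std::wstring& value)
        : VNode(SK_ELEMENT, name, value) {}

    // One variadic constructor stands in for all fixed-arity overloads.
    template <typename... Nodes>
    VElement(const std::wstring& name, const VNode& first, const Nodes&... rest)
        : VNode(SK_ELEMENT, name, L"")
    {
        Add(first, rest...);
    }

    // Base case: append a single child to this element.
    VElement& Add(const VNode& node)
    {
        if (node.GetPtr())
            ptr->children.push_back(node.GetPtr());
        return *this;
    }

    // Recursive case: append the first child, then the remaining ones in order.
    template <typename... Nodes>
    VElement& Add(const VNode& first, const Nodes&... rest)
    {
        Add(first);
        return Add(rest...);
    }
};
```

With these definitions, an expression such as VElement(L"Stars", VElement(L"Actor", L"Shia LaBeouf"), VElement(L"Actress", L"Rosie Huntington-Whiteley")) builds the same nested shape as the overload-based syntax, with no upper limit of 16 children.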

As you can see, the NewElement constructors and Add methods do nothing except append the child nodes to the parent's vector. Below is the declaration of the NewAttribute class and the definition of its only constructor.

class NewAttribute : public NewNode
{
public:
    // Constructor
    NewAttribute(const std::wstring& name,
        const std::wstring& sValue);
    // Destructor
    ~NewAttribute(void);
};

NewAttribute::NewAttribute(const std::wstring& name,
    const std::wstring& sValue)
{
    Init();
    NewTreeNode* ptr = GetPtr();
    if(ptr)
    {
        ptr->xmltype = XML_ATTRIBUTE;
        ptr->pName = name;
        ptr->pValue = sValue;
    }
}

This is the code listing for the declaration of NewCData class and definition of its only method: its constructor.

class NewCData : public NewNode
{
public:
    // Constructor
    NewCData(const std::wstring& sValue);
    // Destructor
    ~NewCData(void);
};

NewCData::NewCData(const std::wstring& sValue)
{
    Init();
    NewTreeNode* ptr = GetPtr();
    if(ptr)
    {
        ptr->xmltype = XML_CDATA;
        ptr->pValue = sValue;
    }
}

This is the code listing for the declaration of NewComment class and definition of its constructor.

class NewComment : public NewNode
{
public:
    // Constructor
    NewComment(const std::wstring& sValue);
    // Destructor
    ~NewComment(void);
};

NewComment::NewComment(const std::wstring& sValue)
{
    Init();
    NewTreeNode* ptr = GetPtr();
    if(ptr)
    {
        ptr->xmltype = XML_COMMENT;
        ptr->pValue = sValue;
    }
}

The reader may ask why the author chose to create new classes instead of modifying the original Element, Attribute, CData and Comment classes. The reason is that the original classes contain many data members; constructing and destroying them on the stack in large numbers would seriously hurt performance. The listings above show no data members for the new classes because their only data member is ptr, which lives in their base class, NewNode.

class NewNode
{
public:
    NewNode(void);
    ~NewNode(void);

    NewTreeNode* GetPtr() const {return ptr;}
    void SetPtr(NewTreeNode* src) { ptr = src; }

    void Init();

    void Discard();
private:
    NewTreeNode* ptr;
};

ptr is a pointer to NewTreeNode. I had intended to name this tree structure TreeNode, but in Visual C++ 10 that name clashes with another TreeNode class defined in the Visual C++ libraries.

enum XMLTYPE
{
    XML_NONE,
    XML_ELEMENT,
    XML_ATTRIBUTE,
    XML_COMMENT,
    XML_CDATA
};

class NewTreeNode
{
public:
    NewTreeNode(void);
    ~NewTreeNode(void);

    std::vector<NewTreeNode*> vec;

    std::wstring pName;
    std::wstring pValue;

    XMLTYPE xmltype;

    static bool Traverse(MSXML2::IXMLDOMDocumentPtr& ptrDoc,
        MSXML2::IXMLDOMNodePtr& parent, NewTreeNode* pNode);
    void Delete();
};

NewTreeNode has a Traverse method, which creates the MS XML DOM elements as it traverses the tree recursively, and a Delete method, which deletes the tree structure recursively. Allocating and deallocating NewNode/NewElement objects on the stack is only a matter of pushing and popping a 32-bit or 64-bit pointer. Contrast this with pushing and popping the heavy-duty Element class, which contains the many data members listed below. Note that although the pointer is popped whenever a NewNode object goes out of scope, the tree data it points to lives on until it is saved to a file on disk.

class Element
{
private:
    //! type converter pointer
    BaseConverter* m_pIConverter;
    //! for returning wide raw array
    std::wstring m_strTemp;
    //! for returning narrow raw array
    std::string m_asciiStrTemp;
    //! Delimited string of non existing parent
    std::wstring m_strNonExistingParent;
    //! MS XML document object
    MSXML2::IXMLDOMDocumentPtr m_ptrDoc;
    //! MS XML node object
    MSXML2::IXMLDOMNodePtr m_ptrNode;
    //! Stores the deleted state
    bool m_bDeleted;
    //! Node name
    std::wstring m_strName;
    //! Stores the valid state
    bool m_bValid;
    //! State this node is root
    //! (is true if node 1st set with SetDocDom()
    bool m_bRoot;
};
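The size argument can be checked with a small sketch. HeapNode, Handle and BuildAndRelease are hypothetical names and not part of Elmax; the sketch only assumes what the text states, that the handle holds a single raw pointer and does not free the tree when it goes out of scope.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for NewTreeNode: the heavy data lives on the heap.
struct HeapNode
{
    std::wstring name;
    std::wstring value;
    std::vector<HeapNode*> children;
};

// Hypothetical stand-in for NewNode: the handle stores one raw pointer
// and deliberately does not free it in its (implicit) destructor.
class Handle
{
public:
    explicit Handle(const std::wstring& name) : ptr(new HeapNode())
    {
        ptr->name = name;
    }
    HeapNode* GetPtr() const { return ptr; }
private:
    HeapNode* ptr;
};

// The handle is one pointer wide, so creating and destroying it is cheap.
static_assert(sizeof(Handle) == sizeof(HeapNode*),
              "Handle should be pointer-sized");

// Creates a handle in a local scope and returns the tree it pointed to.
// The handle is destroyed on return; the heap node is not.
inline HeapNode* BuildAndRelease(const std::wstring& name)
{
    Handle h(name);
    return h.GetPtr();
}
```

The node returned by BuildAndRelease is still valid after the handle is gone, which is why something else (the save step in the article's design) has to delete the tree eventually.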

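The source code of the recursive Traverse and Delete methods is provided for the reader's perusal. Note that the Traverse listed here is the NewElement version, which writes the XML text directly to a file rather than building MS XML DOM elements.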

bool NewElement::Traverse(NewTreeNode& node, CUnicodeFile& uf, bool utf8)
{
    if(node.xmltype==XML_ELEMENT)
    {
        WriteStartElement(uf, utf8, node.pName);

        bool attrWritten = false;
        for(size_t i=0;i<node.vec.size(); ++i)
        {
            NewTreeNode* node1 = node.vec[i];
            if(node1->xmltype==XML_ATTRIBUTE)
            {
                std::wstring str = L" ";
                str += node1->pName + L"=\"";
                str += EscapeXML(node1->pValue);
                str += L"\"";
                Write(uf, utf8, str);

                continue;
            }
            else
            {
                if(attrWritten == false)
                {
                    Write(uf, utf8, L">");
                    attrWritten = true;
                }
            }
            Traverse(*node1, uf, utf8);
        }

        // Close the start tag if the loop never did
        // (no children at all, or attributes only)
        if(attrWritten == false)
            Write(uf, utf8, L">");

        if(node.pValue.empty()==false)
        {
            std::wstring str = EscapeXML(node.pValue);
            Write(uf, utf8, str);
        }

        WriteEndElement(uf, utf8, node.pName);
    }
    else if(node.xmltype==XML_COMMENT)
    {
        std::wstring str = L"<!--";
        str += node.pValue;
        str += L"-->";
        Write(uf, utf8, str);
    }
    else if(node.xmltype==XML_CDATA)
    {
        std::wstring str = L"<![CDATA[";
        str += node.pValue;
        str += L"]]>";
        Write(uf, utf8, str);
    }

    return true;
}

void NewTreeNode::Delete()
{
    for(size_t i=0;i<vec.size();++i)
        vec.at(i)->Delete();

    vec.clear();
    delete this;
}
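Traverse relies on an EscapeXML helper that is not listed in this article. As a rough idea of what such a helper has to do, here is a minimal sketch that replaces the five characters for which XML predefines entities. The name EscapeXmlSketch is hypothetical, and the real Elmax function may behave differently.

```cpp
#include <string>

// Replaces the five characters that XML predefines entities for.
// Assumption: this is the behaviour expected of an escape helper;
// it is not taken from the Elmax source.
inline std::wstring EscapeXmlSketch(const std::wstring& text)
{
    std::wstring out;
    out.reserve(text.size());
    for (wchar_t ch : text)
    {
        switch (ch)
        {
        case L'&':  out += L"&amp;";  break;
        case L'<':  out += L"&lt;";   break;
        case L'>':  out += L"&gt;";   break;
        case L'"':  out += L"&quot;"; break;
        case L'\'': out += L"&apos;"; break;
        default:    out += ch;        break;
        }
    }
    return out;
}
```

Attribute values and element text go through this kind of escaping in Traverse, whereas the comment and CDATA branches write node.pValue unchanged.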
