The XML parsing Article That Should (Not) Be Written!

Introduction

Over the years as a C++ software developer, I have occasionally had to maintain XML file formats for application project files. I found the DOM difficult to navigate and use. I have come across many articles and XML libraries that claim to be easy to use, but none is as easy as the internal XML library co-developed by my ex-coworkers, Srikumar Karaikudi Subramanian and Ali Akber Saifee. Srikumar wrote the first version, which could only read from an XML file, and Ali later added the node creation capability, which allowed the content to be saved to an XML file. However, that library is proprietary. After I left the company, I lost the use of a really easy-to-use XML library. Unlike many talented programmers out there, I am an idiot; I need an idiot-proof XML library. Too bad LINQ to XML (XLinq) is not available in C++/CLI! I decided to reconstruct Srikumar's and Ali's XML library and make it open source! I dedicate this article to Srikumar Karaikudi Subramanian and Ali Akber Saifee.

My terrible relationship with Ali Akber Saifee

Ali Akber Saifee and I are what we call "the world's greatest arch-rivals". While we worked together in the same company, I would take every opportunity to find 'flaws' in Ali, email him to expose some of his 'problems', and carbon-copy everyone else. My arch-rival, as always, beat me with some of his best replies. Ali once offered me a chance for us to make peace and conquer the world together. But I rejected his offer, a thinly veiled plot to subjugate me! The world's greatest arch-rivals can never work together!

Whenever I lose a friend on Facebook, I always check whether it was Ali who defriended me. The readers may ask why. Do you, the readers, know the ramifications of the world's greatest arch-rivals defriending each other on Facebook? Answer: there can never be world peace! The readers may also ask why the world's greatest arch-rivals are on each other's Facebook in the first place! Well, that is another story for another article on another day!

Why am I rewriting and promoting my arch-rival's XML library? Before Ali says this, let me pre-empt him and say this myself: Imitation is the most sincere form of flattery. The truth is his XML library is really easy to use!

Some code examples first

<Books>
  <Book>
    <Price>12.990000</Price>
  </Book>
</Books>

To create the above XML, see the C++ code below.

Elmax::Element root;
root.SetDomDoc(pDoc); // An empty DOM doc is initialized beforehand.
root[L"Books"][L"Book"][L"Price"] = 12.99f;

The 3rd line of code detects that the 3 elements do not exist, so the float assignment creates them, converts 12.99f to a string and assigns it to the Price element. To read the Price element, we just assign it to a float variable (see below).

Elmax::Element root;
root.SetDomDoc(pDoc); // An XML file is read into the DOM doc beforehand.
Elmax::Element elemPrice = root[L"Books"][L"Book"][L"Price"];
if(elemPrice.Exists())
    float price = elemPrice;

It is good practice to check if the price element exists, using Exists(), before reading it.

XML versus binary serialization

In this section, let us first look at the advantages of XML over binary serialization before we discuss Elmax. I'll not discuss XML serialization because I am not familiar with it. Below is the simplified (version 1) file format for an online bookstore.

Version=1
Books
  Book*
    ISBN
    Title
    Price
    AuthorID
Authors
  Author*
    Name
    AuthorID

The child elements are indented under their parent. Elements that can occur more than once are marked with an asterisk (*). The diagram below shows what the (version 1) binary serialization file format will typically look like.

Figure 1: Binary Version 1

Let's say that in version 2, we add a Description under Book and a Biography under Author.

Version=2
Books
  Book*
    ISBN
    Title
    Price
    AuthorID
    Description(new)
Authors
  Author*
    Name
    AuthorID
    Biography(new)

The diagram below shows the version 1 and 2 binary serialization file formats. The new additions in version 2 are shown in lighter colors.

Figure 2: Version 2

Notice that versions 1 and 2 are binary incompatible? Below is how a plain binary file format (note: not binary serialization) would choose to implement it.

Version=2
Books
  Book*
    ISBN
    Title
    Price
    AuthorID
Authors
  Author*
    Name
    AuthorID
Description(new)*
Biography(new)*

Figure 3: Binary Version 2

In this way, version 1 of the application can still read the version 2 binary file, ignoring the new parts appended at the back of the file. If XML is used, then without any extra work, version 1 of the application can still read the version 2 XML file (forward compatible), ignoring the new elements, provided that the original elements are not removed and their data types remain unchanged. And the version 2 application can read a version 1 XML file by using the old parsing code (backward compatible). The downside of XML is that parsing is slower than for a binary file format and the files take up more space, but XML files are self-describing.

Figure 4: XML Version 2
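To make the forward-compatibility argument a little more concrete, here is a minimal sketch that does not use any XML library at all. A record is modelled as a flat list of named fields, and a version 1 reader picks out the names it knows and skips the rest. The names Field, BookV1 and ReadBookV1 are made up for this illustration and are not part of Elmax.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Illustrative types only; none of these names come from Elmax.
typedef std::pair<std::string, std::string> Field; // (field name, text value)

struct BookV1
{
    std::string isbn;
    std::string title;
    std::string price;
    std::string authorId;
};

// A version 1 reader: fields are looked up by name, so a field added by a
// later version (for example "Desc") is skipped instead of breaking the read.
BookV1 ReadBookV1(const std::vector<Field>& fields)
{
    BookV1 book;
    for (std::size_t i = 0; i < fields.size(); ++i)
    {
        const std::string& name = fields[i].first;
        const std::string& value = fields[i].second;
        if (name == "ISBN") book.isbn = value;
        else if (name == "Title") book.title = value;
        else if (name == "Price") book.price = value;
        else if (name == "AuthorID") book.authorId = value;
        // Unknown names are ignored rather than treated as errors.
    }
    return book;
}

// Usage: a version 2 record, read by the version 1 reader.
void DemoReadVersion2Record()
{
    std::vector<Field> record;
    record.push_back(Field("ISBN", "1111-1111-1111"));
    record.push_back(Field("Title", "How not to program!"));
    record.push_back(Field("Desc", "Added in version 2"));
    record.push_back(Field("Price", "12.990000"));
    record.push_back(Field("AuthorID", "111"));

    BookV1 book = ReadBookV1(record);
    assert(book.title == "How not to program!");
    assert(book.price == "12.990000");
}
```

Because the lookup is by name rather than by position, the extra "Desc" field in the middle of the record does not disturb the fields that follow it, which is exactly what a positional binary layout cannot offer without extra work.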

Below is an example of how I would implement the file format in XML, followed by a code example that creates the XML file.

<?xml version="1.0" encoding="UTF-8"?>
<All>
  <Version>1</Version>
  <Books>
    <Book ISBN="1111-1111-1111">
      <Title>How not to program!</Title>
      <Price>12.990000</Price>
      <Desc>Learn how not to program from the industry's 
worst programmers! Contains lots of code examples which 
programmers should avoid! Treat it as inverse education.</Desc>
      <AuthorID>111</AuthorID>
    </Book>
    <Book ISBN="2222-2222-2222">
      <Title>Caught with my pants down</Title>
      <Price>10.000000</Price>
      <Desc>Novel about extra-martial affairs</Desc>
      <AuthorID>111</AuthorID>
    </Book>
  </Books>
  <Authors>
    <Author Name="Wong Shao Voon" AuthorID="111">
      <Bio>World's most funny author!</Bio>
    </Author>
  </Authors>
</All>

#import <msxml6.dll>
using namespace MSXML2; 

HRESULT CTryoutDlg::CreateAndInitDom(
    MSXML2::IXMLDOMDocumentPtr& pDoc)
{
    HRESULT hr = pDoc.CreateInstance(__uuidof(MSXML2::DOMDocument30));
    if (SUCCEEDED(hr))
    {
        // these methods should not fail so don't inspect result
        pDoc->async = VARIANT_FALSE;
        pDoc->validateOnParse = VARIANT_FALSE;
        pDoc->resolveExternals = VARIANT_FALSE;
        MSXML2::IXMLDOMProcessingInstructionPtr pi = 
            pDoc->createProcessingInstruction
                (L"xml", L" version='1.0' encoding='UTF-8'");
        pDoc->appendChild(pi);
    }
    return hr;
}

bool CTryoutDlg::SaveXml(
    MSXML2::IXMLDOMDocumentPtr& pDoc, 
    const std::wstring& strFilename)
{
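    TCHAR szPath[MAX_PATH];

    // Bail out if the local application data folder cannot be located;
    // otherwise szPath would be used uninitialized.
    if(FAILED(SHGetFolderPath(NULL, 
        CSIDL_LOCAL_APPDATA|CSIDL_FLAG_CREATE, 
        NULL, 
        0, 
        szPath))) 
    {
        return false;
    }

    PathAppend(szPath, strFilename.c_str());

    variant_t varFile(szPath);
    return SUCCEEDED(pDoc->save(varFile));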
}

void CTryoutDlg::TestWrite()
{
    MSXML2::IXMLDOMDocumentPtr pDoc;
    HRESULT hr = CreateAndInitDom(pDoc);
    if (SUCCEEDED(hr))
    {
        using namespace Elmax;
        using namespace std;
        Element root;
        root.SetConverter(NORMAL_CONV);
        root.SetDomDoc(pDoc);

        Element all = root[L"All"];
        all[L"Version"] = 1;
        Element books = all[L"Books"].CreateNew();
        Element book1 = books[L"Book"].CreateNew();
        book1.Attribute(L"ISBN") = L"1111-1111-1111";
        book1[L"Title"] = L"How not to program!";
        book1[L"Price"] = 12.99f;
        book1[L"Desc"] = L"Learn how not to program from the "
            L"industry's worst programmers! Contains lots of code examples "
            L"which programmers should avoid! Treat it as inverse education.";
        book1[L"AuthorID"] = 111;

        Element book2 = books[L"Book"].CreateNew();
        book2.Attribute(L"ISBN") = L"2222-2222-2222";
        book2[L"Title"] = L"Caught with my pants down";
        book2[L"Price"] = 10.00f;
        book2[L"Desc"] = L"Novel about extra-martial affairs";
        book2[L"AuthorID"] = 111;

        Element authors = all[L"Authors"].CreateNew();
        Element author = authors[L"Author"].CreateNew();
        author.Attribute(L"Name") = L"Wong Shao Voon";
        author.Attribute(L"AuthorID") = 111;
        author[L"Bio"] = L"World's most funny author!";

        std::wstring strFilename = L"Books.xml";
        SaveXml(pDoc, strFilename);
    }
}

Here is the code to read the XML saved by the previous code snippet. A helper class (DebugPrint) and some helper methods (CreateAndLoadXml and DeleteFile) are omitted to focus on the relevant code. They can be found in the Tryout project in the source code download.

void CTryoutDlg::TestRead()
{
    DebugPrint dp;
    MSXML2::IXMLDOMDocumentPtr pDoc;
    std::wstring strFilename = L"Books.xml";
    HRESULT hr = CreateAndLoadXml(pDoc, strFilename);
    if (SUCCEEDED(hr))
    {
        using namespace Elmax;
        using namespace std;
        Element root;
        root.SetConverter(NORMAL_CONV);
        root.SetDomDoc(pDoc);

        Element all = root[L"All"];
        if(!all.Exists())
        {
            dp.Print(L"Error: root does not exist!");
            return;
        }
        dp.Print(L"Version : {0}\n\n", all[L"Version"].GetInt32(0));

        dp.Print(L"Books\n");
        dp.Print(L"=====\n");
        Element books = all[L"Books"];
        if(books.Exists())
        {
            Element::collection_t vecBooks = 
                books.GetCollection(L"Book");
            for(size_t i=0; i<vecBooks.size(); ++i)
            {
                dp.Print(L"ISBN: {0}\n", 
                    vecBooks[i].Attribute(L"ISBN").GetString(L"Error"));
                dp.Print(L"Title: {0}\n", 
                    vecBooks[i][L"Title"].GetString(L"Error"));
                dp.Print(L"Price: {0}\n", 
                    vecBooks[i][L"Price"].GetFloat(0.0f));
                dp.Print(L"Desc: {0}\n", 
                    vecBooks[i][L"Desc"].GetString(L"Error"));
                dp.Print(L"AuthorID: {0}\n\n", 
                    vecBooks[i][L"AuthorID"].GetInt32(-1));
            }
        }

        dp.Print(L"Authors\n");
        dp.Print(L"=======\n");
        Element authors = all[L"Authors"];
        if(authors.Exists())
        {
            Element::collection_t vecAuthors = 
                authors.GetCollection(L"Author");
            for(size_t i=0; i<vecAuthors.size(); ++i)
            {
                dp.Print(L"Name: {0}\n", 
                    vecAuthors[i].Attribute(L"Name")
                        .GetString(L"Error"));
                dp.Print(L"AuthorID: {0}\n", 
                    vecAuthors[i].Attribute(L"AuthorID").GetInt32(-1));
                dp.Print(L"Bio: {0}\n\n", 
                    vecAuthors[i][L"Bio"].GetString(L"Error: No bio!"));
            }
        }
    }
    DeleteFile(strFilename);
}

This is the output after the XML is read.

Version : 1

Books
=====
ISBN: 1111-1111-1111
Title: How not to program!
Price: 12.990000
Desc: Learn how not to program from the industry's worst programmers! Contains lots of code examples which programmers should avoid! Treat it as inverse education.
AuthorID: 111

ISBN: 2222-2222-2222
Title: Caught with my pants down
Price: 10.000000
Desc: Novel about extra-martial affairs
AuthorID: 111

Authors
=======
Name: Wong Shao Voon
AuthorID: 111
Bio: World's most funny author!

Library usage

In this section, we'll look at how to use the Elmax library to perform creation, reading, update and deletion (CRUD) on elements, attributes, CData sections and comments. As you can see from the previous code sample, Elmax makes use of the Microsoft XML DOM library. That's because I do not wish to re-create all that XML functionality, for instance, XPath. Since Elmax depends on Microsoft XML, which in turn depends on COM, we have to call CoInitialize(NULL); to initialize the COM runtime at the start of the application and call CoUninitialize(); to uninitialize it before the application ends.

Elmax is an abstraction over the DOM; however, it does not seek to replicate all the functionality of the DOM. For example, the programmer cannot use Elmax to read element siblings. In the Elmax model, the element is a first-class citizen. Attributes, CData sections and comments are children of an element! This is different from the DOM, where they are nodes in their own right. The reason I designed CData sections and comments to be children of an element is that they are not identifiable by name or ID.

Element creation

Element all = root[L"All"];
all[L"Version"] = 1;
Element books = all[L"Books"].CreateNew();

Typically, we use CreateNew to create elements. There is also a Create method. The difference is that Create will not create the elements if they already exist. Notice that I did not use Create or CreateNew to create the All and Version elements? That's because they are created automatically when I assign a value to the last element in the chain. Note that when you call CreateNew repeatedly, only the last element gets created each time. Let me show you a code example to explain this.

root[L"aa"][L"bb"][L"cc"].CreateNew();
root[L"aa"][L"bb"][L"cc"].CreateNew();
root[L"aa"][L"bb"][L"cc"].CreateNew();

In the 1st CreateNew call, elements "aa", "bb" and "cc" are created. In each subsequent call, only element cc is created. This is the resultant XML created (and indented for easy reading).

<aa>
  <bb>
    <cc/>
    <cc/>
    <cc/>
  </bb>
</aa>

Create and CreateNew have an optional parameter to specify the namespace URI. If your element belongs to a namespace, then you must create it explicitly, using Create or CreateNew; you cannot rely on value assignment to create it automatically. More on this later. Note: if you call Element instance methods other than Create, CreateNew, the setters and the accessors when the element does not exist, Elmax will raise an exception! When do we use Create instead of CreateNew? One possible scenario is an application that loads an XML file, edits it and saves it. In the edit stage, the application need not check whether an element exists in the original XML file before assigning to it or adding nodes: it calls Create, which creates the element if it does not exist and otherwise does nothing.
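The difference between the two methods can be sketched on a toy tree. This is only a model of the behaviour described above, not Elmax's implementation: ToyNode, EnsureParents and the free functions Create and CreateNew are hypothetical names, and the toy root is a plain container (unlike the Elmax root, which is discussed later).

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// A toy element tree; hypothetical stand-in, not Elmax code.
struct ToyNode
{
    std::string name;
    std::vector<std::unique_ptr<ToyNode>> children;

    explicit ToyNode(const std::string& n) : name(n) {}

    // Return the first child with the given name, or nullptr if none exists.
    ToyNode* Find(const std::string& childName) const
    {
        for (const auto& child : children)
            if (child->name == childName)
                return child.get();
        return nullptr;
    }

    // Always append a new child and return it.
    ToyNode* Append(const std::string& childName)
    {
        children.push_back(std::unique_ptr<ToyNode>(new ToyNode(childName)));
        return children.back().get();
    }
};

// Walk all names except the last, creating missing ancestors on the way,
// and return the node that should hold the last name.
ToyNode* EnsureParents(ToyNode& root, const std::vector<std::string>& path)
{
    assert(!path.empty());
    ToyNode* node = &root;
    for (std::size_t i = 0; i + 1 < path.size(); ++i)
    {
        ToyNode* next = node->Find(path[i]);
        node = next ? next : node->Append(path[i]);
    }
    return node;
}

// Create: reuse the last element if it already exists.
ToyNode* Create(ToyNode& root, const std::vector<std::string>& path)
{
    ToyNode* parent = EnsureParents(root, path);
    ToyNode* existing = parent->Find(path.back());
    return existing ? existing : parent->Append(path.back());
}

// CreateNew: always append a fresh last element.
ToyNode* CreateNew(ToyNode& root, const std::vector<std::string>& path)
{
    return EnsureParents(root, path)->Append(path.back());
}

// Usage: three CreateNew calls give one "aa", one "bb" and three "cc".
void DemoCreateVersusCreateNew()
{
    ToyNode doc("doc");
    std::vector<std::string> path;
    path.push_back("aa");
    path.push_back("bb");
    path.push_back("cc");

    CreateNew(doc, path);
    CreateNew(doc, path);
    CreateNew(doc, path);
    ToyNode* bb = doc.Find("aa")->Find("bb");
    assert(bb->children.size() == 3);

    Create(doc, path); // "cc" already exists, so nothing is added
    assert(bb->children.size() == 3);
}
```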

Element Removal

using namespace Elmax;
Element elem;
Element elemChild = elem[L"Child"];
// do processing
elem.RemoveNode(elemChild); // Remove its child node.
elem.RemoveNode(); // Remove itself from DOM.

Note: in the current version, the AddNode method can only add a node that has been removed.

Element Value Assignment

In the beginning of the article, I showed how to create elements and assign a value to the last element at the same time. I'll repeat that code snippet here.

Elmax::Element root;
root.SetDomDoc(pDoc); // An empty DOM doc is initialized beforehand.
root[L"Books"][L"Book"][L"Price"] = 12.99f;

It turns out that this example is dangerous, as it relies on the compiler to pick the overloaded assignment operator. What if you mean to assign a float but assign an integer instead, just because you forgot to add ".0" and append an 'f' to the value? Not much harm in this case, I suppose. Still, in all scenarios it is better to use the setter method to assign the value explicitly.

Elmax::Element root;
root.SetDomDoc(pDoc); // An empty DOM doc is initialized beforehand.
root[L"Books"][L"Book"][L"Price"].SetFloat(12.99f);
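The overload pitfall mentioned above can be reproduced with a small stand-in class. ToyValue below is a hypothetical type written for this illustration only; it is not how Elmax stores values, and I am assuming printf-style "%d" and "%f" formatting purely to make the difference visible.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// A toy value holder standing in for an element's text (not Elmax code).
struct ToyValue
{
    std::string text;

    // Overloaded assignments: the compiler picks one from the argument type.
    ToyValue& operator=(int val)
    {
        char buf[32];
        std::snprintf(buf, sizeof(buf), "%d", val);
        text = buf;
        return *this;
    }

    ToyValue& operator=(float val)
    {
        char buf[64];
        std::snprintf(buf, sizeof(buf), "%f", static_cast<double>(val));
        text = buf;
        return *this;
    }

    // Explicit setter: whatever is passed is converted to float first.
    void SetFloat(float val) { *this = val; }
};

// Usage: the same "12" ends up as different text depending on the call.
void DemoOverloadPitfall()
{
    ToyValue byInt;
    byInt = 12;             // int overload chosen
    assert(byInt.text == "12");

    ToyValue byFloat;
    byFloat = 12.0f;        // float overload chosen
    assert(byFloat.text == "12.000000");

    ToyValue bySetter;
    bySetter.SetFloat(12);  // the int is converted to float by the setter
    assert(bySetter.text == "12.000000");
}
```

With the explicit setter, the stored text no longer depends on whether the literal happens to be written as 12 or 12.0f.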

Here is the list of setter methods available.

bool SetBool(bool val);
bool SetChar(char val);
bool SetShort(short val);
bool SetInt32(int val);
bool SetInt64(__int64 val);
bool SetUChar(unsigned char val);
bool SetUShort(unsigned short val);
bool SetUInt32(unsigned int val);
bool SetUInt64(unsigned __int64 val);
bool SetFloat(float val);
bool SetDouble(double val);
bool SetString(const std::wstring& val);
bool SetString(const std::string& val);
bool SetGUID(const GUID& val);
bool SetDate(const Elmax::Date& val);
bool SetDateTime(const Elmax::DateAndTime& val);

Element Value Reading

In the beginning of the article, I showed how to read a value from element. I'll repeat the code snippet here.

Elmax::Element root;
root.SetDomDoc(pDoc); // An XML file is read into the DOM doc beforehand.
Elmax::Element elemPrice = root[L"Books"][L"Book"][L"Price"];
if(elemPrice.Exists())
    float price = elemPrice;

This is the more correct version, using the GetFloat accessor to specify a default value.

Elmax::Element root;
root.SetDomDoc(pDoc); // An XML file is read into the DOM doc beforehand.
Elmax::Element elemPrice = root[L"Books"][L"Book"][L"Price"];
if(elemPrice.Exists())
    float price = elemPrice.GetFloat(10.0f);

price will get the default value of 10.0f if the value does not exist or is invalid, whereas in the prior example it will get 0.0f because no default value is specified. By default, however, Elmax cannot tell that a string is not a proper float in textual form, unless you use a regular expression to validate the string value. Set REGEX_CONV instead of NORMAL_CONV on the root element to use the regular expression type converter. As an alternative, you can use a schema or DTD to validate your XML before doing the Elmax parsing. To learn about schema or DTD validation, please consult MSDN.

Elmax::Element root;
root.SetConverter(REGEX_CONV);
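To give a rough idea of what regular-expression validation buys you, here is a minimal sketch of "validate first, fall back to the default otherwise". It is not the Elmax regex converter: ParseFloatOr and the pattern are my own assumptions, and it uses std::string instead of std::wstring to keep the example short.

```cpp
#include <cassert>
#include <exception>
#include <regex>
#include <string>

// Hypothetical helper (not Elmax's converter): return the parsed value only
// if the whole string looks like a decimal float, else the default value.
float ParseFloatOr(const std::string& text, float defaultVal)
{
    // Optional sign, digits with an optional fraction (or a bare fraction),
    // and an optional exponent. regex_match requires the whole string to match.
    static const std::regex pattern(
        "[+-]?([0-9]+(\\.[0-9]*)?|\\.[0-9]+)([eE][+-]?[0-9]+)?");

    if (!std::regex_match(text, pattern))
        return defaultVal;

    try
    {
        return std::stof(text);
    }
    catch (const std::exception&)
    {
        // For example, a well-formed number that a float cannot represent.
        return defaultVal;
    }
}

// Usage: valid text is parsed, anything else yields the default.
void DemoParseFloatOr()
{
    assert(ParseFloatOr("12.5", 10.0f) == 12.5f);
    assert(ParseFloatOr("12.5abc", 10.0f) == 10.0f);
    assert(ParseFloatOr("", 10.0f) == 10.0f);
}
```

Without the pattern check, a lenient conversion routine could happily read the leading "12.5" of "12.5abc" and report success, which is the kind of improper value the paragraph above warns about.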

This is the declaration of SetConverter method.

//! Set the type converter pointer
void SetConverter(CONVERTER conv, IConverter* pConv=NULL);

To use your own custom type converter, set the optional pConv pointer.

Elmax::Element root;
root.SetConverter(CUSTOM_CONV, pCustomTypeConv);

You are responsible for deleting pCustomTypeConv if it is allocated on the heap. There are locale type converters in Elmax, but they are not tested at this point because I am not sure how to test them: in Asia, number representations are the same across countries, unlike in Europe. As a tip to readers who might be modifying Elmax, remember to run all the 252 unit tests to make sure you did not break anything. The unit tests can only be run in Visual Studio 2010. Below is the list of value accessors available.

bool GetBool(bool defaultVal) const;
char GetChar(char defaultVal) const;
short GetShort(short defaultVal) const;
int GetInt32(int defaultVal) const;
__int64 GetInt64(__int64 defaultVal) const;
unsigned char GetUChar(unsigned char defaultVal) const;
unsigned short GetUShort(unsigned short defaultVal) const;
unsigned int GetUInt32(unsigned int defaultVal) const;
unsigned __int64 GetUInt64(unsigned __int64 defaultVal) const;
float GetFloat(float defaultVal) const;
double GetDouble(double defaultVal) const;
std::wstring GetString(const std::wstring& defaultVal) const;
std::string GetString(const std::string& defaultVal) const;
GUID GetGUID(const GUID& defaultVal) const;
Elmax::Date GetDate(const Elmax::Date& defaultVal) const;
Elmax::DateAndTime GetDateTime(
    const Elmax::DateAndTime& defaultVal) const;

For GetBool, "true", "yes", "ok" and "1" evaluate to true, while "false", "no", "cancel" and "0" evaluate to false. The comparison is not case-sensitive.
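The rules above are simple enough to sketch in a few lines. ParseBoolOr is a hypothetical helper, not the Elmax source, and returning the default value for unrecognized text is my assumption about what a GetBool-style accessor would do.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper (not Elmax code): interpret text as a boolean using
// the word list described above, ignoring case.
bool ParseBoolOr(std::string text, bool defaultVal)
{
    // Lower-case a copy of the text so the comparison is case-insensitive.
    std::transform(text.begin(), text.end(), text.begin(),
        [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    if (text == "true" || text == "yes" || text == "ok" || text == "1")
        return true;
    if (text == "false" || text == "no" || text == "cancel" || text == "0")
        return false;

    // Assumption: anything else falls back to the caller's default.
    return defaultVal;
}

// Usage: mixed-case input and an unrecognized word.
void DemoParseBoolOr()
{
    assert(ParseBoolOr("Yes", false) == true);
    assert(ParseBoolOr("CANCEL", true) == false);
    assert(ParseBoolOr("maybe", true) == true);
}
```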

Namespace

To create an element under a namespace URI, "http://www.yahoo.com", see below.

using namespace Elmax;
Element all = root[L"All"];
all[L"Version"] = 1;
Element books = all[L"Books"].CreateNew();
Element book1 = books[L"Book"].CreateNew(L"http://www.yahoo.com");

The XML output is as below,

<?xml version="1.0" encoding="UTF-8"?>
<All>
  <Version>1</Version>
  <Books>
    <Book xmlns="http://www.yahoo.com"/>
  </Books>
</All>

To create a group of elements and an attribute under a namespace URI, see below.

using namespace Elmax;
Element all = root[L"All"];
all[L"Version"] = 1;
Element books = all[L"Books"].CreateNew();
Element book1 = books[L"Yahoo:Book"].CreateNew(L"http://www.yahoo.com");
book1.Attribute(L"Yahoo:ISBN").Create(L"http://www.yahoo.com");
book1.Attribute(L"Yahoo:ISBN") = L"1111-1111-1111";
book1[L"Yahoo:Title"].Create(L"http://www.yahoo.com");
book1[L"Yahoo:Title"] = L"How not to program!";
book1[L"Yahoo:Price"].Create(L"http://www.yahoo.com");
book1[L"Yahoo:Price"] = 12.99f;
book1[L"Yahoo:Desc"].Create(L"http://www.yahoo.com");
book1[L"Yahoo:Desc"] = L"Learn how not to program from the industry's "
    L"worst programmers! Treat it as inverse education.";
book1[L"Yahoo:AuthorID"].Create(L"http://www.yahoo.com");
book1[L"Yahoo:AuthorID"] = 111;

The XML output is as below,

<All>
  <Version>1</Version>
  <Books>
    <Yahoo:Book xmlns:Yahoo="http://www.yahoo.com"
        Yahoo:ISBN="1111-1111-1111">
      <Yahoo:Title>How not to program!</Yahoo:Title>
      <Yahoo:Price>12.990000</Yahoo:Price>
      <Yahoo:Desc>Learn how not to program from the industry's
        worst programmers! Treat it as inverse education.</Yahoo:Desc>
      <Yahoo:AuthorID>111</Yahoo:AuthorID>
    </Yahoo:Book>
  </Books>
</All>

Enumerating same elements

You can use the AsCollection method to get siblings with the same name in a vector.

using namespace Elmax;
Element root;
root.SetConverter(NORMAL_CONV);
root.SetDomDoc(pDoc);

Element elem1 = root[L"aa|bb|cc"].CreateNew();
elem1.SetInt32(11);
Element elem2 = root[L"aa|bb|cc"].CreateNew();
elem2.SetInt32(22);
Element elem3 = root[L"aa|bb|cc"].CreateNew();
elem3.SetInt32(33);

Element::collection_t vec = root[L"aa"][L"bb"][L"cc"].AsCollection();

for(size_t i=0;i<vec.size(); ++i)
{
    int n = vec.at(i).GetInt32(10);
}

This overloaded form (below) of AsCollection is faster as it does not create a temporary vector before returning.

bool AsCollection(const std::wstring& name, collection_t& vec);

Enumerating same child elements

You can use the GetCollection method to get children with the same name in a vector.

using namespace Elmax;
Element root;
root.SetConverter(NORMAL_CONV);
root.SetDomDoc(pDoc);

Element elem1 = root[L"aa|bb|cc"].CreateNew();
elem1.SetInt32(11);
Element elem2 = root[L"aa|bb|cc"].CreateNew();
elem2.SetInt32(22);
Element elem3 = root[L"aa|bb|cc"].CreateNew();
elem3.SetInt32(33);

Element::collection_t vec = root[L"aa"][L"bb"].GetCollection(L"cc");

for(size_t i=0;i<vec.size(); ++i)
{
    int n = vec.at(i).GetInt32(10);
}

This overloaded form (below) of GetCollection is faster as it does not create a temporary vector before returning.

bool GetCollection(const std::wstring& name, collection_t& vec);

Query number of children

To query the number of children for each name, you can use QueryChildrenNum method.

using namespace Elmax;
Element root;
root.SetConverter(NORMAL_CONV);
root.SetDomDoc(pDoc);

Element elem1 = root[L"aa|bb|qq"].CreateNew();
elem1.SetInt32(11);
Element elem2 = root[L"aa|bb|cc"].CreateNew();
elem2.SetInt32(22);
Element elem3 = root[L"aa|bb|cc"].CreateNew();
elem3.SetInt32(33);
Element elem4 = root[L"aa|bb|qq"].CreateNew();
elem4.SetInt32(44);
Element elem5 = root[L"aa|bb|cc"].CreateNew();
elem5.SetInt32(55);

Element::available_child_t acmap = 
    root[L"aa"][L"bb"].QueryChildrenNum();

assert(acmap[L"cc"] == (unsigned int)(3));
assert(acmap[L"qq"] == (unsigned int)(2));

There is also an overloaded form (below) of QueryChildrenNum which does not create a temporary map before returning. Note: QueryChildrenNum can only query for elements, not attributes, CData sections or comments.

typedef std::map< std::wstring, size_t > available_child_t;
bool QueryChildrenNum(available_child_t& children);
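Conceptually, the result is just a tally of child element names. The sketch below shows that idea on a plain list of names; CountByName is a hypothetical helper and does not touch the DOM the way QueryChildrenNum does.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper (not Elmax code): tally how often each name occurs.
std::map<std::string, std::size_t> CountByName(
    const std::vector<std::string>& childNames)
{
    std::map<std::string, std::size_t> counts;
    for (const std::string& name : childNames)
        ++counts[name]; // operator[] starts a missing entry at 0
    return counts;
}

// Usage: the same child order as in the example above (qq, cc, cc, qq, cc).
void DemoCountByName()
{
    std::vector<std::string> names;
    names.push_back("qq");
    names.push_back("cc");
    names.push_back("cc");
    names.push_back("qq");
    names.push_back("cc");

    std::map<std::string, std::size_t> counts = CountByName(names);
    assert(counts["cc"] == 3);
    assert(counts["qq"] == 2);
}
```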

Shortcut to avoid temporary element creation

In the previous enumeration example, I used

Elmax::Element elem1 = root[L"aa|bb|cc"].CreateNew();

instead of

Elmax::Element elem1 = root[L"aa"][L"bb"][L"cc"].CreateNew();

because the 2nd form creates temporary elements, "aa" and "bb", on the stack which are not used. The 1st form saves some tedious typing and returns only 1 element from the overloaded [] operator, not to mention that it is faster too. '\\' and '/' can be used as delimiters as well. To speed up the code below, which uses temporaries excessively,

if(root[L"aa"][L"bb"][L"cc"][L"dd"][L"ee"].Exists())
{
    root[L"aa"][L"bb"][L"cc"][L"dd"][L"ee"][L"Title"] = L"Beer jokes";
    root[L"aa"][L"bb"][L"cc"][L"dd"][L"ee"][L"Author"] = L"The joker";
    root[L"aa"][L"bb"][L"cc"][L"dd"][L"ee"][L"Price"] = 10.0f;
}

you can assign the element to an Element variable, and use that variable instead.

Elmax::Element elem1 = root[L"aa|bb|cc|dd|ee"];
if(elem1.Exists())
{
    elem1[L"Title"] = L"Beer jokes";
    elem1[L"Author"] = L"The joker";
    elem1[L"Price"] = 10.0f;
}
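For the curious, splitting such a delimited path into element names could look roughly like the sketch below. SplitPath is a hypothetical helper, not the parsing code inside Elmax, and skipping empty segments (for example from a doubled delimiter) is my own assumption.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not Elmax code): split a path such as L"aa|bb|cc"
// into element names, treating '|', '\\' and '/' as delimiters.
std::vector<std::wstring> SplitPath(const std::wstring& path)
{
    std::vector<std::wstring> names;
    std::wstring current;
    for (wchar_t ch : path)
    {
        if (ch == L'|' || ch == L'\\' || ch == L'/')
        {
            // Assumption: empty segments are skipped.
            if (!current.empty())
                names.push_back(current);
            current.clear();
        }
        else
        {
            current += ch;
        }
    }
    if (!current.empty())
        names.push_back(current);
    return names;
}

// Usage: all three delimiters give the same list of names.
void DemoSplitPath()
{
    std::vector<std::wstring> a = SplitPath(L"aa|bb|cc");
    std::vector<std::wstring> b = SplitPath(L"aa\\bb/cc");
    assert(a.size() == 3);
    assert(a == b);
}
```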

Root element

The root element is created when you call SetDomDoc on the element. You should know, by now, that the [] operator is used to access a child element. For the root element, however, the [] operator accesses the root itself and checks whether its name corresponds to the name given in the [] operator.

Element root;
root.SetDomDoc(pDoc);

Element elem1 = root[L"aa|bb|cc"];

The "aa" element in the above example actually refers to the root, not a child of the root. If SetDomDoc() has not been called on an element, then "aa" refers to its child. When using the [] operator, please remember to prefix the (wide) string literal with 'L', e.g. elem[L"Hello"], else you will get a strange, unhelpful error. Elements are created directly or indirectly from the root. For example, the root creates the "aa" element, and the "aa" element has the ability to create other elements. If you instantiate your element other than from the root, your element cannot create anything. This is a limitation of the MS XML DOM, in which only the DOM document can create nodes. Elements which are created directly or indirectly from the root receive its DOM document, and thus the ability to create elements.

Shared State in Multithreading

You might be using different Elmax Element objects in different threads without sharing them across threads. However, Element has static type converter objects which are shared by all Element objects. To overcome this problem, allocate a new type converter and use that in the root. Remember to delete the converter after use.

using namespace Elmax;
Element root;
root.SetDomDoc(pDoc);
RegexConverter* pRegex = new RegexConverter();
root.SetConverter(CUSTOM_CONV, pRegex);

By the way, remember to call CoInitialize/CoUninitialize in your worker threads!

Save File Contents in XML

You can call SetFileContents to save a file's binary contents in Base64 format in the Element. You can choose to save the file name and file length in attributes if you intend to save the contents back to a file with the same name on disk. We need to save the original file length as well because GetFileContents sometimes reports a longer length after the Base64 conversion!

bool SetFileContents(const std::wstring& filepath, 
    bool bSaveFilename, 
    bool bSaveFileLength);

We use GetFileContents to get the file contents back from the Base64 conversion. filename is filled in only if you chose to save the file name during SetFileContents. length is the length of the returned character array, not the saved file length attribute.

char* GetFileContents(std::wstring& filename, size_t& length);

Attribute

To create an attribute (if it does not exist) and assign a string to it, see the example below.

book1.Attribute(L"ISBN") = L"1111-1111-1111";

To create attribute with a namespace URI and assign a string to it, you have to create it explicitly.

book1.Attribute(L"Yahoo:ISBN").Create(L"http://www.yahoo.com");
book1.Attribute(L"Yahoo:ISBN") = L"1111-1111-1111";

To delete an attribute, use the Delete method.

book1.Attribute(L"ISBN").Delete();

To find out whether an attribute with the given name exists, use the Exists method.

bool bExists = book1.Attribute(L"ISBN").Exists();

The Attribute setters and accessors are the same as those of Element, and they use the same type converter.

Comments

For your information, an XML comment comes in the form of <!--My example comments here-->. Below are some operations you can use with comments.

using namespace Elmax;
Element elem = root[L"aa"][L"bb"][L"cc"].CreateNew();
elem.AddComment(L"Can you see me?"); // add a new comment!

Comment comment = elem.GetComment(0); // get comment at 0 index

comment.Update(L"Can you hear me?"); // update the comment

comment.Delete(); // Delete this comment node!

You can get a vector of Comment objects which are children of the element, using GetCommentCollection method.

CData section

For your information, an XML CData section comes in the form of <![CDATA[" <IgnoredInCDataSection/> "]]>. A CData section typically contains data which is not parsed by the parser, therefore it can contain < and > and other characters that are invalid in normal text. Some programmers prefer to store such data in Base64 format (see the next section). Below are some operations you can use with CData sections.

using namespace Elmax;
Element elem = root[L"aa"][L"bb"][L"cc"].CreateNew();
elem.AddCData(L"<<>>"); // add a new CData section!

CData cdata = elem.GetCData(0); // get CData section at 0 index

cdata.Update(L">><<"); // update the CData section

cdata.Delete(); // Delete this CData section node!

You can get a vector of CData sections which are children of the element, using GetCDataCollection method.

Base64

Some programmers prefer to store binary data in Base64 format under 1 element, instead of in a CData section, so that it is easy to identify and find. The downside is that the Base64 format takes up more space and the data conversion takes time. The code example shows how to use Base64 conversion before assignment, and how to convert back from Base64 to binary data after reading.

using namespace Elmax;
using namespace std;

Element elem1;
string strNormal = "@#$^*_+-|\\~<>";
// Assigning Base64 data
elem1 = Element::ConvToBase64(strNormal.c_str(), strNormal.length());

// Reading Base64 data
wstring strBase64 = elem1.GetString(L"ABC");

size_t len = 0;
// Get the length required
Element::ConvFromBase64(strBase64, NULL, len);

char* p = new char[len+1];
memset(p, 0, len+1);

Element::ConvFromBase64(strBase64, p, len);
// process p here (not shown)

delete [] p; // release the buffer after use
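To put a number on the extra space, standard padded Base64 (without line breaks) turns every group of up to 3 input bytes into 4 output characters. The helper below is a small sketch of that arithmetic; Base64EncodedLength is a hypothetical name and is not part of Elmax or of the Base64 class it uses.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: length of padded Base64 text (no line breaks) for a
// given number of input bytes. Each started group of 3 bytes costs 4 chars.
std::size_t Base64EncodedLength(std::size_t byteCount)
{
    return ((byteCount + 2) / 3) * 4;
}

// Usage: a partial group is padded up to a full 4 characters.
void DemoBase64EncodedLength()
{
    assert(Base64EncodedLength(0) == 0);
    assert(Base64EncodedLength(1) == 4);
    assert(Base64EncodedLength(3) == 4);
    assert(Base64EncodedLength(4) == 8);
}
```

So the encoded text is roughly one third larger than the binary data it represents, before counting any XML markup around it.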

C++0x move constructor

The Elmax library defines some C++0x move constructors and move assignment operators. To build the library in versions of Visual Studio older than 2010, you have to hide them by defining _HAS_CPP0X to be 0 in stdafx.h.

What is in the Elmax name?

The abstraction model and the library are named "Elmax" because there is an 'X', an 'M' and an 'L' in "Elmax". <whisper>I can tell you the real reason but you must not tell anyone, else I have to eliminate you from this world! The reason is the author likes to crack jokes in real life. But all his jokes are deemed by everyone to be lame and cold. In the Chinese language, a cold joke means a joke which is not funny or laughable at all! If you rearrange the letters in "Elmax", you get "LameX", which refers to the author!</whisper>

What's next?

In the next article, XML parsing is going to get even easier! That is, parsing is eliminated; the programmer does not have to do the XML parsing himself/herself! XML parsing is done automatically, along the lines of Object Relational Mapping (ORM). I personally don't see the need for the programmer to do XML parsing. Just pass in specially formatted structure(s) with an XML file and the library will fill in the structure for you! Just kidding! There is no way I'll have time for this as my part-time undergrad course is starting soon!

Thanks for reading!

Bug reports

For bug reports and feature requests, please file them here. When you file a bug report, please include the sample code and XML file (if any) to reproduce the bug. The current Elmax is at version 0.6 Beta. Its CodePlex site is located at http://elmax.codeplex.com/

History

22/12/2010 : 1st release

References

Base64 conversion class used in Elmax is from Jan Raddatz's article on Codeguru: BASE 64 Decoding and Encoding Class



About the Author

Wong Shao Voon

I guess I'll write here what I do in my free time, rather than a list of the skills I currently possess. I believe the things I do in my free time say more about me.

When I am not working, I like to watch Japanese anime. I am also writing some movie script, hoping to see my own movie on the big screen one day.

I like to jog because it makes me feel good, having done something meaningful in the morning before the day starts.

I also write articles for CodeGuru; I have a few ideas to write about but never get around to writing them because of my hectic schedule.
