A Kick-Start to SAX with C++, Part 1

This series of articles is not a complete tutorial on SAX, but rather a kick-start. Familiarity with XML is required. In this first article, you will see what SAX is, how it is supported by the Microsoft XML Parser (MSXML), and how you can create a simple SAX application.

What Is SAX?

SAX stands for Simple API for XML and is a standard for the event-based (or event-driven) parsing of XML documents. When the parser encounters certain constructs in the document, it generates an event, and an event handler (a callback function) is executed to handle it. Examples of events are start-of-document, end-of-document, start-of-element, end-of-element, and so forth. The document is processed serially and, unlike with DOM (Document Object Model), it is not kept in memory. With DOM, a tree is created in memory as the file is read and, after that, the tree is traversed to create the internally used data structures. With SAX, you create the internal data structures on the fly, without keeping the same information twice in memory. Although this may not matter for small XML documents, it can become very important for large files of 1 MB, 10 MB, or more.

The XML document is parsed by an XML reader. As this reader detects different constructs, it fires events. The events are handled by a consumer component called the content handler. When parsing errors occur, another set of events is fired, and they are handled by a component called the error handler. Finally, notifications of DTD-specific events are handled by a DTD handler.
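
The reader/handler split described above can be illustrated without COM. The following is only a toy sketch, not how MSXML works internally: ToyHandler, ToyParse, and EventLog are hypothetical names, and the scanner understands nothing beyond plain <tag>text</tag> markup (no attributes, entities, comments, or error reporting). It does show the essential point, which is that the consumer sees a stream of events and no tree is ever built.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical handler interface, loosely mirroring a SAX content handler.
struct ToyHandler {
    virtual ~ToyHandler() {}
    virtual void StartElement(const std::string& name) = 0;
    virtual void EndElement(const std::string& name) = 0;
    virtual void Characters(const std::string& text) = 0;
};

// Toy reader: walks the input once, front to back, and fires one event per
// tag or text run. Nothing about the document is retained afterwards.
inline void ToyParse(const std::string& xml, ToyHandler& handler) {
    std::size_t pos = 0;
    while (pos < xml.size()) {
        if (xml[pos] == '<') {
            std::size_t close = xml.find('>', pos);
            if (close == std::string::npos) return;  // malformed input: stop
            std::string tag = xml.substr(pos + 1, close - pos - 1);
            if (!tag.empty() && tag[0] == '/')
                handler.EndElement(tag.substr(1));
            else
                handler.StartElement(tag);
            pos = close + 1;
        } else {
            std::size_t next = xml.find('<', pos);
            if (next == std::string::npos) next = xml.size();
            handler.Characters(xml.substr(pos, next - pos));
            pos = next;
        }
    }
}

// Example consumer: records the events in the order they arrive.
struct EventLog : ToyHandler {
    std::vector<std::string> events;
    void StartElement(const std::string& name) override {
        events.push_back("start:" + name);
    }
    void EndElement(const std::string& name) override {
        events.push_back("end:" + name);
    }
    void Characters(const std::string& text) override {
        events.push_back("chars:" + text);
    }
};
```

Passing an EventLog to ToyParse() along with a small fragment such as <cd><title>The Wall</title></cd> yields the events in document order: two start events, one characters event, and two end events.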

Microsoft COM Implementation of SAX

ISAXXMLReader is the COM/C++ implementation of the XML reader. It has several methods, the most important being the ones that allow you to register an event handler.

Method              Description
putContentHandler   Used for registering a content handler.
putErrorHandler     Used for registering an error handler.
putDTDHandler       Used for registering a DTD handler.

None of the three handlers is required, but events for which no handler is registered are ignored. Only one handler of each type can be registered at a time; this can be a problem in situations where you want specialized handlers for different parts of the XML document. This will be the focus of a later article. In the meantime, you should know that you can change the registered handlers during parsing, in which case subsequent events are delivered to the new handler.
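
As a rough illustration of swapping handlers in the middle of a parse, here is a portable sketch. It does not use MSXML: MiniReader, MiniHandler, BookHandler, and CdHandler are hypothetical stand-ins, and the "document" is reduced to a list of element names. The only point it makes is that a reader holding a single handler pointer delivers each event to whichever handler is registered at that moment.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct MiniReader;

// Hypothetical handler interface; it receives the reader so that a handler
// can register a different handler while parsing is in progress.
struct MiniHandler {
    virtual ~MiniHandler() {}
    virtual void OnElement(MiniReader& reader, const std::string& name) = 0;
};

// Stand-in for a reader that holds exactly one content handler at a time.
struct MiniReader {
    MiniHandler* handler;
    MiniReader() : handler(nullptr) {}
    void PutHandler(MiniHandler* h) { handler = h; }

    // Delivers each name to the handler registered at that moment.
    void Parse(const std::vector<std::string>& names) {
        for (std::size_t i = 0; i < names.size(); ++i)
            if (handler) handler->OnElement(*this, names[i]);
    }
};

// Records every element it is given.
struct CdHandler : MiniHandler {
    std::vector<std::string> seen;
    void OnElement(MiniReader&, const std::string& name) override {
        seen.push_back(name);
    }
};

// Records elements until a "cd" element appears, then hands over.
struct BookHandler : MiniHandler {
    std::vector<std::string> seen;
    MiniHandler* next;  // handler to register once "cd" shows up
    explicit BookHandler(MiniHandler* n) : next(n) {}
    void OnElement(MiniReader& reader, const std::string& name) override {
        if (name == "cd") {
            reader.PutHandler(next);  // later events go to the new handler
            return;
        }
        seen.push_back(name);
    }
};
```

With the names book, title, cd, track fed to a MiniReader whose initial handler is a BookHandler, the first two names end up in the BookHandler and the last one in the CdHandler.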

The DTD handler is beyond the scope of this article, and will be ignored.

The ISAXContentHandler interface receives notification about the content of the XML document. The implemented interface is shown below (table taken from MSDN):

Method                  Description
characters              Receives notification of character data.
endDocument             Receives notification of the end of a document.
startDocument           Receives notification of the beginning of a document.
endElement              Receives notification of the end of an element.
startElement            Receives notification of the beginning of an element.
ignorableWhitespace     Receives notification of ignorable white space in element content. This method is not called in the current (MSXML 4.0) implementation because the SAX2 implementation is non-validating.
endPrefixMapping        Indicates the end of a namespace prefix that maps to a URI.
startPrefixMapping      Indicates the beginning of a namespace prefix that maps to a URI.
processingInstruction   Receives notification of a processing instruction.
skippedEntity           Receives notification of a skipped entity.

The events that will be covered in this article are startElement(), endElement(), and characters().

startElement() is called each time a new element is parsed. It supports up to three names for each element: namespace URI, local name, and QName (qualified XML name). In addition, if there are attributes attached to the element, they can be accessed via an interface called ISAXAttributes.

HRESULT startElement(
   [in] const wchar_t * pwchNamespaceUri, // The namespace URI
   [in] int cchNamespaceUri,              // The length of the
                                          // namespace URI
   [in] const wchar_t * pwchLocalName,    // The local name string
   [in] int cchLocalName,                 // The length of the local
                                          // name
   [in] const wchar_t * pwchQName,        // The QName, with prefix,
                                          // or, an empty string
   [in] int cchQName,                     // The length of the QName
   [in] ISAXAttributes * pAttributes);    // The attributes attached
                                          // to the element

For each startElement() event, the paired endElement() is called (regardless of whether the element contains data or is empty). The function has the same parameters, except for the attributes interface pointer.

HRESULT endElement(
   [in] const wchar_t * pwchNamespaceUri, // The namespace URI
   [in] int cchNamespaceUri,              // The length of the
                                          // namespace URI
   [in] const wchar_t * pwchLocalName,    // The local name string
   [in] int cchLocalName,                 // The length of the local
                                          // name
   [in] const wchar_t * pwchQName,        // The QName, with prefix,
                                          // or, an empty string
   [in] int cchQName);                    // The length of the QName

Each time the parser encounters character data, it calls a method called characters().

HRESULT characters(
   [in] const wchar_t * pwchChars,    // The character data
   [in] int cchChars);                // The length of the character
                                      // string

The first argument is a pointer to the chunk of data, and the second one is the length of the data that should be processed. The application should not access the data beyond that limit. This method is called for all character data, including white space. ignorableWhitespace() is intended for characters that can be ignored, such as white space between elements, but without a DTD or an XML schema the parser cannot know which characters are ignorable. Because the parser does not validate, that method is never called.
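
Because the data arrives as a pointer plus a length, the safe way to capture it is to copy exactly that many characters. A minimal sketch follows; CopyChunk is a hypothetical helper name, not part of MSXML.

```cpp
#include <string>

// Copies exactly `length` characters from a buffer that may not be
// null-terminated, as with the pointer/length pairs passed to SAX callbacks.
inline std::wstring CopyChunk(const wchar_t* chars, int length) {
    if (chars == nullptr || length <= 0) return std::wstring();
    return std::wstring(chars, static_cast<std::wstring::size_type>(length));
}
```

For a buffer holding titleauthor, CopyChunk(buffer, 5) yields title and nothing beyond it, regardless of what follows in memory.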

The ISAXAttributes interface provides a means to access the attribute list of an element. Of its methods, the three below are the most relevant here (the sample code later calls getLocalName, a narrower relative of getName, to fetch just the local name):

Method      Description
getLength   Returns the count of attributes.
getName     Returns all information related to the name of an attribute at a given index.
getValue    Returns the text value of an attribute.

Problem Specification

To see how SAX works, you will write a simple application. Assume you own a store where you sell (to begin with) books and CDs. You want to keep the catalog of items in an XML document, and you want to build an application that reads this file and displays the existing items on the console.

Sample XML Document

The XML document you will refer to for the rest of the article is shown here.

<?xml version="1.0" encoding="utf-8"?>

<store>
   <book isbn="10000001">
      <title>The Lord Of The Rings</title>
      <author>J.R.R. Tolkien</author>
   </book>

   <book>
      <title>Maitreyi</title>
      <author>Mircea Eliade</author>
   </book>

   <cd>
      <title>The Wall</title>
      <artist>Pink Floyd</artist>
      <track length="3:40">Another Brick in the Wall</track>
      <track length="5:33">Mother</track>
   </cd>

   <cd>
      <title>Come on Over</title>
      <artist>Shania Twain</artist>
      <track length="4:40">From This Moment On</track>
      <track length="3:33">You're Still The One</track>
   </cd>
</store>

Although a lot of information could be stored for these items, for convenience you use only the title, author, and ISBN (as an attribute) for books; for CDs, you use the title, artist, and a list of tracks, each with a title and a length.

Approaching the Problem

Earlier, I mentioned that with SAX you create the internally used data structures as the document is parsed. The data structures for your application are listed below:

#include <sstream>
#include <string>
#include <vector>

class Book
{
   std::wstring   title_;
   std::wstring   author_;
   std::wstring   isbn_;
public:
   Book() {}
   Book(std::wstring isbn) {isbn_ = isbn;}

   void SetTitle(std::wstring title) {title_ = title;}
   void SetAuthor(std::wstring author) {author_ = author;}
   void SetIsbn(std::wstring isbn) {isbn_ = isbn;}

   std::wstring ToString() const
   {
      std::wstringstream sstr;
      sstr << L"\'" << title_ << L"\' by " << author_
           << L", ISBN: " << isbn_;
      return sstr.str();
   }
};

class Track
{
   std::wstring   title_;
   std::wstring   length_;
public:
   Track() {}
   Track(std::wstring title, std::wstring length):title_(title),
        length_(length) {}

   void SetTitle(std::wstring title) {title_ = title;}
   void SetLength(std::wstring length) {length_ = length;}

   std::wstring ToString() const
   {
      std::wstringstream sstr;
      sstr << title_ << L", " << length_;
      return sstr.str();
   }
};

class CD
{
   std::wstring         title_;
   std::wstring         artist_;
   std::vector<Track>   tracks_;
public:
   CD() {}

   void SetTitle(std::wstring title) {title_ = title;}
   void SetArtist(std::wstring artist) {artist_ = artist;}
   void AddTrack(Track track) {tracks_.push_back(track);}

   std::wstring ToString() const
   {
      std::wstringstream sstr;
      sstr << L"\'" << title_ << L"', " << artist_ << L'\n';
      int i = 1;
      for(std::vector<Track>::const_iterator it = tracks_.begin();
         it != tracks_.end(); ++ it)
      {
         sstr << L"  " << i++ << L". " << (*it).ToString() << L'\n';
      }

      return sstr.str();
   }
};

class Store
{
   std::vector<Book>   books;
   std::vector<CD>     cds;
public:
   Store() {}

   void AddBook(Book book) {books.push_back(book);}
   void AddCD(CD cd) {cds.push_back(cd);}

   std::wstring ToString() const
   {
      std::wstringstream sstr;
      for(std::vector<Book>::const_iterator itb = books.begin();
         itb != books.end(); ++itb)
      {
         sstr << (*itb).ToString() << L'\n';
      }

      sstr << L'\n';

      for(std::vector<CD>::const_iterator itc = cds.begin();
         itc != cds.end(); ++itc)
      {
         sstr << (*itc).ToString() << L'\n';
      }

      return sstr.str();
   }
};

There is nothing special about these classes. In the Store, you have a list of Books and a list of CDs, and each CD in turn has a list of tracks.

Default Handler Implementations

In this application, you will not handle errors, but you will have a default handler implementation that responds to the events by doing nothing.

class SAXErrorHandlerImpl : public ISAXErrorHandler
{
   long m_RefCount;
public:
   SAXErrorHandlerImpl():m_RefCount(0) {}
   virtual ~SAXErrorHandlerImpl() {}

   long __stdcall QueryInterface(const struct _GUID &riid,
                                 void ** ppObj)
   {
      if (riid == IID_IUnknown)
      {
         *ppObj = static_cast<IUnknown*>(this);
      }
      else if (riid == __uuidof(ISAXErrorHandler))
      {
         *ppObj = static_cast<ISAXErrorHandler*>(this);
      }
      else
      {
         *ppObj = NULL ;
         return E_NOINTERFACE ;
      }

      AddRef() ;
      return S_OK;
   }
   unsigned long __stdcall AddRef(void)
   {
      return InterlockedIncrement(&m_RefCount);
   }
   unsigned long __stdcall Release(void)
   {
      long nRefCount=0;
      nRefCount=InterlockedDecrement(&m_RefCount) ;
      if (nRefCount == 0) delete this;
      return nRefCount;
   }

   virtual HRESULT STDMETHODCALLTYPE error(
         ISAXLocator __RPC_FAR *pLocator,
         unsigned short * pwchErrorMessage,
         HRESULT errCode) 
      {return S_OK;}

   virtual HRESULT STDMETHODCALLTYPE fatalError(
         ISAXLocator __RPC_FAR *pLocator,
         unsigned short * pwchErrorMessage,
         HRESULT errCode)
      {return S_OK;}

   virtual HRESULT STDMETHODCALLTYPE ignorableWarning(
         ISAXLocator __RPC_FAR *pLocator,
         unsigned short * pwchErrorMessage,
         HRESULT errCode)
      {return S_OK;}
};

You will also have a default content handler that handles all the events but provides no behavior. You must derive from it to implement the actual handling.

class SAXContentHandlerImpl : public ISAXContentHandler
{
   long m_RefCount;
public:
   SAXContentHandlerImpl():m_RefCount(0) {}
   virtual ~SAXContentHandlerImpl() {}

   long __stdcall QueryInterface(const struct _GUID &riid,
                                 void ** ppObj)
      {
         if (riid == IID_IUnknown)
         {
            *ppObj = static_cast<IUnknown*>(this);
         }
         else if (riid == __uuidof(ISAXContentHandler))
         {
            *ppObj = static_cast<ISAXContentHandler*>(this);
         }
         else
         {
            *ppObj = NULL ;
            return E_NOINTERFACE ;
         }

         AddRef() ;
         return S_OK;
      }
   unsigned long __stdcall AddRef(void)
      {
         return InterlockedIncrement(&m_RefCount);
      }
   unsigned long __stdcall Release(void)
      {
         long nRefCount=0;
         nRefCount=InterlockedDecrement(&m_RefCount) ;
         if (nRefCount == 0) delete this;
         return nRefCount;
      }


   virtual HRESULT STDMETHODCALLTYPE putDocumentLocator(
         ISAXLocator __RPC_FAR *pLocator)
      {return S_OK;}

   virtual HRESULT STDMETHODCALLTYPE startDocument( void)
      {return S_OK;}

   virtual HRESULT STDMETHODCALLTYPE endDocument( void)
      {return S_OK;}

   virtual HRESULT STDMETHODCALLTYPE startPrefixMapping(
         wchar_t __RPC_FAR *pwchPrefix,
         int cchPrefix,
         wchar_t __RPC_FAR *pwchUri,
         int cchUri)
      {return S_OK;}

   virtual HRESULT STDMETHODCALLTYPE endPrefixMapping(
         wchar_t __RPC_FAR *pwchPrefix,
         int cchPrefix)
      {return S_OK;}

   virtual HRESULT STDMETHODCALLTYPE startElement(
         wchar_t __RPC_FAR *pwchNamespaceUri,
         int cchNamespaceUri,
         wchar_t __RPC_FAR *pwchLocalName,
         int cchLocalName,
         wchar_t __RPC_FAR *pwchRawName,
         int cchRawName,
         ISAXAttributes __RPC_FAR *pAttributes)
      {return S_OK;}

   virtual HRESULT STDMETHODCALLTYPE endElement(
         wchar_t __RPC_FAR *pwchNamespaceUri,
         int cchNamespaceUri,
         wchar_t __RPC_FAR *pwchLocalName,
         int cchLocalName,
         wchar_t __RPC_FAR *pwchRawName,
         int cchRawName)
      {return S_OK;}

    virtual HRESULT STDMETHODCALLTYPE characters(
         wchar_t __RPC_FAR *pwchChars,
         int cchChars)
      {return S_OK;}

   virtual HRESULT STDMETHODCALLTYPE ignorableWhitespace(
         wchar_t __RPC_FAR *pwchChars,
         int cchChars)
      {return S_OK;}

   virtual HRESULT STDMETHODCALLTYPE processingInstruction(
         wchar_t __RPC_FAR *pwchTarget,
         int cchTarget,
         wchar_t __RPC_FAR *pwchData,
         int cchData)
      {return S_OK;}

   virtual HRESULT STDMETHODCALLTYPE skippedEntity(
         wchar_t __RPC_FAR *pwchName,
         int cchName)
      {return S_OK;}
};
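
Both default handlers carry the same AddRef()/Release() bookkeeping. The sketch below isolates that pattern in portable C++ so it can be studied on its own; it is an illustration only, with std::atomic standing in for the Win32 InterlockedIncrement/InterlockedDecrement calls, and RefCounted and Probe being hypothetical names rather than COM types.

```cpp
#include <atomic>

// Portable stand-in for the AddRef()/Release() pair used by the handlers.
class RefCounted {
    std::atomic<long> refCount_;
public:
    RefCounted() : refCount_(0) {}  // starts at zero, like m_RefCount above
    virtual ~RefCounted() {}

    long AddRef() { return ++refCount_; }

    long Release() {
        long remaining = --refCount_;
        if (remaining == 0) delete this;  // the last owner is gone
        return remaining;
    }
};

// Example subclass that reports its destruction through a caller-owned flag.
class Probe : public RefCounted {
    bool* destroyed_;
public:
    explicit Probe(bool* destroyed) : destroyed_(destroyed) {}
    ~Probe() override { *destroyed_ = true; }
};
```

Two consequences are worth noting. Because the count starts at zero and Release() calls delete this, such an object must be created with new and must be owned by at least one AddRef() caller. It must also never be touched after the Release() call that returns zero.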

Store Content Handler

It is now time to look at how events can be handled. You will derive a new handler called SAXStore from SAXContentHandlerImpl. The handler will take care of three events: startElement(), endElement(), and characters().

class SAXStore :
   public SAXContentHandlerImpl
{
   Store store_;

   std::stack<StackElement>   elements;
   bool                       hasText;

public:
   SAXStore(void);
   virtual ~SAXStore(void);

   Store GetStore() const {return store_;}

   virtual HRESULT STDMETHODCALLTYPE startElement(
      wchar_t __RPC_FAR *pwchNamespaceUri,
      int cchNamespaceUri,
      wchar_t __RPC_FAR *pwchLocalName,
      int cchLocalName,
      wchar_t __RPC_FAR *pwchRawName,
      int cchRawName,
      ISAXAttributes __RPC_FAR *pAttributes);

   virtual HRESULT STDMETHODCALLTYPE endElement(
      wchar_t __RPC_FAR *pwchNamespaceUri,
      int cchNamespaceUri,
      wchar_t __RPC_FAR *pwchLocalName,
      int cchLocalName,
      wchar_t __RPC_FAR *pwchRawName,
      int cchRawName);

   virtual HRESULT STDMETHODCALLTYPE characters(
      wchar_t __RPC_FAR *pwchChars,
      int cchChars);

   std::wstring GetAttributeValue(ISAXAttributes __RPC_FAR
      *pAttributes,
      std::wstring name, std::wstring defvalue);
};

The boolean variable hasText indicates whether the current element should have text or not. The elements with text are title, author, artist, and track.

As new elements are found, they are pushed onto a stack, from which they are popped when an end-element event is fired. The elements of the stack are of type StackElement:

enum ElemType {SIgnore, SStore, SBook, SCD};

struct StackElement
{
   StackElement(void* element, ElemType elemtype = SIgnore):
      type(elemtype), data(element)
   {
   }

   ElemType   type;
   void*      data;
};

A stack element is an association between data (such as a Store, Book, CD, or wstring) and the type of the data. The type is required for the following reason: both books and CDs have titles, so when a title is encountered, the object on top of the stack can be either a Book or a CD, and without additional information you cannot cast it from void* to the actual object type (dynamic_cast<> does not work with void*). A debate on the best approach is beyond the scope of this article.
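
For comparison only, here is one type-safe alternative to the void*-plus-tag pair, assuming a C++17 compiler (which is newer than the compilers this article targets). It is not the approach used in the sample code. MiniBook, MiniCD, Node, and SetTitleOnTop are hypothetical names, and the two structs are simplified stand-ins for the article's Book and CD classes.

```cpp
#include <stack>
#include <string>
#include <variant>

// Simplified stand-ins for the article's Book and CD classes.
struct MiniBook { std::wstring title; };
struct MiniCD   { std::wstring title; };

// The variant remembers which alternative it holds, so no separate type tag
// and no cast from void* are needed.
using Node = std::variant<MiniBook, MiniCD>;

// Sets the title on whichever kind of object is on top of the stack.
inline void SetTitleOnTop(std::stack<Node>& elements,
                          const std::wstring& title) {
    std::visit([&title](auto& item) { item.title = title; }, elements.top());
}
```

The stack owns its elements by value here, so there is also nothing to delete by hand when an element is popped.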

To extract an attribute's value, you will use a function called GetAttributeValue(). It takes a pointer to an ISAXAttributes interface, the name of the attribute, and a default value; it returns the value of the attribute if found, and the default value otherwise.

std::wstring SAXStore::GetAttributeValue(ISAXAttributes __RPC_FAR
   *pAttributes,
   std::wstring name, std::wstring defvalue)
{
   // get the number of attributes
   int length = 0;
   pAttributes->getLength(&length);

   // enumerate over all attributes
   for ( int i=0; i<length; i++ )
   {
      wchar_t *attrname = NULL, * attrvalue = NULL;
      int namelen = 0, valuelen = 0;

      // get the local name of the current attribute
      pAttributes->getLocalName(i,&attrname,&namelen);
      // get the value of the current attribute
      pAttributes->getValue(i,&attrvalue,&valuelen);
      // if current attribute is the one needed return its value
      if(name.compare(std::wstring(attrname,namelen)) == 0)
         return std::wstring(attrvalue, valuelen);
   }

   // attribute not found; return the default value
   return defvalue;
}

Now look at the startElement() method.

HRESULT STDMETHODCALLTYPE SAXStore::startElement(
   wchar_t __RPC_FAR *pwchNamespaceUri,
   int cchNamespaceUri,
   wchar_t __RPC_FAR *pwchLocalName,
   int cchLocalName,
   wchar_t __RPC_FAR *pwchRawName,
   int cchRawName,
   ISAXAttributes __RPC_FAR *pAttributes)
{
   // assume element does not have text
   hasText = false;
   // take the local name; use the supplied length rather than relying on
   // a terminating null
   std::wstring localName(pwchLocalName, cchLocalName);

   // test the element's name and take appropriate action
   if(localName.compare(L"store") == 0)
   {
      elements.push(StackElement(new Store, SStore));
   }
   else if(localName.compare(L"book") == 0)
   {
      elements.push(StackElement(
         new Book(GetAttributeValue(pAttributes, L"isbn",
                                    L"unknown")),
         SBook) );
   }
   else if(localName.compare(L"cd") == 0)
   {
      elements.push(StackElement(new CD(), SCD));
   }
   else if(localName.compare(L"track") == 0)
   {
      elements.push(StackElement(
         new std::wstring(GetAttributeValue(pAttributes, L"length",
                          L"0.00"))));
      elements.push(StackElement(new std::wstring()));
      // tracks have text
      hasText = true;
   }
   else if(localName.compare(L"title")  == 0 ||
           localName.compare(L"author") == 0 ||
           localName.compare(L"artist") == 0)
   {
      // these elements have text
      elements.push(StackElement(new std::wstring()));
      hasText = true;
   }

   return S_OK;
}

In startElement(), you check the name of the element. If it's 'store', you push a Store object onto the stack. If it's 'book', you push a Book after extracting its ISBN attribute, and if it's 'cd', you push a CD onto the stack. These three are parent nodes; they don't have text. If 'track' is encountered, a string holding the value of its length attribute is pushed onto the stack, followed by a new, empty string that will receive the text of the element. For 'title', 'author', and 'artist', only the empty string for the text is pushed.

characters() is called for all printable characters. You are only interested in text when the hasText variable is set. In this case, on the top of the stack is a std::wstring object to which you will append the current text.

HRESULT STDMETHODCALLTYPE SAXStore::characters(
   wchar_t __RPC_FAR *pwchChars,
   int cchChars)
{
   if(hasText)
   {
      // append current text to the existing text
      std::wstring* top = (std::wstring*)elements.top().data;
      std::wstring text(pwchChars, cchChars);
      *top += text;
   }
   else
   {
      // data which is not part of recognized element
   }

   return S_OK;
}
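
The reason for appending rather than assigning is that SAX parsers in general are allowed to deliver one run of text in several characters() calls. The following portable sketch models just the flag and the buffer; TextCollector and its member names are hypothetical and are not part of the sample application.

```cpp
#include <string>

// Minimal model of the hasText flag plus an accumulating text buffer.
struct TextCollector {
    bool hasText = false;
    std::wstring text;

    // Called when an element that carries text begins.
    void StartTextElement() { hasText = true; text.clear(); }

    // Called when any element ends; text arriving afterwards is ignored.
    void EndElement() { hasText = false; }

    // May be invoked several times for a single run of text.
    void Characters(const wchar_t* chars, int length) {
        if (!hasText || chars == nullptr || length <= 0) return;
        text.append(chars, static_cast<std::wstring::size_type>(length));
    }
};
```

If a track title arrives as two chunks, the collector ends up holding the concatenation of both, while the white space between elements, which arrives when the flag is cleared, is dropped.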

The last method that you have to implement is endElement().

HRESULT STDMETHODCALLTYPE SAXStore::endElement(
   wchar_t __RPC_FAR *pwchNamespaceUri,
   int cchNamespaceUri,
   wchar_t __RPC_FAR *pwchLocalName,
   int cchLocalName,
   wchar_t __RPC_FAR *pwchRawName,
   int cchRawName)
{
   // reset flag, at this point data is consumed
   hasText = false;
   // take the data from the stack's top
   StackElement elem = elements.top();
   // pop the element at the top
   elements.pop();
   // take the local name; use the supplied length rather than relying on
   // a terminating null
   std::wstring localName(pwchLocalName, cchLocalName);
   // check the name of the element and take appropriate action
   if(localName.compare(L"store") == 0)
   {
      store_ = *(Store*)elem.data;
      delete (Store*)elem.data;
   }
   else if(localName.compare(L"book") == 0)
   {
      ((Store*)elements.top().data)->AddBook(*(Book*)elem.data);
      delete (Book*)elem.data;
   }
   else if(localName.compare(L"cd") == 0)
   {
      ((Store*)elements.top().data)->AddCD(*(CD*)elem.data);
      delete (CD*)elem.data;
   }
   else if(localName.compare(L"track") == 0)
   {
      // take the length from the stack
      std::wstring* len = (std::wstring*)elements.top().data;
      // pop the element at the top
      elements.pop();
      // add the track to the CD
      ((CD*)elements.top().data)->AddTrack(Track(*(std::wstring*)
       elem.data, *len));

      delete (std::wstring*)elem.data;
      delete (std::wstring*)len;
   }
   else if(localName.compare(L"title") == 0)
   {
      // check whether the current element is a book or a CD
      switch(elements.top().type)
      {
      case SBook:
         ((Book*)elements.top().data)->SetTitle(*(std::wstring*)
          elem.data);
         break;
      case SCD:
         ((CD*)elements.top().data)->SetTitle(*(std::wstring*)
          elem.data);
         break;
      }

      delete (std::wstring*)elem.data;
   }
   else if(localName.compare(L"author") == 0)
   {
      ((Book*)elements.top().data)->SetAuthor(*(std::wstring*)
       elem.data);
      delete (std::wstring*)elem.data;
   }
   else if(localName.compare(L"artist") == 0)
   {
      ((CD*)elements.top().data)->SetArtist(*(std::wstring*)
       elem.data);
      delete (std::wstring*)elem.data;
   }
   else
   {
      // put data back to the stack
      elements.push(elem);
   }

   return S_OK;
}

You start by resetting the hasText flag, because at this point the element's text is consumed. If the last encountered element had text, it is on top of the stack. You pop the element at the top and check the local name of the current element.

If the local name is 'store', the element at the top was your Store object. If the current element is 'book', the object that was taken from the top was a Book, and now the top is the Store, so you add this Book to the Store. CDs are treated in a similar manner.

In the case of tracks, the element just popped was the title, and the track's length remains on top; it is taken off the stack as well. With these two pieces of information, you add a new Track to the current CD (which is now on top).

In the case of a title element, you have to check whether the object that remained on top of the stack is a Book or a CD and cast to the appropriate type before setting the title. 'author' and 'artist' are treated similarly, except that they don't require additional information about the type of the object on top.

And with this, you have completed the content handler. The only thing left is to write the code to read the document and display the results.

Putting It All Together

To read the document, create the internal data, and display it, you must follow several steps:

  • Initialize the COM library for the current thread
  • Create an instance of the XML reader
  • Create an instance of the content handler (SAXStore) and register it to the XML reader
  • Create an instance of the error handler (SAXErrorHandlerImpl) and register it to the XML reader
  • Parse the document
  • Check the result and display the items
  • Release the handlers
  • Uninitialize the COM library for the current thread

int wmain(int argc, wchar_t* argv[])
{
   if (argc<2)
   {
      wprintf(L"no argument provided\n");
      return 0;
   }

   // initialize COM library for the current thread
   CoInitialize(NULL);
   ISAXXMLReader* pXMLReader = NULL;

   // create an instance of the XML reader
   HRESULT hr = CoCreateInstance(
      __uuidof(SAXXMLReader),
      NULL,
      CLSCTX_ALL,
      __uuidof(ISAXXMLReader),
      (void **)&pXMLReader);

   if(!FAILED(hr))
   {
      // create a new content handler
      SAXStore *pStoreHandler = new SAXStore();
      // register the content handler with the reader
      hr = pXMLReader->putContentHandler(pStoreHandler);
      // create a new error handler
      SAXErrorHandlerImpl * pErrorHandler = new SAXErrorHandlerImpl();
      // register the error handler with the reader
      hr = pXMLReader->putErrorHandler(pErrorHandler);

      std::wcout << L"Parsing document: " << argv[1] << std::endl
                 << std::endl;
      // parse the document
      hr = pXMLReader->parseURL(argv[1]);
      if(!FAILED(hr))
      {
         // display the items in the store
         std::wcout << pStoreHandler->GetStore().ToString()
                    << std::endl;
      }
      else
      {
         // print the HRESULT in hexadecimal
         std::wcout << L"\nParse result code: " <<
            std::hex <<
            hr << std::endl << std::endl;
      }

      // destroy the XML reader
      pXMLReader->Release();
   }
   else 
   {
      // the reader could not be created; print the HRESULT in hexadecimal
      std::wcout << L"\nCoCreateInstance result code: " <<
         std::hex <<
         hr << std::endl << std::endl;
   }

   // uninitialize the COM library for this thread
   CoUninitialize();

   return 0;
}
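
The failure branches print the result code as a hexadecimal number, which is how HRESULT values are usually looked up (E_FAIL, for instance, is 0x80004005). A small formatting helper makes the output easier to compare with documented codes. FormatResultCode is a hypothetical name, and the sketch is plain standard C++ with no dependency on the Windows headers.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Formats a 32-bit result code as 0x followed by eight uppercase hex digits.
// Taking the value as unsigned avoids a minus sign for failure codes, which
// have the high bit set.
inline std::wstring FormatResultCode(unsigned long code) {
    std::wostringstream out;
    out << L"0x" << std::hex << std::uppercase << std::setw(8)
        << std::setfill(L'0') << (code & 0xFFFFFFFFul);
    return out.str();
}
```

In wmain(), the value could be passed as static_cast<unsigned long>(hr) before being written to std::wcout.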

One more thing: to successfully compile the code, you must import information from the MSXML type library. Put these two lines of code in your application (if you are using precompiled headers, put them in "stdafx.h"):

#import <msxml3.dll> raw_interfaces_only
using namespace MSXML2;

Conclusions

In this article, you have seen how to create a simple application that reads an XML document, using SAX to handle the document's content and directly create the internal data structures. As mentioned earlier, only one handler for each category of events (content, errors, DTD) can be registered at a time. In the next article of this series, you will see what you can do to work around this limitation.

About the Sample Code

The sample application provided with this article is written in VC++ 7.1. The same code can be successfully compiled with the previous version. MSXML 3.0 or higher should be installed on your machine.



About the Author

Marius Bancila

Marius Bancila is a Microsoft MVP for VC++. He works as a software developer for a Norwegian-based company. He is mainly focused on building desktop applications with MFC and VC#. He keeps a blog at www.mariusbancila.ro/blog, focused on Windows programming. He is the co-founder of codexpert.ro, a community for Romanian C++/VC++ programmers.
