A Kick-Start to SAX with C++, Part 2

This is the second article of the tutorial. Because it builds directly on the first one, I recommend reading the first part before going any further.

Stating the Problem

In the first article, you saw what SAX is, what Microsoft's COM implementation of SAX looks like, and how to write a simple parser for an XML document. I pointed out several times that you can register only one handler of each type (content handler, error handler, or DTD handler) at a time. Although this may suffice for some applications, it may be a drawback for others.

Take a look at the XML document you used in the first article.

<?xml version="1.0" encoding="utf-8"?>

<store>
   <book isbn="10000001">
      <title>The Lord Of The Rings</title>
      <author>J.R.R. Tolkien</author>
   </book>

   <book>
      <title>Maitreyi</title>
      <author>Mircea Eliade</author>
   </book>

   <cd>
      <title>The Wall</title>
      <artist>Pink Floyd</artist>
      <track length="3:40">Another Brick in the Wall</track>
      <track length="5:33">Mother</track>
   </cd>

   <cd>
      <title>Come on Over</title>
      <artist>Shania Twain</artist>
      <track length="4:40">From This Moment On</track>
      <track length="3:33">You're Still The One</track>
   </cd>
</store>

Assume that you have a store where books and CDs are sold and that this XML document is a catalog of all the items in the store. In the first article, you wrote an application that used SAX to parse this document, build lists of books and CDs, and display them on the console. However, both books and CDs were processed by the same content handler.

Now you want more: a more complex application with separate components that handle books and CDs. Moreover, you want a third component that handles only tracks, which means that tracks are handled by two components at the same time. Because the XML reader accepts only one content handler, SAX does not support this scenario out of the box, so you will have to build it yourself.

Approaching the Problem

The solution you will see is based on the sample code provided in the first article. The classes Book, CD, Track, Store, SAXErrorHandlerImpl, and SAXContentHandlerImpl are the same as in the first article, with one exception: SAXContentHandlerImpl, which was the base class for your former content handler and continues to be one, has a new method, GetAttributeValue(). As explained in the first article, this method extracts the value of an element's attribute.

You will start by writing specialized content handlers for each category of information you are interested in. Thus, you will have three new content handlers: SAXBooks for books, SAXCDs for CDs, and SAXTracks for tracks.

class SAXBooks :
   public SAXContentHandlerImpl
{
   std::vector<Book>   books;
   std::stack<void*>   elements;
   bool                hasText;

   long m_RefCount;
public:
   SAXBooks(void);
   virtual ~SAXBooks(void);

   unsigned long __stdcall AddRef(void)
   {
      return InterlockedIncrement(&m_RefCount);
   }
   unsigned long __stdcall Release(void)
   {
      long nRefCount=0;
      nRefCount=InterlockedDecrement(&m_RefCount) ;
      if (nRefCount == 0) delete this;
      return nRefCount;
   }

   std::vector<Book> GetBooks() const {return books;}

   virtual HRESULT STDMETHODCALLTYPE startElement(
      wchar_t __RPC_FAR *pwchNamespaceUri,
      int cchNamespaceUri,
      wchar_t __RPC_FAR *pwchLocalName,
      int cchLocalName,
      wchar_t __RPC_FAR *pwchRawName,
      int cchRawName,
      ISAXAttributes __RPC_FAR *pAttributes);

   virtual HRESULT STDMETHODCALLTYPE endElement(
      wchar_t __RPC_FAR *pwchNamespaceUri,
      int cchNamespaceUri,
      wchar_t __RPC_FAR *pwchLocalName,
      int cchLocalName,
      wchar_t __RPC_FAR *pwchRawName,
      int cchRawName);

   virtual HRESULT STDMETHODCALLTYPE characters(
      wchar_t __RPC_FAR *pwchChars,
      int cchChars);
};

class SAXTracks :
   public SAXContentHandlerImpl
{
   std::vector<Track>   tracks;
   std::stack<void*>    elements;
   bool                 hasText;

   long m_RefCount;
public:
   SAXTracks(void);
   virtual ~SAXTracks(void);

   unsigned long __stdcall AddRef(void)
   {
      return InterlockedIncrement(&m_RefCount);
   }
   unsigned long __stdcall Release(void)
   {
      long nRefCount=0;
      nRefCount=InterlockedDecrement(&m_RefCount) ;
      if (nRefCount == 0) delete this;
      return nRefCount;
   }

   std::vector<Track> GetTracks() const {return tracks;}

   virtual HRESULT STDMETHODCALLTYPE startElement(
      wchar_t __RPC_FAR *pwchNamespaceUri,
      int cchNamespaceUri,
      wchar_t __RPC_FAR *pwchLocalName,
      int cchLocalName,
      wchar_t __RPC_FAR *pwchRawName,
      int cchRawName,
      ISAXAttributes __RPC_FAR *pAttributes);

   virtual HRESULT STDMETHODCALLTYPE endElement(
      wchar_t __RPC_FAR *pwchNamespaceUri,
      int cchNamespaceUri,
      wchar_t __RPC_FAR *pwchLocalName,
      int cchLocalName,
      wchar_t __RPC_FAR *pwchRawName,
      int cchRawName);

   virtual HRESULT STDMETHODCALLTYPE characters( 
      wchar_t __RPC_FAR *pwchChars,
      int cchChars);
};

class SAXCDs :
   public SAXContentHandlerImpl
{
   std::vector<CD>     cds;
   std::stack<void*>   elements;
   bool                hasText;

   long m_RefCount;
public:
   SAXCDs(void);
   virtual ~SAXCDs(void);

   unsigned long __stdcall AddRef(void)
   {
      return InterlockedIncrement(&m_RefCount);
   }
   unsigned long __stdcall Release(void)
   {
      long nRefCount=0;
      nRefCount=InterlockedDecrement(&m_RefCount) ;
      if (nRefCount == 0) delete this;
      return nRefCount;
   }

   std::vector<CD> GetCDs() const {return cds;}

   virtual HRESULT STDMETHODCALLTYPE startElement(
      wchar_t __RPC_FAR *pwchNamespaceUri,
      int cchNamespaceUri,
      wchar_t __RPC_FAR *pwchLocalName,
      int cchLocalName,
      wchar_t __RPC_FAR *pwchRawName,
      int cchRawName,
      ISAXAttributes __RPC_FAR *pAttributes);

   virtual HRESULT STDMETHODCALLTYPE endElement(
      wchar_t __RPC_FAR *pwchNamespaceUri,
      int cchNamespaceUri,
      wchar_t __RPC_FAR *pwchLocalName,
      int cchLocalName,
      wchar_t __RPC_FAR *pwchRawName,
      int cchRawName);

   virtual HRESULT STDMETHODCALLTYPE characters(
      wchar_t __RPC_FAR *pwchChars,
      int cchChars);
};

They follow exactly the same logic as the SAXStore content handler from the first article: as elements are encountered, they are pushed onto a stack, and they are popped off when the corresponding end-element event fires. If you still haven't read the first article, now is the time to do so.
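The bodies of these handlers are not listed here. As a rough illustration only, the following is a minimal, COM-free sketch of that stack-based logic for books. It assumes simplified events that carry only the element name and the character data, and a hypothetical Book structure holding just a title and an author; BookCollector is an illustrative name, not a class from the article. Unlike SAXBooks, it stacks element names rather than void* pointers and accumulates text in a buffer, because a SAX reader may deliver the text of one element through several characters() calls.

```cpp
#include <cassert>
#include <stack>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for the Book class from the first article.
struct Book {
    std::wstring title;
    std::wstring author;
};

// Sketch of the stack-based logic shared by SAXBooks, SAXCDs and SAXTracks,
// with simplified event signatures (no namespaces, no attributes).
class BookCollector {
    std::vector<Book> books;
    std::stack<std::wstring> elements;   // names of the currently open elements
    std::wstring text;                   // character data of the innermost element
public:
    void startElement(const std::wstring& name) {
        elements.push(name);
        text.clear();
        if (name == L"book")
            books.push_back(Book());     // a new book starts
    }
    void characters(const std::wstring& chars) {
        text += chars;                   // text may arrive in several chunks
    }
    void endElement(const std::wstring& name) {
        assert(!elements.empty() && elements.top() == name);
        elements.pop();
        // Only <title> and <author> directly inside <book> belong to the book.
        if (!elements.empty() && elements.top() == L"book") {
            if (name == L"title")
                books.back().title = text;
            else if (name == L"author")
                books.back().author = text;
        }
        text.clear();
    }
    const std::vector<Book>& GetBooks() const { return books; }
};
```

Feeding it the events of the 'Maitreyi' book yields one Book with the title "Maitreyi" and the author "Mircea Eliade"; a <title> nested in a <cd> is ignored because its parent is not 'book'.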

A Multi-Handler Container

Because you can register only one content handler at a time, you will make that handler a container for a list of other content handlers, each registered to handle a specific XML element and, implicitly, all of its children. The container handler receives the notifications from the XML reader and checks the list of registered handlers to decide where to dispatch each notification. This way, different handlers can process different elements, and multiple handlers can process the same element.

The SAXBooks handler registers for 'book' and is therefore notified of all events related to 'book', 'title', and 'author'. The SAXCDs handler registers for 'cd' and is notified of all events related to 'cd', 'title', 'artist', and 'track'. Finally, the SAXTracks handler registers for 'track' and is thus notified (just like the SAXCDs handler) of all events related to 'track'. None of the handlers is interested in the root element, 'store'.

class SAXMultiContentHandler : public SAXContentHandlerImpl
{
   // list of all registered handlers
   std::vector<ElementHandler>          handlers;
   // list of handlers that want to handle current element
   std::vector<SAXContentHandlerImpl*>  crthandlers;
   long m_RefCount;
public:
   SAXMultiContentHandler(void);
   virtual ~SAXMultiContentHandler(void);

   void AddHandler(std::wstring element,
                   SAXContentHandlerImpl* handler);
   void RemovedHandler(std::wstring element);

   unsigned long __stdcall AddRef(void)
   {
      return InterlockedIncrement(&m_RefCount);
   }
   unsigned long __stdcall Release(void)
   {
      long nRefCount=0;
      nRefCount=InterlockedDecrement(&m_RefCount) ;
      if (nRefCount == 0) delete this;
      return nRefCount;
   }

   virtual HRESULT STDMETHODCALLTYPE startElement(
      wchar_t __RPC_FAR *pwchNamespaceUri,
      int cchNamespaceUri,
      wchar_t __RPC_FAR *pwchLocalName,
      int cchLocalName,
      wchar_t __RPC_FAR *pwchRawName,
      int cchRawName,
      ISAXAttributes __RPC_FAR *pAttributes);

   virtual HRESULT STDMETHODCALLTYPE endElement(
      wchar_t __RPC_FAR *pwchNamespaceUri,
      int cchNamespaceUri,
      wchar_t __RPC_FAR *pwchLocalName,
      int cchLocalName,
      wchar_t __RPC_FAR *pwchRawName,
      int cchRawName);

   virtual HRESULT STDMETHODCALLTYPE characters(
      wchar_t __RPC_FAR *pwchChars,
      int cchChars);
};

SAXMultiContentHandler keeps two lists: one of all registered handlers (called handlers) and one of the handlers that must be notified of events related to the current element and its children (called crthandlers).

The elements of the first list are of type ElementHandler, a structure that associates an XML element (such as 'book', 'cd', or 'track') with the handler registered to handle it.

struct ElementHandler
{
   std::wstring             element;    // element to handle
   SAXContentHandlerImpl*   handler;    // pointer to the handler
                                        // interface

   ElementHandler(std::wstring elem, SAXContentHandlerImpl* hdl):
      element(elem), handler(hdl)
   {
   }
};

SAXMultiContentHandler has a method for registering new handlers and a method for removing handlers:

void SAXMultiContentHandler::AddHandler(std::wstring element,
   SAXContentHandlerImpl* handler)
{
   if(handler)
   {
      handlers.push_back(ElementHandler(element, handler));
   }
}

void SAXMultiContentHandler::RemovedHandler(std::wstring element)
{
   for(std::vector<ElementHandler>::iterator it = handlers.begin();
      it != handlers.end(); ++it)
   {
      if(it->element == element)
      {
         handlers.erase(it);
         return;
      }
   }
}
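Note that RemovedHandler() stops at the first match, so if several handlers registered for the same element, only the first one is removed. If you prefer to drop every registration for an element, the erase-remove idiom is a compact alternative. The sketch below is a hypothetical, COM-free variation: Handler stands in for SAXContentHandlerImpl, and HandlerRegistry and RemoveHandlers() are illustrative names, not part of the article's code.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for SAXContentHandlerImpl; only pointers are stored.
struct Handler {
    virtual ~Handler() = default;
};

struct ElementHandler {
    std::wstring element;   // element to handle
    Handler* handler;       // non-owning pointer to the handler
};

class HandlerRegistry {
    std::vector<ElementHandler> handlers;
public:
    void AddHandler(const std::wstring& element, Handler* handler) {
        if (handler)   // ignore null handlers, as AddHandler() above does
            handlers.push_back(ElementHandler{element, handler});
    }
    // Variation on RemovedHandler(): removes *every* registration for
    // 'element' instead of stopping at the first match.
    void RemoveHandlers(const std::wstring& element) {
        handlers.erase(
            std::remove_if(handlers.begin(), handlers.end(),
                           [&element](const ElementHandler& eh) {
                               return eh.element == element;
                           }),
            handlers.end());
    }
    std::size_t Count() const { return handlers.size(); }
};
```

Because the erase happens in a single call after remove_if has compacted the vector, there is no risk of using an invalidated iterator; the original avoids that risk differently, by returning right after its one erase().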

For this simple application, you will continue to handle only three events: startElement(), endElement(), and characters().

SAXMultiContentHandler works on the following logic: the list crthandlers holds the handlers that must be notified of events (whatever they are). When a startElement() event is fired, the multi handler first checks whether there are already handlers in this list; if so, it notifies them of the event. When a new top-level item, such as 'book' or 'cd' in your case, is encountered, the list is empty. The next step is therefore to check the list of all registered handlers to see which of them registered for the current element. Any such handlers that are not already in crthandlers are added to it and notified of the event.

HRESULT STDMETHODCALLTYPE SAXMultiContentHandler::startElement(
   wchar_t __RPC_FAR *pwchNamespaceUri,
   int cchNamespaceUri,
   wchar_t __RPC_FAR *pwchLocalName,
   int cchLocalName,
   wchar_t __RPC_FAR *pwchRawName,
   int cchRawName,
   ISAXAttributes __RPC_FAR *pAttributes)
{
   // get the local name of the current element
   std::wstring element(pwchLocalName, cchLocalName);
   // dispatch the notification to all handlers that need to handle
   // this element
   for(std::vector<SAXContentHandlerImpl*>::iterator
       it = crthandlers.begin();
       it != crthandlers.end(); ++it)
   {
      (*it)->startElement(pwchNamespaceUri,
         cchNamespaceUri,
         pwchLocalName,
         cchLocalName,
         pwchRawName,
         cchRawName,
         pAttributes);
   }
   // check whether this element must be handled
   for(std::vector<ElementHandler>::iterator
       it = handlers.begin();
       it != handlers.end(); ++it)
   {
      // verify if there is a handler registered for this element
      if(it->element == element)
      {
         // check whether the handler is already in the list of
         // current handlers
         if(crthandlers.end() ==
            std::find(crthandlers.begin(), crthandlers.end(),
                      it->handler))
         {
            // add the handler to the list and notify it
            crthandlers.push_back(it->handler);
            it->handler->
               startElement(pwchNamespaceUri,
               cchNamespaceUri,
               pwchLocalName,
               cchLocalName,
               pwchRawName,
               cchRawName,
               pAttributes);
         }
      }
   }

   return S_OK;
}

When characters() is fired, you have a list of handlers interested in the current element, and the notification is dispatched to all of them:

HRESULT STDMETHODCALLTYPE SAXMultiContentHandler::characters(
   wchar_t __RPC_FAR *pwchChars,
   int cchChars)
{
   // dispatch the notification to all handlers that need to handle
   // the current element
   for(std::vector<SAXContentHandlerImpl*>::iterator
       it = crthandlers.begin();
       it != crthandlers.end(); ++it)
   {
      (*it)->characters(pwchChars, cchChars);
   }
   return S_OK;
}

The same applies to endElement(): first, all the handlers in crthandlers are notified of the event. Then, because the element has ended, you must check which handlers in crthandlers registered for this element and remove them from the list.

HRESULT STDMETHODCALLTYPE SAXMultiContentHandler::endElement(
   wchar_t __RPC_FAR *pwchNamespaceUri,
   int cchNamespaceUri,
   wchar_t __RPC_FAR *pwchLocalName,
   int cchLocalName,
   wchar_t __RPC_FAR *pwchRawName,
   int cchRawName)
{
   // get the local name of the current element
   std::wstring element(pwchLocalName, cchLocalName);
   // dispatch the notification to all handlers that need to handle
   // this element
   for(std::vector<SAXContentHandlerImpl*>::iterator
       it = crthandlers.begin();
       it != crthandlers.end(); ++it)
   {
      SAXContentHandlerImpl* handler = *it;
      if(handler)
      {
         handler->endElement(pwchNamespaceUri,
            cchNamespaceUri,
            pwchLocalName,
            cchLocalName,
            pwchRawName,
            cchRawName);
      }
   }
   // remove from list of current handlers all the handlers that
   // handled this element
   for(std::vector<ElementHandler>::iterator
       it = handlers.begin(); 
       it != handlers.end(); ++it)
   {
      if(it->element == element)
      {
         std::vector<SAXContentHandlerImpl*>::iterator pos =
            std::find(crthandlers.begin(), crthandlers.end(),
                      it->handler);
         if(pos != crthandlers.end())
         {
            crthandlers.erase(pos);
         }
      }
   }

   return S_OK;
}
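To check these dispatching rules without COM, here is a simplified, portable sketch of the same algorithm. It assumes a hypothetical Handler interface with reduced event signatures (element name and text only) in place of SAXContentHandlerImpl; MultiHandler, ActiveCount(), and the Recorder helper are illustrative names. In each event, step 1 notifies the handlers that are already active, and step 2 activates or deactivates the handlers registered for the current element.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, COM-free stand-in for SAXContentHandlerImpl.
struct Handler {
    virtual ~Handler() = default;
    virtual void startElement(const std::wstring& name) = 0;
    virtual void endElement(const std::wstring& name) = 0;
    virtual void characters(const std::wstring& text) = 0;
};

// Sketch of the SAXMultiContentHandler dispatching rules.
class MultiHandler : public Handler {
    struct ElementHandler { std::wstring element; Handler* handler; };
    std::vector<ElementHandler> handlers;   // all registrations
    std::vector<Handler*> crthandlers;      // handlers active for the current subtree
public:
    void AddHandler(const std::wstring& element, Handler* handler) {
        if (handler) handlers.push_back(ElementHandler{element, handler});
    }
    void startElement(const std::wstring& name) override {
        // 1. Notify the handlers that are already active.
        for (Handler* h : crthandlers) h->startElement(name);
        // 2. Activate (and notify once) handlers registered for this element.
        for (const ElementHandler& eh : handlers) {
            if (eh.element == name &&
                std::find(crthandlers.begin(), crthandlers.end(), eh.handler) ==
                    crthandlers.end()) {
                crthandlers.push_back(eh.handler);
                eh.handler->startElement(name);
            }
        }
    }
    void characters(const std::wstring& text) override {
        for (Handler* h : crthandlers) h->characters(text);
    }
    void endElement(const std::wstring& name) override {
        // 1. Notify every active handler, including the one whose element ends.
        for (Handler* h : crthandlers) h->endElement(name);
        // 2. Deactivate the handlers registered for the element that just ended.
        for (const ElementHandler& eh : handlers) {
            if (eh.element != name) continue;
            auto pos = std::find(crthandlers.begin(), crthandlers.end(), eh.handler);
            if (pos != crthandlers.end()) crthandlers.erase(pos);
        }
    }
    std::size_t ActiveCount() const { return crthandlers.size(); }
};

// Helper handler that records the start and end tags it receives.
struct Recorder : Handler {
    std::vector<std::wstring> log;
    void startElement(const std::wstring& n) override { log.push_back(L"<" + n + L">"); }
    void endElement(const std::wstring& n) override { log.push_back(L"</" + n + L">"); }
    void characters(const std::wstring&) override {}
};
```

Registering one Recorder for 'cd' and another for 'track' and replaying the events of a <cd> element shows the active list growing to two entries inside <track> and shrinking back to zero after </cd>. One limitation this sketch shares with the original: if an element is nested inside another element of the same name, the inner end tag deactivates the handler while the outer element is still open.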

To make this clearer, let me illustrate with this node:

   <cd>
      <title>Come on Over</title>
      <artist>Shania Twain</artist>
      <track length="4:40">From This Moment On</track>
      <track length="3:33">You're Still The One</track>
   </cd>

Tag        Event          Content of crthandlers after the event is processed
<cd>       startElement   SAXCDs
<title>    startElement   SAXCDs
</title>   endElement     SAXCDs
<artist>   startElement   SAXCDs
</artist>  endElement     SAXCDs
<track>    startElement   SAXCDs, SAXTracks
</track>   endElement     SAXCDs
<track>    startElement   SAXCDs, SAXTracks
</track>   endElement     SAXCDs
</cd>      endElement     (empty)

Putting It All Together

As with the first application, to read the XML document, build the internal data structures, and display them, you must follow these steps:

  • Initialize the COM library for the current thread
  • Create an instance of the XML reader
  • Create an instance of the multi content handler (SAXMultiContentHandler) and register it with the XML reader
  • Create an instance of the books handler (SAXBooks) and register it with the multi content handler to handle 'book'
  • Create an instance of the CDs handler (SAXCDs) and register it with the multi content handler to handle 'cd'
  • Create an instance of the tracks handler (SAXTracks) and register it with the multi content handler to handle 'track'
  • Create an instance of the error handler (SAXErrorHandlerImpl) and register it with the XML reader
  • Parse the document
  • Check the result and display the items
  • Release the handlers
  • Uninitialize the COM library for the current thread

int wmain(int argc, wchar_t* argv[])
{
   if (argc<2)
   {
      wprintf(L"no argument provided\n");
      return 0;
   }

   CoInitialize(NULL);
   ISAXXMLReader* pXMLReader = NULL;

   HRESULT hr = CoCreateInstance(
      __uuidof(SAXXMLReader),
      NULL,
      CLSCTX_ALL,
      __uuidof(ISAXXMLReader),
      (void **)&pXMLReader);

   if(!FAILED(hr))
   {
      SAXMultiContentHandler *pMultiHandler =
         new SAXMultiContentHandler();
      hr = pXMLReader->putContentHandler(pMultiHandler);

      SAXBooks *pBooksHandler = new SAXBooks();
      pMultiHandler->AddHandler(L"book", pBooksHandler);

      SAXCDs *pCDsHandler = new SAXCDs();
      pMultiHandler->AddHandler(L"cd", pCDsHandler);

      SAXTracks *pTracksHandler = new SAXTracks();
      pMultiHandler->AddHandler(L"track", pTracksHandler);

      SAXErrorHandlerImpl * pErrorHandler = new SAXErrorHandlerImpl();
      hr = pXMLReader->putErrorHandler(pErrorHandler);

      std::wcout << L"Parsing document: " << argv[1]
                 << std::endl << std::endl;

      hr = pXMLReader->parseURL(argv[1]);
      if(!FAILED(hr))
      {
         std::vector<Book> books   = pBooksHandler->GetBooks();
         std::vector<CD> cds       = pCDsHandler->GetCDs();
         std::vector<Track> tracks = pTracksHandler->GetTracks();

         for(std::vector<Book>::const_iterator itb = books.begin();
            itb != books.end(); ++itb)
         {
            std::wcout << (*itb).ToString() << L'\n';
         }

         std::wcout << L'\n';

         for(std::vector<CD>::const_iterator itc = cds.begin();
            itc != cds.end(); ++itc)
         {
            std::wcout << (*itc).ToString() << L'\n';
         }

         std::wcout << L'\n';

         for(std::vector<Track>::const_iterator itt = tracks.begin();
            itt != tracks.end(); ++itt)
         {
            std::wcout << (*itt).ToString() << L'\n';
         }

      }
      else
      {
         // print the failing HRESULT in hexadecimal
         std::wcout << L"\nParse result code: "
                    << std::hex << hr << std::endl << std::endl;
      }

      pXMLReader->Release();

      delete pTracksHandler;
      delete pCDsHandler;
      delete pBooksHandler;
   }
   else
   {
      // print the failing HRESULT in hexadecimal
      std::wcout << L"\nParse result code: "
                 << std::hex << hr << std::endl << std::endl;
   }

   CoUninitialize();

   return 0;
}

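Don't forget to put these two lines of code in your application (if you are using precompiled headers, put them in "stdafx.h"), along with includes for the standard headers the code relies on, such as <vector>, <stack>, <string>, <algorithm>, and <iostream>: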

#import <msxml3.dll> raw_interfaces_only
using namespace MSXML2;

Conclusions

In this article, you have seen how to work around the limitation of registering only a single content handler at a time by creating a multi handler that acts as a dispatcher for other handlers. SAXMultiContentHandler is not meant as a ready-made, general-purpose solution to the problem, although you are free to use it; its purpose is to guide you through building a dispatcher that suits your needs.

In the first two articles, you ignored any errors that might occur. In the third and final part of this tutorial, you will see how to implement some simple error handling.

About the sample code

The sample application provided with this article is written in VC++ 7.1. The same code also compiles with the previous version. MSXML 3.0 or later must be installed on your machine.



About the Author

Marius Bancila

Marius Bancila is a Microsoft MVP for VC++. He works as a software developer for a Norwegian-based company. He is mainly focused on building desktop applications with MFC and VC#. He keeps a blog at www.mariusbancila.ro/blog, focused on Windows programming. He is the co-founder of codexpert.ro, a community for Romanian C++/VC++ programmers.
