A Kick-Start to SAX with C++, Part 3

This is the third and last article in this series. In the first article, you saw what SAX is, what the Microsoft COM implementation of SAX is, and how to write a simple parser for an XML document. In the second article, you saw how to work around the limitation of having only a single handler of each type registered with the XML reader at a time. In both articles, you ignored anything related to errors. Now, you will see how to implement some simple error handling. Before going any further, I recommend that you read the first and second parts, if you haven't already.

Microsoft COM Support for Error Handling

As seen in the previous articles, you can register a content handler, a DTD handler (which will not be covered), and an error handler with the XML reader. In the COM implementation of SAX, ISAXErrorHandler is the interface that provides the error handling functionality.

ISAXErrorHandler has three methods. All of them take the same three parameters:

Parameter                          Description
ISAXLocator * pLocator             Pointer to a locator object that contains information about the file, line, and column where the error occurred
const wchar_t * pwchErrorMessage   Textual information about the error
HRESULT hrErrorCode                Error code identifying the reason of the error

The three methods of ISAXErrorHandler are:

Method               Description
error()              Receives notification of a recoverable error
fatalError()         Receives notification of non-recoverable errors
ignorableWarning()   Receives notifications of warnings

However, in the current implementation all warnings are treated as errors and all errors are non-recoverable. Thus, only fatalError() is called in the current implementation.

ISAXLocator associates a SAX event with a document location. The information stored in the locator is updated for each event. The interface has four methods (table taken from MSDN):

Method            Description
getColumnNumber   Returns the column number where the current document event ends
getLineNumber     Returns the line number where the current document event ends
getPublicId       Returns the public identifier for the current document event
getSystemId       Returns the system identifier for the current document event

Stating the Problem

You will continue with the application you wrote during the second part of the tutorial. You have a store where books and CDs are sold. You keep information about the items in an XML document that is parsed with SAX. The XML document you used is:

<?xml version="1.0" encoding="utf-8"?>

<store>
   <book isbn="10000001">
      <title>The Lord Of The Rings</title>
      <author>J.R.R. Tolkien</author>
   </book>

   <book>
      <title>Maitreyi</title>
      <author>Mircea Eliade</author>
   </book>

   <cd>
      <title>The Wall</title>
      <artist>Pink Floyd</artist>
      <track length="3:40">Another Brick in the Wall</track>
      <track length="5:33">Mother</track>
   </cd>

   <cd>
      <title>Come on Over</title>
      <artist>Shania Twain</artist>
      <track length="4:40">From This Moment On</track>
      <track length="3:33">You're Still The One</track>
   </cd>
</store>

But now, you will introduce an error in this file. For instance, change the closing tag of the last track:

<cd>
   <title>Come on Over</title>
   <artist>Shania Twain</artist>
   <track length="4:40">From This Moment On</track>
   <track length="3:33">You're Still The One</track_error>
</cd>

With this change, the end tag no longer matches its start tag, so the document is no longer well-formed and the parser reports a fatal error when it reaches that line. However, you want to keep and display all the information you successfully read before the error occurred.
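To see why a single mismatched end tag stops the whole parse, it can help to picture the kind of check a parser performs on nested tags. The following is only a minimal sketch under simplifying assumptions, not MSXML's actual implementation; TagEvent and FirstMismatch are hypothetical names introduced for illustration, and the code assumes a C++11 compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <stack>
#include <string>
#include <vector>

// Hypothetical event record: an element name plus whether it opens or closes.
struct TagEvent
{
   std::wstring name;
   bool         isStart;
};

// Returns the index of the first end tag that does not match the most
// recent unclosed start tag, or events.size() if every end tag matches.
std::size_t FirstMismatch(const std::vector<TagEvent>& events)
{
   std::stack<std::wstring> open;   // names of the currently open elements
   for(std::size_t i = 0; i < events.size(); ++i)
   {
      if(events[i].isStart)
      {
         open.push(events[i].name);
      }
      else
      {
         if(open.empty() || open.top() != events[i].name)
            return i;               // this end tag closes the wrong element
         open.pop();
      }
   }
   return events.size();
}
```

Feeding it the sequence cd, track, /track_error, /cd reports the third event (index 2) as the first mismatch. Everything before that point was consistent, which is why the data read so far is still worth keeping.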

Approaching the Problem

In the first two articles, the content handlers contained a stack where objects were pushed as the startElement() event was fired and popped out when the endElement() event occurred. But, if an error occurred, the parsing was aborted and you didn't take any care of cleaning up the stack, thus ending up with memory leaks. Obviously, that's not acceptable.

In this implementation, you continue to work with the Book, CD, Track, Store, SAXContentHandlerImpl, and SAXErrorHandlerImpl classes from the previous articles. However, you will add a pure virtual method to SAXContentHandlerImpl to notify the handler that a fatal error has occurred.

virtual void FatalError() = 0;

Basically, you will use the same SAXBooks, SAXCDs, and SAXTracks content handlers. They will follow the same logic as in the previous article, but their stack implementation will be slightly different and they also will implement the FatalError() method.

To ease the stack operation (especially the cleaning up), you will work with a custom stack:

// types of all elements
enum ElemType {SStore, SBook, SCD, STrack, SWString};

// element of the stack
struct StackElement
{
   StackElement(void* element, ElemType elemtype):
      type(elemtype), data(element)
   {
   }

   ElemType   type;
   void*      data;
};

// stack defines
class SAXStack
{
   std::stack<StackElement> elements;
public:
   void push(const StackElement& value)
   {
      elements.push(value);
   }

   StackElement& top()
   {
      return elements.top();
   }

   void pop()
   {
      elements.pop();
   }

   void empty();
};

SAXStack is based on std::stack, with elements of type StackElement. It's the same structure you used in the first article. SAXStack provides operations for pushing a new element into the stack, popping the element at top, accessing the top element, and emptying the stack. The method that takes all the elements out of the stack also takes care of deleting allocated memory.

void SAXStack::empty()
{
   while(!elements.empty())
   {
      StackElement elem = elements.top();
      elements.pop();

      switch(elem.type)
      {
      case SStore:     delete (Store*)elem.data; break;
      case SBook:      delete (Book*)elem.data; break;
      case SCD:        delete (CD*)elem.data; break;
      case STrack:     delete (Track*)elem.data; break;
      case SWString:   delete (std::wstring*)elem.data; break;
      }
   }
}
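To convince yourself that this kind of cleanup releases everything, you could instrument a stand-in type and count live instances. The sketch below is portable and independent of the sample application; Counted, DemoElement, DrainStack, and LiveAfterDrain are hypothetical names, and static_cast is used in place of the C-style casts shown above.

```cpp
#include <cassert>
#include <stack>
#include <string>

// Hypothetical stand-in for Book/CD/Track that counts live instances.
struct Counted
{
   static int live;
   Counted()               { ++live; }
   Counted(const Counted&) { ++live; }
   ~Counted()              { --live; }
};
int Counted::live = 0;

enum DemoType { DCounted, DWString };

// Same shape as StackElement: an untyped pointer plus a type tag.
struct DemoElement
{
   DemoElement(void* element, DemoType elemtype):
      type(elemtype), data(element) {}
   DemoType type;
   void*    data;
};

// Same cleanup idea as SAXStack::empty(): cast back to the type recorded
// in the tag before deleting, because delete needs the real type to run
// the right destructor.
void DrainStack(std::stack<DemoElement>& elements)
{
   while(!elements.empty())
   {
      DemoElement elem = elements.top();
      elements.pop();
      assert(elem.data != 0);
      switch(elem.type)
      {
      case DCounted: delete static_cast<Counted*>(elem.data); break;
      case DWString: delete static_cast<std::wstring*>(elem.data); break;
      }
   }
}

// Pushes a few heap objects, drains the stack, and reports how many
// Counted instances are still alive afterwards.
int LiveAfterDrain()
{
   std::stack<DemoElement> elements;
   elements.push(DemoElement(new Counted(), DCounted));
   elements.push(DemoElement(new std::wstring(L"partial title"), DWString));
   elements.push(DemoElement(new Counted(), DCounted));
   DrainStack(elements);
   return Counted::live;
}
```

If a case were missing from the switch, the corresponding objects would stay alive and the counter would not return to zero, so every value of the type tag needs a matching delete.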

The new implementation of SAXBooks looks like this:

class SAXBooks :
   public SAXContentHandlerImpl
{
   std::vector<Book>   books;
   SAXStack            stack;
   bool                hasText;

   long m_RefCount;
public:
   SAXBooks(void);
   virtual ~SAXBooks(void);

   unsigned long __stdcall AddRef(void)
   {
      return InterlockedIncrement(&m_RefCount);
   }
   unsigned long __stdcall Release(void)
   {
      long nRefCount=0;
      nRefCount=InterlockedDecrement(&m_RefCount) ;
      if (nRefCount == 0) delete this;
      return nRefCount;
   }

   std::vector<Book> GetBooks() const {return books;}

   virtual HRESULT STDMETHODCALLTYPE startElement(
      wchar_t __RPC_FAR *pwchNamespaceUri,
      int cchNamespaceUri,
      wchar_t __RPC_FAR *pwchLocalName,
      int cchLocalName,
      wchar_t __RPC_FAR *pwchRawName,
      int cchRawName,
      ISAXAttributes __RPC_FAR *pAttributes);

   virtual HRESULT STDMETHODCALLTYPE endElement(
      wchar_t __RPC_FAR *pwchNamespaceUri,
      int cchNamespaceUri,
      wchar_t __RPC_FAR *pwchLocalName,
      int cchLocalName,
      wchar_t __RPC_FAR *pwchRawName,
      int cchRawName);

   virtual HRESULT STDMETHODCALLTYPE characters(
      wchar_t __RPC_FAR *pwchChars,
      int cchChars);

   virtual void FatalError();
};

SAXBooks::SAXBooks(void)
:hasText(false),
m_RefCount(0)
{
}

SAXBooks::~SAXBooks(void)
{
}

HRESULT STDMETHODCALLTYPE SAXBooks::startElement(
   wchar_t __RPC_FAR *pwchNamespaceUri,
   int cchNamespaceUri,
   wchar_t __RPC_FAR *pwchLocalName,
   int cchLocalName,
   wchar_t __RPC_FAR *pwchRawName,
   int cchRawName,
   ISAXAttributes __RPC_FAR *pAttributes)
{
   hasText = false;

   std::wstring localName(pwchLocalName);

   if(localName.compare(L"book") == 0)
   {
      stack.push(StackElement(
         new Book(GetAttributeValue(pAttributes, L"isbn", L"unknown")),
         SBook));
   }
   else if(localName.compare(L"title") == 0 ||
      localName.compare(L"author") == 0)
   {
      stack.push(StackElement(new std::wstring(), SWString));
      hasText = true;
   }

   return S_OK;
}

HRESULT STDMETHODCALLTYPE SAXBooks::endElement(
   wchar_t __RPC_FAR *pwchNamespaceUri,
   int cchNamespaceUri,
   wchar_t __RPC_FAR *pwchLocalName,
   int cchLocalName,
   wchar_t __RPC_FAR *pwchRawName,
   int cchRawName)
{
   hasText = false;

   // take the data from the stack's top
   StackElement elem = stack.top();
   // pop the element at the top
   stack.pop();

   std::wstring localName(pwchLocalName);

   if(localName.compare(L"book") == 0)
   {
      books.push_back(*(Book*)elem.data);
      delete (Book*)elem.data;
   }
   else if(localName.compare(L"title") == 0)
   {
      ((Book*)stack.top().data)->SetTitle(*(std::wstring*)elem.data);
      delete (std::wstring*)elem.data;
   }
   else if(localName.compare(L"author") == 0)
   {
      ((Book*)stack.top().data)->SetAuthor(*(std::wstring*)elem.data);
      delete (std::wstring*)elem.data;
   }
   else
   {
      // put data back to the stack
      stack.push(elem);
   }

   return S_OK;
}

HRESULT STDMETHODCALLTYPE SAXBooks::characters(
   wchar_t __RPC_FAR *pwchChars,
   int cchChars)
{
   if(hasText)
   {
      // append current text to the existing text
      std::wstring* top = (std::wstring*)stack.top().data;
      std::wstring text(pwchChars, cchChars);
      *top += text;
   }
   else
   {
      // read data that is not part of recognized element
   }

   return S_OK;
}

void SAXBooks::FatalError()
{
   stack.empty();
}

Notice that you handle the events and work with the stack in the same manner as before. When you are notified that an error has occurred, however, you clean up the stack but do not remove any of the successfully parsed elements (those stored in books).
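The behavior described here, discarding partial state while keeping completed items, can be tried out without COM. The following is a simplified sketch and not the SAXBooks class itself: MiniBook, MiniBooks, and BooksKeptAfterError are hypothetical names, the events take plain std::wstring arguments, and a std::unique_ptr (C++11) replaces the custom stack so the partial element is freed automatically.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical, simplified book record (the article's Book class has more).
struct MiniBook
{
   std::wstring title;
};

// Finished books live in 'books'; the book being parsed lives in
// 'current' until its end tag arrives.
class MiniBooks
{
   std::vector<MiniBook>     books;
   std::unique_ptr<MiniBook> current;   // in-progress element, if any
   bool                      inTitle;
public:
   MiniBooks() : inTitle(false) {}

   void StartElement(const std::wstring& name)
   {
      if(name == L"book")       current.reset(new MiniBook());
      else if(name == L"title") inTitle = (current != nullptr);
   }

   void Characters(const std::wstring& text)
   {
      if(inTitle && current) current->title += text;
   }

   void EndElement(const std::wstring& name)
   {
      if(name == L"title")
      {
         inTitle = false;
      }
      else if(name == L"book" && current)
      {
         books.push_back(*current);   // keep the completed book
         current.reset();
         inTitle = false;
      }
   }

   // Discard only the partial state; completed books are untouched.
   void FatalError()
   {
      current.reset();
      inTitle = false;
   }

   std::size_t Count() const      { return books.size(); }
   bool        HasPartial() const { return current != nullptr; }
};

// Parses one complete book, starts a second one, then reports an error.
// Returns the number of books kept, or 0 if partial state survived.
std::size_t BooksKeptAfterError()
{
   MiniBooks h;
   h.StartElement(L"book");
   h.StartElement(L"title");
   h.Characters(L"Maitreyi");
   h.EndElement(L"title");
   h.EndElement(L"book");

   h.StartElement(L"book");
   h.StartElement(L"title");
   h.Characters(L"Half a ti");
   h.FatalError();                    // error arrives mid-element

   return h.HasPartial() ? 0 : h.Count();
}
```

Only the first book survives the error; the half-read second one is dropped without any explicit delete.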

For SAXCDs and SAXTracks, look into the sample application files.

The SAXMultiContentHandler is the same as in the previous article, but, being a content handler, it also implements the FatalError() method by notifying all the registered handlers of the error.

void SAXMultiContentHandler::FatalError()
{
   for(std::vector<ElementHandler>::iterator it =
       handlers.begin();
       it != handlers.end(); ++it)
   {
      (*it).handler->FatalError();
   }
}
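The same fan-out pattern can be exercised in isolation. This is a minimal, portable sketch rather than the article's classes: ErrorAware, RecordingHandler, and FanOut are hypothetical names, and the handlers are stored as plain non-owning pointers instead of the ElementHandler entries used above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal interface exposing only the error notification.
struct ErrorAware
{
   virtual ~ErrorAware() {}
   virtual void FatalError() = 0;
};

// Records whether it was notified, standing in for SAXBooks, SAXCDs, etc.
struct RecordingHandler : ErrorAware
{
   bool notified;
   RecordingHandler() : notified(false) {}
   virtual void FatalError() { notified = true; }
};

// Forwards one notification to every registered handler.
class FanOut : public ErrorAware
{
   std::vector<ErrorAware*> handlers;   // not owned by FanOut
public:
   void Add(ErrorAware* h)
   {
      assert(h != 0);
      handlers.push_back(h);
   }

   virtual void FatalError()
   {
      for(std::size_t i = 0; i < handlers.size(); ++i)
         handlers[i]->FatalError();
   }
};
```

Registering two RecordingHandler objects and calling FatalError() on the FanOut sets the notified flag on both, which is the guarantee the error handler in the next section relies on.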

Error Handler Implementation

You will derive a handler from SAXErrorHandlerImpl. It will implement only the fatalError() method because it's the only one that is called. The error handler keeps a pointer to a SAXMultiContentHandler object and notifies it when an error occurs. The notification is then dispatched to all the handlers that have registered with the multi content handler. This way, you can be sure that all handlers that are interested in the content of the document are notified about the error and can perform cleanup.

class SAXMultiErrorHandler :
   public SAXErrorHandlerImpl
{
   SAXMultiContentHandler*   handler;
public:
   SAXMultiErrorHandler(SAXMultiContentHandler* hdl);
   virtual ~SAXMultiErrorHandler(void);

   virtual HRESULT STDMETHODCALLTYPE fatalError(
      ISAXLocator __RPC_FAR *pLocator,
      wchar_t * pwchErrorMessage,
      HRESULT errCode);
};

SAXMultiErrorHandler::SAXMultiErrorHandler(SAXMultiContentHandler* hdl):
handler(hdl)
{
}

SAXMultiErrorHandler::~SAXMultiErrorHandler(void)
{
}

HRESULT STDMETHODCALLTYPE SAXMultiErrorHandler::fatalError(
   ISAXLocator __RPC_FAR *pLocator,
   wchar_t * pwchErrorMessage,
   HRESULT errCode)
{
   // get information about the file, line, and column where error
   // occurred
   int errorLine = -1;
   int errorCol  = -1;
   wchar_t *filepath = NULL;   // set by the locator; not owned here
   pLocator->getLineNumber(&errorLine);
   pLocator->getColumnNumber(&errorCol);
   pLocator->getSystemId(&filepath);
   std::wstring errorText(pwchErrorMessage);

   // print an error message
   std::wcout << (filepath ? filepath : L"") << L"(line " << errorLine
              << L", column " << errorCol << L"): error "
              << std::hex << errCode << std::dec
              << L": " << errorText << std::endl;

   // notify the multi content handler about the error
   handler->FatalError();

   return E_FAIL;
}

When fatalError() gets called, you use the ISAXLocator interface to get information about the document, line, and column where the error occurred. You also have an error code identifying the error reason and a textual description of the error. All that information is displayed on the console. After notifying the content handler, the function returns E_FAIL, indicating that the parsing should stop.
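If you want to check the message layout without running the parser, the formatting can be pulled into a small helper. This is a hedged sketch, not part of the sample application: FormatFatalError is a hypothetical name, its parameters are portable stand-ins for the values that ISAXLocator and fatalError() supply, and std::uint32_t (C++11) replaces HRESULT so that the hexadecimal digits do not depend on the platform's size of long.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Builds "file(line L, column C): error CODE: text", with CODE in hex.
std::wstring FormatFatalError(const std::wstring& systemId,
                              int line, int column,
                              std::uint32_t code,
                              const std::wstring& text)
{
   std::wostringstream out;
   out << systemId << L"(line " << line
       << L", column " << column << L"): error "
       << std::hex << code            // hex applies only to this local stream
       << L": " << text;
   return out.str();
}
```

Because the helper writes to its own string stream, the std::hex manipulator does not leak into later output on std::wcout.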

Putting It All Together

To read the XML document, create the internal data structures, and display the items, you must follow these steps:

  1. Initialize the COM library for the current thread.
  2. Create an instance of the XML reader.
  3. Create an instance of the multi content handler (SAXMultiContentHandler) and register it with the XML reader.
  4. Create an instance of the books handler (SAXBooks) and register it with the multi content handler to handle 'book' elements.
  5. Create an instance of the CDs handler (SAXCDs) and register it with the multi content handler to handle 'cd' elements.
  6. Create an instance of the tracks handler (SAXTracks) and register it with the multi content handler to handle 'track' elements.
  7. Create an instance of the error handler (SAXMultiErrorHandler), passing it a pointer to the SAXMultiContentHandler object, and register it with the XML reader.
  8. Parse the document.
  9. Check the result and display the items.
  10. Release the handlers.
  11. Un-initialize the COM library for the current thread.

int wmain(int argc, wchar_t* argv[])
{
   if (argc<2)
   {
      wprintf(L"no argument provided\n");
      return 0;
   }

   CoInitialize(NULL);
   ISAXXMLReader* pXMLReader = NULL;

   HRESULT hr = CoCreateInstance(
      __uuidof(SAXXMLReader),
      NULL,
      CLSCTX_ALL,
      __uuidof(ISAXXMLReader),
      (void **)&pXMLReader);

   if(!FAILED(hr))
   {
      SAXMultiContentHandler *pMultiHandler =
         new SAXMultiContentHandler();
      hr = pXMLReader->putContentHandler(pMultiHandler);

      SAXBooks *pBooksHandler = new SAXBooks();
      pMultiHandler->AddHandler(L"book", pBooksHandler);

      SAXCDs *pCDsHandler = new SAXCDs();
      pMultiHandler->AddHandler(L"cd", pCDsHandler);

      SAXTracks *pTracksHandler = new SAXTracks();
      pMultiHandler->AddHandler(L"track", pTracksHandler);

      SAXMultiErrorHandler * pErrorHandler =
         new SAXMultiErrorHandler(pMultiHandler);
      hr = pXMLReader->putErrorHandler(pErrorHandler);

      std::wcout << L"Parsing document: " << argv[1]
                 << std::endl << std::endl;

      hr = pXMLReader->parseURL(argv[1]);
      if(!FAILED(hr))
      {
         std::wcout << L"\nDocument parsed successfully\n";
      }
      else
      {
         std::wcout << L"\nParse result code: " << std::hex
                    << hr << std::endl << std::endl;
      }

      std::vector<Book> books   = pBooksHandler->GetBooks();
      std::vector<CD> cds       = pCDsHandler->GetCDs();
      std::vector<Track> tracks = pTracksHandler->GetTracks();

      for(std::vector<Book>::const_iterator itb = books.begin();
         itb != books.end(); ++itb)
      {
         std::wcout << (*itb).ToString() << L'\n';
      }

      std::wcout << L'\n';

      for(std::vector<CD>::const_iterator itc = cds.begin();
         itc != cds.end(); ++itc)
      {
         std::wcout << (*itc).ToString() << L'\n';
      }

      std::wcout << L'\n';

      for(std::vector<Track>::const_iterator itt = tracks.begin();
         itt != tracks.end(); ++itt)
      {
         std::wcout << (*itt).ToString() << L'\n';
      }

      pXMLReader->Release();

      delete pTracksHandler;
      delete pCDsHandler;
      delete pBooksHandler;
   }
   else
   {
      std::wcout << L"\nFailed to create the XML reader, error code: "
                 << std::hex << hr << std::endl << std::endl;
   }

   CoUninitialize();

   return 0;
}

Don't forget to put these two lines of code in your application (if you are using precompiled headers, put them in "stdafx.h"):

#import <msxml3.dll> raw_interfaces_only
using namespace MSXML2;

Running the program with the corrupted XML file, you get the following output:

Parsing document: store.xml

file:///e:/VisualC++/Xml/SAXErrors/store.xml(line 25, column 46): error
c00ce56d: End tag 'track_error' does not match the start tag 'track'.


Parse result code: c00ce56d

'The Lord Of The Rings' by J.R.R. Tolkien, ISBN: 10000001
'Maitreyi' by Mircea Eliade, ISBN: unknown

'The Wall', Pink Floyd
  1. Another Brick in the Wall, 3:40
  2. Mother, 5:33


Another Brick in the Wall, 3:40
Mother, 5:33
From This Moment On, 4:40
Press any key to continue

Conclusions

In this article, you have taken one step forward and seen how you can handle the errors that may occur during the parsing of the document. You have done it in a simple way, like all the handling that you did in this tutorial. As stated in the title, this is only a kick-start and not a full coverage of SAX. But now, you should be able to start building your own applications with SAX.

About the sample code

The sample application provided with this article is written in VC++ 7.1. The same code can be successfully compiled with the previous version. MSXML 3.0 or higher should be installed on your machine.



About the Author

Marius Bancila

Marius Bancila is a Microsoft MVP for VC++. He works as a software developer for a Norwegian-based company. He is mainly focused on building desktop applications with MFC and VC#. He keeps a blog at www.mariusbancila.ro/blog, focused on Windows programming. He is the co-founder of codexpert.ro, a community for Romanian C++/VC++ programmers.
