Using the Microsoft XML Parser to Create XML Documents

Environment: VC 6, Windows 2000, MS XML Parser 3.0

XML provides a powerful and flexible means of expressing information and communicating it between different, or unlike, systems. XML is a text-based format that's similar to HTML in many ways. For example, XML documents contain elements that consist of start tags (such as <font>) and end tags (such as </font>). Unlike HTML, however, XML lets you use any tags you like: how many tags you use and what names you give them are left to you, the document's designer.

Basics of the MXXMLWriter Component

Part of the challenge of creating an XML document is making sure that the XML conforms to some basic formatting rules, such as ensuring that elements don't overlap. Although it isn't hard to make a new XML document conform to the basic formatting rules that the XML specification outlines, it is easier, and safer, to use a third-party component to manage the output. The component that I'll describe in this article is MXXMLWriter, part of version 3.0 of Microsoft's XML Parser.

When you think of a parser, you generally think of something that consumes a stream of data on its input and generates a series of tokens on its output. The Microsoft XML Parser generally works like a regular parser: it takes an XML document as its input and generates either an XML Document Object Model (DOM), which is an in-memory, object-based representation of the document, or a series of Simple API for XML (SAX) events. In either case, the input is an XML document. MXXMLWriter reverses that direction: its output is an XML document.

MXXMLWriter is part of Microsoft's XML parser and is useful for programmatically creating XML documents. The advantages of using MXXMLWriter over creating an XML document yourself (by hard-coding XML tags in your code) include:

  • MXXMLWriter produces an XML document that conforms to the W3C XML 1.0 and Namespaces in XML recommendations, so you don't need to be concerned with formatting the output; the writer does that work for you
  • Because MXXMLWriter is itself a SAX content handler (a stream-based XML consumer), you can connect it directly to a SAX reader for quick, low-memory-overhead processing
  • The interface-based programming model makes your code easier to read and maintain
  • You can send the output to a string or to a COM object that supports the IStream interface (such as the ASP Response object)

Basics of Processing XML Documents

Before you can use the MXXMLWriter component to create new XML documents, you need to know how existing XML documents are processed, since the MXXMLWriter component expects the applications that use it to act as SAX event providers.

There are two ways to process an existing XML document: use the Document Object Model (DOM), or use the Simple API for XML (SAX). Each approach has its own advantages and role in various applications: the DOM is great at querying XML documents much as you would query tables in a database, whereas SAX is great at quickly processing very large XML documents and puts you in complete control of the parsing process.

Using the DOM to Process XML

When you load an XML document into a DOM, the XML parser reads through the entire document and creates a hierarchy of objects that represent the document in memory. Figure 1 shows a fragment of a simple XML document and how the DOM represents it in memory.

Figure 1 – How the DOM represents an XML document

The figure illustrates that an instance of a Document represents an XML document (number one in Figure 1) that contains nodes (number two in Figure 1), which are instances of DOM Node objects (number three in Figure 1).

The DOM makes the XML document available to you only once the parser has read all of it into memory. You can then work with the elements in the DOM by iterating over its contents using loops, querying for specific nodes using XPath expressions, or randomly accessing parts of the DOM tree. The key point is that your application is in control of the document: it pulls information from an in-memory representation of the document.
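The pull model can be sketched in a few lines of portable C++. This is not MSXML code: the Node struct, findChild, and buildSampleTree are hypothetical stand-ins for the DOM's Node objects and an XPath query, built around the author/topic content of the sample document in Listing 3. The point is only that the whole tree exists before the application looks at it, and the application decides what to read and when.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, greatly simplified stand-in for a DOM node:
// an element name, its text content, and its child elements.
struct Node {
    std::string name;
    std::string text;
    std::vector<std::unique_ptr<Node>> children;

    // Appends a child element and returns a pointer to it.
    Node* addChild(const std::string& childName, const std::string& childText = "") {
        auto child = std::make_unique<Node>();
        child->name = childName;
        child->text = childText;
        children.push_back(std::move(child));
        return children.back().get();
    }
};

// Pull-style lookup: the application decides when and where to look.
// Roughly what an XPath query such as "/RootElement/topic" does in a real DOM.
const Node* findChild(const Node& parent, const std::string& childName) {
    for (const auto& child : parent.children) {
        if (child->name == childName) {
            return child.get();
        }
    }
    return nullptr;  // no child element with that name
}

// Builds the complete in-memory tree before anyone queries it.
std::unique_ptr<Node> buildSampleTree() {
    auto root = std::make_unique<Node>();
    root->name = "RootElement";
    root->addChild("author", "Essam Ahmed");
    root->addChild("topic", "Using the Microsoft XML Parser...");
    return root;
}
```

For example, findChild(*root, "topic") returns the topic node, whose text the application can read whenever it chooses; nothing happens until the application asks.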

Using SAX to Process XML

The SAX programming model is different from the one the DOM offers. SAX puts your application much closer to the XML parser by allowing it to intercept the events that the XML parser raises as it processes an XML document. This is a push programming model, since events are pushed to your application, as shown in Figure 2; your application no longer acts as the point of control, as it does when you use the DOM.

Figure 2 – Using SAX to Process an XML Document

Figure 2 illustrates that an XML Reader (number one in Figure 2) is responsible for reading the XML document. The XML Reader invokes a component that you provide and calls methods on its interfaces as it reads the XML document (number two in Figure 2); your application (number three in Figure 2) consumes the output your component provides.

A component that exposes the ISAXXMLReader interface represents the XML Reader in Figure 2. To process an XML document using SAX, you register with the reader a component that exposes, among other interfaces, the ISAXContentHandler interface. As the XML Reader works through the document, it calls ISAXContentHandler methods such as startDocument, startElement, and characters, which describe the parts of the document it encounters. Figure 3 illustrates the sequence of events (method calls on a component that exposes the ISAXContentHandler interface) that the XML Reader raises as it processes the sample XML file shown in Figures 1 and 2.

Figure 3 – Sequence of events while processing an XML document
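The push model can be sketched the same way. The ContentHandler class below is hypothetical: its method names mirror ISAXContentHandler, but its simplified std::string signatures are not MSXML's. The pushSampleDocument function stands in for the XML Reader and hard-codes the events a reader would raise for the sample document (ignoring whitespace between elements), while EventLogger plays the part of your component and simply records each call.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified handler interface. The method names mirror
// ISAXContentHandler, but these signatures are not the MSXML ones.
class ContentHandler {
public:
    virtual ~ContentHandler() = default;
    virtual void startDocument() = 0;
    virtual void startElement(const std::string& qName) = 0;
    virtual void characters(const std::string& text) = 0;
    virtual void endElement(const std::string& qName) = 0;
    virtual void endDocument() = 0;
};

// Plays the role of the XML Reader: it drives the process and pushes events
// into whatever handler it is given. A real reader derives these events from
// the document text; here they are hard-coded for the sample document.
void pushSampleDocument(ContentHandler& handler) {
    handler.startDocument();
    handler.startElement("RootElement");
    handler.startElement("author");
    handler.characters("Essam Ahmed");
    handler.endElement("author");
    handler.startElement("topic");
    handler.characters("Using the Microsoft XML Parser...");
    handler.endElement("topic");
    handler.endElement("RootElement");
    handler.endDocument();
}

// The application's component: it never asks for data, it only reacts
// to the calls the reader makes. This one logs each event as text.
class EventLogger : public ContentHandler {
public:
    std::vector<std::string> events;

    void startDocument() override { events.push_back("startDocument"); }
    void startElement(const std::string& qName) override { events.push_back("startElement " + qName); }
    void characters(const std::string& text) override { events.push_back("characters " + text); }
    void endElement(const std::string& qName) override { events.push_back("endElement " + qName); }
    void endDocument() override { events.push_back("endDocument"); }
};
```

After pushSampleDocument(logger) returns, the logger holds ten events, from startDocument to endDocument, and it never had to request any of them.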

Using MXXMLWriter to Create an XML Document

The MXXMLWriter component acts as a SAX consumer, which means that an application that uses MXXMLWriter acts as a SAX provider: the application raises the SAX events itself. MXXMLWriter consumes those events and writes out a correctly formatted XML document.
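To see what consuming SAX events means in practice, here is a hypothetical XmlTextWriter that turns startElement, characters, and endElement calls into markup. It is a portable sketch of the idea, not how MXXMLWriter is implemented: it escapes markup characters in element content and refuses overlapping elements, but it leaves out attributes, namespaces, output encodings, and element-name validation.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical SAX consumer that serializes events to a string.
// Illustrates the idea behind MXXMLWriter; it is not MSXML code.
class XmlTextWriter {
public:
    void startDocument() { out_ += "<?xml version=\"1.0\"?>"; }

    void startElement(const std::string& qName) {
        out_ += "<" + qName + ">";
        open_.push_back(qName);  // remember nesting so overlaps can be caught
    }

    void characters(const std::string& text) { out_ += escape(text); }

    void endElement(const std::string& qName) {
        // Reject overlapping elements such as <a><b></a></b>.
        if (open_.empty() || open_.back() != qName) {
            throw std::logic_error("endElement does not match open element: " + qName);
        }
        open_.pop_back();
        out_ += "</" + qName + ">";
    }

    void endDocument() {
        if (!open_.empty()) {
            throw std::logic_error("unclosed element: " + open_.back());
        }
    }

    const std::string& output() const { return out_; }

private:
    // '&' and '<' must be escaped in element content; '>' is escaped too,
    // which XML allows and which avoids producing a literal "]]>".
    static std::string escape(const std::string& text) {
        std::string result;
        for (char c : text) {
            switch (c) {
                case '&': result += "&amp;"; break;
                case '<': result += "&lt;"; break;
                case '>': result += "&gt;"; break;
                default: result += c; break;
            }
        }
        return result;
    }

    std::string out_;                 // serialized document so far
    std::vector<std::string> open_;   // names of currently open elements
};
```

Calling characters("AT&T <Labs>") produces AT&amp;T &lt;Labs&gt;, the kind of detail that is easy to miss when you hard-code tags yourself.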

The sample code that accompanies this article is a console application that reads a comma-separated file and produces an XML document. The sample displays the XML document on the screen, and you can capture that output to a text file to store it on disk. The sample uses the STL (Standard Template Library) to read the input file and pick up its contents. Listing 1 shows the code that instantiates MXXMLWriter and obtains pointers to the interfaces it exposes, such as ISAXContentHandler.

Listing 1 – Instantiating MXXMLWriter

// MSXML2::IMXWriterPtr is a smart pointer that's declared as a result
// of #importing msxml3.dll in StdAfx.h
MSXML2::IMXWriterPtr pXMLWriter;
pXMLWriter.CreateInstance(__uuidof(MSXML2::MXXMLWriter));

// ISAXContentHandler is an interface that's exposed by MXXMLWriter...
MSXML2::ISAXContentHandlerPtr pContentHandler;
pContentHandler=pXMLWriter;  //calls QI for ISAXContentHandler on pXMLWriter

// ...so is ISAXErrorHandler...
MSXML2::ISAXErrorHandlerPtr pErrorHandler;
pErrorHandler=pXMLWriter; //calls QI for ISAXErrorHandler on pXMLWriter

// ... and so is ISAXDTDHandler...
MSXML2::ISAXDTDHandlerPtr pDTDHandler;
pDTDHandler=pXMLWriter; //calls QI for ISAXDTDHandler on pXMLWriter

//sets output to go to a string
pXMLWriter->put_output(CComVariant(L""));

The code in the listing uses smart pointers that the compiler generates as a result of the #import statement in StdAfx.h. All of the pointers refer to the same MXXMLWriter object: pXMLWriter holds its IMXWriter interface, and each assignment to a smart pointer of another type calls QueryInterface to obtain one of the other interfaces that MXXMLWriter exposes (refer to the comments in the code for details).

Once the MXXMLWriter is ready, the sample reads through the text file and raises SAX events for the MXXMLWriter to process, as shown in Listing 2.

Listing 2 – Raising events for the MXXMLWriter

// Assumes USES_CONVERSION appears earlier in the function (A2W requires it),
// lineFromFile is a std::string, and wElementName/wElementValue are std::wstring.
// A2W allocates with _alloca, so stack use grows with every iteration until
// the function returns; keep that in mind for very large input files.
while(std::getline(fileIn,lineFromFile))
{
    nlast=0;
    npos=lineFromFile.find_first_of(",",nlast);
    // make sure that there's a comma on the current line;
    // find_first_of returns std::string::npos if not found
    if(npos!=std::string::npos){
      wElementName=A2W(lineFromFile.substr(nlast,npos-nlast).c_str());

      // startElement...
      pContentHandler->startElement(L"",0,L"",0,
        const_cast<wchar_t*>(wElementName.c_str()),
        wElementName.length(),NULL);
      nlast= ++npos;

      // get the rest of the current line (element value);
      // any space that follows the comma is kept as part of the value
      wElementValue=A2W(lineFromFile.substr(nlast).c_str());
      // characters...
      pContentHandler->characters(
        const_cast<wchar_t*>(wElementValue.c_str()),
        wElementValue.length());

      // endElement...
      pContentHandler->endElement(L"",0,L"",0,
        const_cast<wchar_t*>(wElementName.c_str()),
        wElementName.length());
    }
}

The code in the listing runs in a loop that's controlled by the state of the input file. The code reads the input file line by line using the std::getline(…) function and stores each line in the lineFromFile variable, a std::string. MXXMLWriter expects its inputs as Unicode strings; however, the sample uses ANSI strings throughout to make parsing the input file easier. As a result, the code converts the ANSI strings it picks up from the input file into Unicode strings (using the ATL A2W macro) just before it passes them on to MXXMLWriter.
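The per-line logic of Listing 2 can also be expressed in portable C++, which makes it easy to test outside Visual C++. In this sketch, readPairs, trim, and widenAscii are hypothetical helpers: widenAscii assumes plain ASCII input and is only a stand-in for A2W, which converts from the system's ANSI code page. Unlike Listing 2, the sketch also trims spaces (and a trailing carriage return) around each name and value.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Portable stand-in for A2W: widens each byte to a wchar_t.
// Assumption: the input is plain ASCII; code-page conversion is not attempted.
std::wstring widenAscii(const std::string& s) {
    return std::wstring(s.begin(), s.end());
}

// Removes leading and trailing spaces, tabs, and carriage returns.
std::string trim(const std::string& s) {
    const std::string::size_type first = s.find_first_not_of(" \t\r");
    if (first == std::string::npos) {
        return "";
    }
    const std::string::size_type last = s.find_last_not_of(" \t\r");
    return s.substr(first, last - first + 1);
}

// Reads "name, value" lines and returns (element name, element value) pairs.
// Lines without a comma are skipped, as in Listing 2.
std::vector<std::pair<std::wstring, std::wstring>> readPairs(std::istream& in) {
    std::vector<std::pair<std::wstring, std::wstring>> pairs;
    std::string line;
    // Looping on getline also handles a last line with no trailing newline.
    while (std::getline(in, line)) {
        const std::string::size_type comma = line.find(',');
        if (comma == std::string::npos) {
            continue;
        }
        pairs.emplace_back(widenAscii(trim(line.substr(0, comma))),
                           widenAscii(trim(line.substr(comma + 1))));
    }
    return pairs;
}
```

Each pair corresponds to one startElement/characters/endElement sequence in Listing 2.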

The input file is a text file that's formatted as one element-name, element-value pair per line; the sample produces an XML document made up of those elements and values, enclosed in a RootElement element (see Listing 3).

Listing 3 – Sample text file and the resulting XML document

author, Essam Ahmed
topic, Using the Microsoft XML Parser...

<?xml version="1.0" encoding="UTF-16" standalone="no"?>
<RootElement>
  <author>Essam Ahmed</author>
  <topic>Using the Microsoft XML Parser...</topic>
</RootElement>

The sample prints its output on the screen; if you want to capture the output in a file, use the redirection operator (>) to send it to a file.

Conclusion

This article introduced MXXMLWriter, a component of the Microsoft XML Parser 3.0, for creating XML documents. Using MXXMLWriter has several advantages over do-it-yourself implementations, including ongoing assurance that the XML documents your applications produce conform to current standards without your having to update your code as the standards change (you do need to keep the XML Parser up to date, but that is often just a matter of downloading and installing a newer version).

The code that accompanies this article includes the sample featured in this article, a sample text file, and the resulting XML file. Open the project file to review or change the code.

Essam Ahmed is the author of JScript .NET Programming (http://www.designs2solutions.com/jsnetprg) that includes an XML Primer and information on using XML with ADO.NET and Web Services.

Downloads


Download demo project, source code – 383 Kb
