Introduction to Using the XML DOM from Visual C++


Assumptions About the Reader

This article assumes that you are familiar with the basics of what XML is and what it can be used for. If you are new to XML, I would suggest reading one of the many fine tutorials on the subject first and then returning to this document.

Introducing the XML Document Object Model (DOM)

The XML Document Object Model, or DOM, is a powerful programmatic interface that enables you not only to load and parse an XML file, or document, but also to traverse its data. Using certain objects in the DOM, you can even manipulate that data and then save your changes back to the XML document. A comprehensive look at all of the DOM's functionality would be impossible in the space provided here. However, in this article, we'll hit the high notes of using the DOM to load an XML document and to iterate through its elements.

The key to understanding how to use the DOM is realizing that the DOM exposes (through its COM interface) XML documents as a hierarchical tree of nodes. As an example, take a look at the following sample XML document.

<?xml version="1.0"?>
<autos>
  <manufacturer name="Chevrolet">
    <make name="Corvette">
      <model>2000 Convertible</model>
      <price currency="usd">60,000</price>
      <horsePower>420</horsePower>
      <fuelCapacity units="gallons">18.5</fuelCapacity>
    </make>
  </manufacturer>
  <manufacturer name="Mazda">
    <make name="RX-7">
      <model>test model</model>
      <price currency="usd">30,000</price>
      <horsePower>350</horsePower>
      <fuelCapacity units="gallons">15.5</fuelCapacity>
    </make>
  </manufacturer>
</autos>
The DOM would interpret this document as follows:
  • <autos> - This is a NODE_ELEMENT (more on this later) and is referred to as the documentElement.
  • <manufacturer>, <make>, <model>, <price>, <horsePower> and <fuelCapacity> - Each one of these is also a NODE_ELEMENT. However, please note that only the top-level NODE_ELEMENT, or root node, is referred to as the documentElement.
  • currency="usd", units="gallons" - When a NODE_ELEMENT contains an attribute/value pair like this, the attribute itself is a NODE_ATTRIBUTE and its value is held as a NODE_TEXT.
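To make the tree picture concrete, here is a minimal, portable C++ sketch of how a single element from the sample document breaks down into nodes. It is only a stand-in model and assumes nothing about MSXML itself: SketchNode, makeText, makePriceElement and firstText are hypothetical names invented for this illustration, and the enum values simply mirror the element, attribute and text node types discussed in this article.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a DOM node; this is NOT the MSXML interface.
enum SketchNodeType { SKETCH_ELEMENT = 1, SKETCH_ATTRIBUTE = 2, SKETCH_TEXT = 3 };

struct SketchNode
{
    SketchNodeType type;
    std::string name;                    // tag or attribute name, "#text" for text nodes
    std::string value;                   // character data (text nodes only)
    std::vector<SketchNode> attributes;  // kept apart from the child nodes
    std::vector<SketchNode> children;
};

// Creates a text node holding the given character data.
static SketchNode makeText(const std::string& value)
{
    SketchNode n;
    n.type = SKETCH_TEXT;
    n.name = "#text";
    n.value = value;
    return n;
}

// Models <price currency="usd">60,000</price> from the sample document.
SketchNode makePriceElement()
{
    SketchNode currency;
    currency.type = SKETCH_ATTRIBUTE;
    currency.name = "currency";
    currency.children.push_back(makeText("usd"));

    SketchNode price;
    price.type = SKETCH_ELEMENT;
    price.name = "price";
    price.attributes.push_back(currency);
    price.children.push_back(makeText("60,000"));
    return price;
}

// Returns the text held by a node's first child (the node must have one).
std::string firstText(const SketchNode& node)
{
    assert(!node.children.empty() && node.children[0].type == SKETCH_TEXT);
    return node.children[0].value;
}
```

In this model the price element ends up with one attribute node and one text child, which is the same shape you will encounter when walking the real document later in the article.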
As you will see shortly, there are a number of COM components that are part of the XML DOM. Here's a list of some of the more interesting components and their purposes.
  • XMLDOMDocument - The top node of the XML document tree
  • XMLDOMNode - This represents any single node in the XML document tree.
  • XMLDOMNodeList - This is the collection of all XMLDOMNode objects
  • XMLDOMNamedNodeMap - The collection of all the XML document tree attributes

Accessing IE5's XML Support from Visual C++

I'm a firm believer in a tutorial-style, "let's walk through the code" approach, so let's get started seeing just what the DOM can do for us by cranking up the Visual C++ development environment and writing some code to load an XML document and navigate through its elements.

Create the Visual C++ Project

While we can do this utilizing MFC or ATL, we'll keep things simple (for me at least :) and use MFC. Therefore, perform the following steps to create the test project and incorporate IE5 XML support into your application.
  1. Create a new Visual C++ project called XMLDOMFromVC.
  2. In the MFC AppWizard, define the project as being a dialog-based application.
  3. Once the AppWizard has completed its work, add a call to initialize OLE support by inserting a call to ::AfxOleInit in the application class' InitInstance function. Assuming you named your project the same as mine, your code should now look like this (note the ::AfxOleInit call):
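    BOOL CXMLDOMFromVCApp::InitInstance()
    {
     // Initialize OLE/COM first, before any COM objects are created
     if (!::AfxOleInit())
     {
      AfxMessageBox(_T("OLE initialization failed."));
      return FALSE;
     }
    
     AfxEnableControlContainer();
    
     // .. other code
    
     // Since the dialog has been closed, return FALSE so that we exit the
     //  application, rather than start the application's message pump.
     return FALSE;
    }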
    
  4. At this point, you'll need to import the Microsoft XML Parser typelib (OLE type library). The simplest way to do this is to use the C++ #import directive. Simply open your project's stdafx.h file and add the following lines before the file's closing #endif directive.
    #import <msxml.dll> named_guids
    using namespace MSXML;
    
  5. At this point, we can start declaring some variables to use with the DOM. Open your dialog class' header file (XMLDOMFromVCDlg.h) and add the following smart pointer member variables, where the IXMLDOMDocumentPtr is the pointer to the XML document itself and the IXMLDOMElementPtr is a pointer to the XML document root (as explained above).
    IXMLDOMDocumentPtr m_plDomDocument;
    IXMLDOMElementPtr m_pDocRoot;
    
  6. Once you've declared the XML smart pointers, insert the following code in your dialog class' OnInitDialog member function (just before the return statement). This code simply initializes the COM runtime and sets up your XML document smart pointer (m_plDomDocument).
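    // Initialize COM (returns S_FALSE if COM is already initialized on this thread)
    ::CoInitialize(NULL);
    
    HRESULT hr = m_plDomDocument.CreateInstance(CLSID_DOMDocument);
    if (FAILED(hr))
    {
     _com_error er(hr);
     AfxMessageBox(er.ErrorMessage());
     EndDialog(1);
     return TRUE; // the document object was not created, so skip the remaining initialization
    }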
    

Loading an XML Document

Now that you've done the preliminary work to include XML support in your Visual C++ application, let's do something useful like actually loading an XML document. To do that, simply add the following code to your dialog's OnInitDialog member function (just after the initialization code entered above). I've sprinkled comments through the code to explain what I'm doing each step of the way.
// specify xml file name
CString strFileName(_T("XMLDOMFromVC.xml"));

// convert xml file name string to something COM can handle (BSTR);
// passing false lets the _bstr_t take ownership of the allocated BSTR
// instead of copying it, so the allocation is not leaked
_bstr_t bstrFileName(strFileName.AllocSysString(), false);

// call the IXMLDOMDocumentPtr's load function to load the XML document
_variant_t vResult;
vResult = m_plDomDocument->load(bstrFileName);
if ((bool)vResult) // success!
{
 // now that the document is loaded, we need to initialize the root pointer
 m_pDocRoot = m_plDomDocument->documentElement;
 AfxMessageBox(_T("Document loaded successfully!"));
}
else
{
 AfxMessageBox(_T("Document FAILED to load!"));
}
Don't believe it's that easy? Add the following call to have the contents of your entire XML document displayed in a message box.
AfxMessageBox(m_plDomDocument->xml);
Now, build and run the application and you should see results similar to Figure 1.


Figure 1: Loading and displaying an XML document can be done from Visual C++ with just a few lines of code using the DOM.

Ok. Ok. This doesn't really count as reading through an XML document, but I wanted to show you that you had successfully loaded a document and that you can easily get the entire document's contents with a single line of code. In the next section, we'll see how to manually iterate through XML elements.

Iterating Through an XML Document

In this section, we'll learn about a couple of properties that you'll use quite often when iterating through a document's elements: IXMLDOMNodePtr::firstChild and IXMLDOMNodePtr::nextSibling.

The following recursive function shows a way by which you can do this quite easily. In fact, if you insert this code into the dialog's OK button handler, it will display each element in your document:

void CXMLDOMFromVCDlg::OnOK() 
{
 // send the root to the DisplayChildren function
 DisplayChildren(m_pDocRoot);
}

void CXMLDOMFromVCDlg::DisplayChildren(IXMLDOMNodePtr pParent)
{
 // display the current node's name
 DisplayChild(pParent);

 // simple for loop to get all children
 for (IXMLDOMNodePtr pChild = pParent->firstChild;
      NULL != pChild;
      pChild = pChild->nextSibling)
 {
  // for each child, call this function so that we get 
  // its children as well
  DisplayChildren(pChild);
 }
}

void CXMLDOMFromVCDlg::DisplayChild(IXMLDOMNodePtr pChild)
{
 AfxMessageBox(pChild->nodeName);
}
If you were to build and run the project at this point, you would definitely notice something peculiar. The first few message boxes will appear as you might expect: the first one displays the value "autos", followed by "manufacturer", then "make" and finally "model". However, at that point (after the message box displaying the value "model") things will get a little strange. Instead of a message box displaying the value "price", the value "#text" will be displayed! The reason for this is simple.

Let's look at an excerpt from the XML document:

  ...
  <manufacturer name="Chevrolet">
    <make name="Corvette">
      <model>2000 Convertible</model>
      <price currency="usd">60,000</price>
      <horsePower>420</horsePower>
      <fuelCapacity units="gallons">18.5</fuelCapacity>
    </make>
  </manufacturer>
  ...
As you can see in the <model> line above, a value follows the opening model tag. These "values" are still treated as nodes in XML when using the IXMLDOMNodePtr::firstChild and IXMLDOMNodePtr::nextSibling properties. So how do you know what type of node you have?

By using the IXMLDOMNodePtr::nodeType property. Simply modify your dialog's CXMLDOMFromVCDlg::DisplayChild member function to check the node type, as shown below. When you've done that and run the code, you will see the expected values instead of the literal "#text".

void CXMLDOMFromVCDlg::DisplayChild(IXMLDOMNodePtr pChild)
{
 if (NODE_TEXT == pChild->nodeType)
 {
  AfxMessageBox(pChild->text);
 }
 else
 {
  AfxMessageBox(pChild->nodeName);
 }
}
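If you don't have the project handy, the visiting order can be checked with a small, portable C++ sketch. It is a stand-in model under stated assumptions, not the MSXML API: WalkNode, labelFor, collectLabels and walkSample are hypothetical names, the struct members merely echo the firstChild, nextSibling, nodeType and nodeName properties used above, and whitespace handling is ignored entirely.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a DOM node; this is NOT the MSXML interface.
enum WalkNodeType { WALK_ELEMENT = 1, WALK_TEXT = 3 };

struct WalkNode
{
    WalkNodeType nodeType;
    std::string nodeName;   // tag name, or "#text" for a text node
    std::string text;       // character data (text nodes only)
    WalkNode* firstChild;
    WalkNode* nextSibling;
};

// Mirrors the revised DisplayChild: the text for text nodes, the name otherwise.
std::string labelFor(const WalkNode& node)
{
    return (WALK_TEXT == node.nodeType) ? node.text : node.nodeName;
}

// Depth-first walk: record the node, then recurse into each child in turn.
void collectLabels(const WalkNode* parent, std::vector<std::string>& out)
{
    assert(parent != 0);
    out.push_back(labelFor(*parent));
    for (const WalkNode* child = parent->firstChild;
         child != 0;
         child = child->nextSibling)
    {
        collectLabels(child, out);
    }
}

// Builds <make><model>2000 Convertible</model><price>60,000</price></make>
// by hand and returns the labels in the order the walk visits them.
std::vector<std::string> walkSample()
{
    WalkNode priceText = { WALK_TEXT, "#text", "60,000", 0, 0 };
    WalkNode price     = { WALK_ELEMENT, "price", "", &priceText, 0 };
    WalkNode modelText = { WALK_TEXT, "#text", "2000 Convertible", 0, 0 };
    WalkNode model     = { WALK_ELEMENT, "model", "", &modelText, &price };
    WalkNode make      = { WALK_ELEMENT, "make", "", &model, 0 };

    std::vector<std::string> labels;
    collectLabels(&make, labels);
    return labels;
}
```

Because each element's text is its first child, the walk reaches a text node immediately after its element and before the element's next sibling, which is why the text appears between "model" and "price".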
You no doubt also noted the "magic" constant used above (NODE_TEXT). All the node types are defined in an enum in the msxml.tlh file that was generated by the #import directive you used earlier. This enum is listed below:
enum tagDOMNodeType
{
    NODE_INVALID = 0,
    NODE_ELEMENT = 1,
    NODE_ATTRIBUTE = 2,
    NODE_TEXT = 3,
    NODE_CDATA_SECTION = 4,
    NODE_ENTITY_REFERENCE = 5,
    NODE_ENTITY = 6,
    NODE_PROCESSING_INSTRUCTION = 7,
    NODE_COMMENT = 8,
    NODE_DOCUMENT = 9,
    NODE_DOCUMENT_TYPE = 10,
    NODE_DOCUMENT_FRAGMENT = 11,
    NODE_NOTATION = 12
};
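
When debugging a traversal, it can help to show a readable label instead of a raw number. The following is a minimal sketch, assuming you only care about a handful of node types. SketchDomNodeType and describeNodeType are hypothetical names, and the enum is a renamed local copy of a few of the values listed above so that the sketch compiles on its own without msxml.tlh.

```cpp
#include <cassert>
#include <string>

// Renamed local copy of a few values from the enum listed above.
enum SketchDomNodeType
{
    SKETCH_NODE_INVALID = 0,
    SKETCH_NODE_ELEMENT = 1,
    SKETCH_NODE_ATTRIBUTE = 2,
    SKETCH_NODE_TEXT = 3,
    SKETCH_NODE_COMMENT = 8,
    SKETCH_NODE_DOCUMENT = 9
};

// Hypothetical helper: turns a node type value into a readable label.
std::string describeNodeType(int nodeType)
{
    assert(nodeType >= 0);  // the listed values are all non-negative
    switch (nodeType)
    {
    case SKETCH_NODE_INVALID:   return "invalid";
    case SKETCH_NODE_ELEMENT:   return "element";
    case SKETCH_NODE_ATTRIBUTE: return "attribute";
    case SKETCH_NODE_TEXT:      return "text";
    case SKETCH_NODE_COMMENT:   return "comment";
    case SKETCH_NODE_DOCUMENT:  return "document";
    default:                    return "other";
    }
}
```

In the real project you would presumably switch on the generated NODE_* constants directly rather than on a local copy.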

Summary

In this article, you discovered the XML DOM and learned how to access its features from Visual C++ / COM. The demo we built illustrated the following basic DOM functions:
  • Loading an XML document
  • Iterating through a document's nodes
  • Determining a node's type
  • Displaying NODE_TEXT node values
There is obviously much more to the DOM than what you've seen here, but hopefully what you've learned will whet your appetite to dig into the documentation and to see all the great things you can do with XML documents using the DOM.

Downloads

Download demo project - 15 Kb


About the Author

Tom Archer - MSFT

I am a Program Manager and Content Strategist for the Microsoft MSDN Online team managing the Windows Vista and Visual C++ developer centers. Before being employed at Microsoft, I was awarded MVP status for the Visual C++ product. A 20+ year veteran of programming with various languages - C++, C, Assembler, RPG III/400, PL/I, etc. - I've also written many technical books (Inside C#, Extending MFC Applications with the .NET Framework, Visual C++.NET Bible, etc.) and 100+ online articles.