Reading XML Files with the XmlTextReader Class

In the previous article, I presented the XmlTextWriter class as a noncached, forward-only means of writing XML data. In this article, you'll look at the reciprocal class for reading XML data: the XmlTextReader class. The XmlTextReader class is also a sequential, forward-only class, meaning that you cannot jump directly to an arbitrary node; you must read every node from the beginning of the file until the end (or until you've reached the desired node). Therefore, this class is most useful when you're dealing with small files or when the application needs to read the entire file. Also, note that the XmlTextReader class does not provide any sort of XML validation against a DTD or schema; the class assumes that the XML being read is valid, although it does throw an exception if the document is not well formed. In this week's article, I'll illustrate the following aspects of using the XmlTextReader class:

  • Reading and parsing XML nodes
  • Retrieving names and values

Reading and Parsing XML Nodes

As mentioned, the XmlTextReader does not provide a means of randomly reading a specific XML node. As a result, the application reads each node of an XML document, determining along the way whether the current node is what is needed. This is typically accomplished by constructing an XmlTextReader object and then calling the XmlTextReader::Read method in a loop until that method returns false. The code will generally look like the following:

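// skeleton code to enumerate an XML file's nodes
XmlTextReader* xmlreader = 0;
try
{
   xmlreader = new XmlTextReader(fileName);
   while (xmlreader->Read())
   {
      // parse based on NodeType
   }
}
catch (Exception* ex)
{
   // report the problem, for example a missing file or malformed XML
   Console::WriteLine(ex->Message);
}
__finally
{
   // release the file whether or not an exception occurred
   if (xmlreader != 0) xmlreader->Close();
}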

As each call to the Read method will read the next node in the XML file, your code must be able to distinguish between node types. This includes everything from the XML file's opening declaration node to element and text nodes and even includes special nodes for comments and whitespace. The XmlTextReader::NodeType property is an enum of type XmlNodeType that indicates the exact type of the currently read node. Table 1 lists the different types defined by the XmlNodeType type.

Table 1 has been abbreviated to show only those XmlNodeType values that are currently used by the NodeType property.

Table 1: XmlNodeType Enum Values

XmlNodeType Value        Description
Attribute                An attribute defined within an element
CDATA                    Identifies a block of data that will not be parsed by the XML reader
Comment                  A plain-text comment
DocumentType             Document type declaration
Element                  Represents the beginning of an element
EndElement               The end element tag (for example, </author>)
EntityReference          An entity reference
None                     The state the reader is in before Read has been called
ProcessingInstruction    An XML processing instruction
SignificantWhitespace    White space between markup tags in a mixed content model
Text                     The text value of an element
Whitespace               White space between tags
XmlDeclaration           The XML declaration node that starts the file/document
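
The "parse based on NodeType" placeholder in the skeleton usually turns into a switch on the node type. The following is only a minimal sketch, written in standard C++ rather than Managed C++ so that it compiles without the .NET Framework. The NodeType enum, the Node struct, and the describe function are hypothetical stand-ins for XmlNodeType and the reader's NodeType, Name, and Value properties; they are not part of any library.

```cpp
// Portable model of dispatching on a node's type.
// NodeType, Node, and describe are illustrative stand-ins, not .NET types.
#include <string>

enum class NodeType { XmlDeclaration, Comment, Element, Text, EndElement, Whitespace };

struct Node
{
   NodeType    type;
   std::string name;    // empty for Text, Comment, and Whitespace nodes
   std::string value;   // empty for Element and EndElement nodes
};

// Returns a one-line description of the node, or an empty string
// for node types that this sketch chooses to ignore.
std::string describe(const Node& node)
{
   switch (node.type)
   {
      case NodeType::Element:    return "start of <" + node.name + ">";
      case NodeType::EndElement: return "end of </" + node.name + ">";
      case NodeType::Text:       return "text: " + node.value;
      case NodeType::Comment:    return "comment: " + node.value;
      default:                   return "";   // declaration, whitespace, and so on
   }
}
```

In the Managed C++ loop, the same switch would test xmlreader->NodeType against the XmlNodeType values listed in Table 1.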

Now that you see how to discern node types, look at a sample XML file and a code snippet that will read and output to the console all found nodes within that file. This will illustrate what the XmlTextReader returns to you with each Read and what you should look for in your code as you enumerate through the file's nodes. Here first is a simple XML file:

<?xml version="1.0" encoding="us-ascii"?>
<!-- Test comment -->
<emails>
   <email language="EN" encrypted="no">
      <from>Tom@ArcherConsultingGroup.com</from>
      <to>BillG@microsoft.com</to>
      <copies>
         <copy>Krista@ArcherConsultingGroup.com</copy>
      </copies>
      <subject>Buyout of Microsoft</subject>
      <message>Dear Bill...</message>
   </email>
</emails>

Now for the code. The following code snippet opens an XML file and, within a while loop, enumerates all nodes found by the XmlTextReader. As each node is read, its NodeType, Name, and Value properties are output to the console:

// Loop to enumerate and output all nodes of an XML file
String* format = S"XmlNodeType::{0,-12}{1,-10}{2}";

XmlTextReader* xmlreader = new XmlTextReader(fileName);
while (xmlreader->Read())
{
   String* out = String::Format(format,
                                __box(xmlreader->NodeType),
                                xmlreader->Name,
                                xmlreader->Value);
   Console::WriteLine(out);
}

Looking at the file and code listings, you should easily be able to see how each of the lines in Figure 1 was formed.

Figure 1: Enumerating all the nodes of an XML file

Retrieving Names and Values

Looking at Figure 1, you can see that, to retrieve the value for a given element, you need to look programmatically for a node of type XmlNodeType::Text. However, here's the problem. Once you've reached that node, you no longer know the name of the element to which that text applies because that part was read the previous time through the loop. To illustrate what I mean, locate the from element in Figure 1. During that iteration of the loop, what you know is that the NodeType value is XmlNodeType::Element and that its Name property is "from". However, you won't know its value until the next time through the loop when you read the next node, which is the XmlNodeType::Text node for that element. At that point, you can then use the reader's Value property to get the element's text value.

Therefore, there are two ways to read the names and values of the elements your code needs. One way is to keep track of the current element as you're enumerating the file. Then, when you reach a text node, you'll know to which element the text applies. Here's a code snippet to illustrate how to do that:

// Loop to read the names and values of all elements
String* format = S"{0,-20}{1}";
String* currentElement = String::Empty;

XmlTextReader* xmlreader = new XmlTextReader(fileName);
while (xmlreader->Read())
{
   if (xmlreader->NodeType == XmlNodeType::Element)
   {
      currentElement = xmlreader->Name;
   }
   else if (xmlreader->NodeType == XmlNodeType::Text)
   {
      String* out = String::Format(format,
                                   currentElement,
                                   xmlreader->Value);
      Console::WriteLine(out);
   }
}

Running this code against the test XML file shown earlier yields the results shown in Figure 2, where only the elements that contain text are displayed and each element name is properly associated with its value.

[XmlTextReader2.jpg]

Figure 2: Using two reads to get each element's name and value

Keeping in mind that the XmlTextReader is a forward-only reader, there are also methods to tell it what to read next. For example, the XmlTextReader::ReadString method will read the entire contents of the current element or text node into a String object. Here's a loop that illustrates using the ReadString method:

// Loop to read each element's string value
String* format = S"{0,-20}{1}";

XmlTextReader* xmlreader = new XmlTextReader(fileName);
while (xmlreader->Read())
{
   if (xmlreader->NodeType == XmlNodeType::Element)
   {
      String* out = String::Format(format,
                                   xmlreader->Name,
                                   xmlreader->ReadString());
      Console::WriteLine(out);
   }
}

While the ReadString method would seem to be much cleaner than the first approach (of using two distinct reads to obtain the element's name and value), take a look at Figure 3.

[XmlTextReader3.jpg]

Figure 3: Using the ReadString method

As you can see, this latest modification produces several blank values. This is because the code is no longer looking for an element node followed by a text node, which would indicate an element with text data. Now, the code is simply saying, "give me the entire string representing each element." Some elements, such as <emails>, contain only child elements and no text data of their own. Therefore, you need to know what your data looks like before calling methods such as ReadString.
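
One way to picture where the blank values come from is to model the node stream by hand. The sketch below is standard C++, not Managed C++, and it rests on an assumption: that ReadString, when positioned on an element, concatenates the text and whitespace nodes that follow and stops at the next markup. The Node struct and the readStringModel and isBlank functions are hypothetical names invented for this illustration.

```cpp
// Simplified, hand-built model of calling ReadString on an element.
// Assumption: text and whitespace are concatenated until markup is reached.
#include <cstddef>
#include <string>
#include <vector>

enum class NodeType { Element, Text, Whitespace, EndElement };

struct Node
{
   NodeType    type;
   std::string name;
   std::string value;
};

// Starting after the element at `index`, concatenates consecutive Text and
// Whitespace nodes and stops at the first Element or EndElement.
std::string readStringModel(const std::vector<Node>& nodes, std::size_t index)
{
   std::string result;
   for (std::size_t i = index + 1; i < nodes.size(); ++i)
   {
      if (nodes[i].type != NodeType::Text && nodes[i].type != NodeType::Whitespace)
         break;
      result += nodes[i].value;
   }
   return result;
}

// True when the string holds nothing but whitespace, which prints as a blank value.
bool isBlank(const std::string& s)
{
   return s.find_first_not_of(" \t\r\n") == std::string::npos;
}
```

Under this model, a container element such as <emails> is followed only by indentation before its first child, so the result is blank, whereas <from> is followed directly by its text node.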

In most cases where you're looking for the value of an element, you know the name of that element. Therefore, you would simply insert conditional logic into your code to call the ReadString method only for the desired elements:

if (0 == String::Compare(xmlreader->Name, S"subject", true))
...
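
The idea of filtering by element name can also be sketched outside of .NET. The following standard C++ example (not Managed C++) works on a hand-built list of nodes instead of a real reader. Node, equalsIgnoreCase, and findElementText are hypothetical names introduced for this illustration; equalsIgnoreCase plays the role of comparing String::Compare(a, b, true) to 0.

```cpp
// Portable model of reading only the element you care about.
// All names here are illustrative; none of them belong to the .NET Framework.
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

enum class NodeType { Element, Text, EndElement };

struct Node
{
   NodeType    type;
   std::string name;
   std::string value;
};

// Case-insensitive equality for ASCII element names.
bool equalsIgnoreCase(const std::string& a, const std::string& b)
{
   return a.size() == b.size() &&
          std::equal(a.begin(), a.end(), b.begin(),
                     [](unsigned char x, unsigned char y)
                     { return std::tolower(x) == std::tolower(y); });
}

// Returns the text of the first element named `wanted`, or an empty
// string if the element is missing or has no text.
std::string findElementText(const std::vector<Node>& nodes, const std::string& wanted)
{
   for (std::size_t i = 0; i < nodes.size(); ++i)
   {
      if (nodes[i].type == NodeType::Element && equalsIgnoreCase(nodes[i].name, wanted))
      {
         // The element's text, if any, is the node that immediately follows it.
         if (i + 1 < nodes.size() && nodes[i + 1].type == NodeType::Text)
            return nodes[i + 1].value;
         return "";
      }
   }
   return "";
}
```

Because the comparison ignores case, asking for "SUBJECT" and asking for "subject" find the same element, just as passing true for the ignoreCase argument does in the String::Compare call above.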

Looking Ahead

In this article, you learned how to enumerate XML files using the XmlTextReader class. You also saw code snippets detailing how to parse for specific node types and two different methods for reading the names and values of element nodes. In the next article, I'll cover three more important issues regarding the XmlTextReader class: skipping to content, ignoring whitespace, and reading attributes.



About the Author

Tom Archer - MSFT

I am a Program Manager and Content Strategist for the Microsoft MSDN Online team managing the Windows Vista and Visual C++ developer centers. Before being employed at Microsoft, I was awarded MVP status for the Visual C++ product. A 20+ year veteran of programming with various languages - C++, C, Assembler, RPG III/400, PL/I, etc. - I've also written many technical books (Inside C#, Extending MFC Applications with the .NET Framework, Visual C++.NET Bible, etc.) and 100+ online articles.
