Introduction to LINQ, Part 2: LINQ to XML

In the first article of the series, I introduced the LINQ to Objects API and showed how language-integrated queries can be used to query sequences in C#. In this second article, I will introduce LINQ to XML, formerly code-named XLinq.

The System.Xml.Linq Namespace

LINQ to XML allows querying XML data. A new namespace in .NET 3.5, System.Xml.Linq, contains classes for creating and querying XML trees in an XPath-like style:

  • XName represents a namespace-qualified identifier (QName), used as both element and attribute names. XName objects are atomized, which means that if two XName objects have the same namespace and local name, they actually share the same instance. This enables faster queries because, when filtering on the name of elements or attributes, the comparison is an identity comparison, not a value comparison. Checking whether two references refer to the same object is much faster than comparing two strings.
  • XNode represents the abstract concept of a node in the XML tree. It acts as the base class for XComment, XContainer, XDocumentType, XProcessingInstruction, and XText.
  • XContainer, derived from XNode, offers features such as enumerating the children of a node or finding its first and last child; navigation to the next and previous sibling is inherited from XNode. It is the base class for XDocument and XElement.
  • XDocument represents an XML document and contains the root-level XML constructs, such as a document type declaration (XDocumentType), one root element (XElement), and zero or more comments (XComment).
  • XElement is the fundamental type for constructing XML data. It has an XName, optionally one or more attributes, and can also have content. An element is actually a container (derived from XContainer) that contains other XML nodes (XNode), such as XComment, XProcessingInstruction, or XText.
  • XAttribute represents an XML attribute; in other words, a key-value pair associated with an XML element. Attributes are not nodes in the XML tree, so XAttribute is not derived from XNode.
  • XComment represents an XML comment, either at the document level or as a child of an element.
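
The atomization of XName and the fact that attributes are not nodes can both be checked with a few lines of code. The following is a minimal sketch; the class and method names (XNameDemo, NamesShareInstance, AttributeIsNode) are made up for illustration and are not part of the library.

```csharp
using System;
using System.Xml.Linq;

static class XNameDemo
{
    // Returns true when two independently obtained XName values
    // with the same name are one and the same object.
    public static bool NamesShareInstance()
    {
        XName first = XName.Get("winner");
        XName second = "winner";   // implicit conversion from string to XName
        return object.ReferenceEquals(first, second);
    }

    // Returns true when XAttribute derives from XNode.
    public static bool AttributeIsNode()
    {
        return typeof(XNode).IsAssignableFrom(typeof(XAttribute));
    }

    static void Main()
    {
        Console.WriteLine(NamesShareInstance());                             // True
        Console.WriteLine(AttributeIsNode());                                // False
        Console.WriteLine(typeof(XNode).IsAssignableFrom(typeof(XElement))); // True
    }
}
```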

The following examples show how you can create different XML trees; each code fragment is followed by the XML it produces.

XElement root =
   new XElement("winner");
<winner />

XElement root =
   new XElement("winner",
      new XAttribute("year", 1999));
<winner year="1999" />

XElement root =
   new XElement("winner",
      new XAttribute("year", 1999),
      new XAttribute("country", "England"));
<winner year="1999" country="England" />

XElement root =
   new XElement("winner",
      new XAttribute("year", 1999),
      new XAttribute("country", "England"),
      "Manchester United");
<winner year="1999" country="England">Manchester United</winner>

XElement root =
   new XElement("winner",
      new XAttribute("year", 1999),
      new XAttribute("country", "England"),
      "Manchester United",
      new XComment("best final ever"));
<winner year="1999" country="England">Manchester United<!--best final ever--></winner>

XElement root =
   new XElement("winners",
      new XElement("winner",
         new XElement("name", "Manchester United"),
         new XElement("country", "England"),
         new XElement("year", 1999)));
<winners>
   <winner>
      <name>Manchester United</name>
      <country>England</country>
      <year>1999</year>
   </winner>
</winners>

The code below builds, with hard-coded values, an XML tree containing the UEFA Champions League winners from 1993 to 2006.

XElement root = new XElement("winners",
                    new XElement("winner",
                        new XElement("Name", "Barcelona"),
                        new XElement("Country", "Spain"),
                        new XElement("Year", 2006)
                    ),
                    new XElement("winner",
                        new XElement("Name", "Liverpool"),
                        new XElement("Country", "England"),
                        new XElement("Year", 2005)
                    ),
                    new XElement("winner",
                        new XElement("Name", "FC Porto"),
                        new XElement("Country", "Portugal"),
                        new XElement("Year", 2004)
                    ),
                    new XElement("winner",
                        new XElement("Name", "AC Milan"),
                        new XElement("Country", "Italy"),
                        new XElement("Year", 2003)
                    ),
                    new XElement("winner",
                        new XElement("Name", "Real Madrid"),
                        new XElement("Country", "Spain"),
                        new XElement("Year", 2002)
                    ),
                    new XElement("winner",
                        new XElement("Name", "Bayern Munchen"),
                        new XElement("Country", "Germany"),
                        new XElement("Year", 2001)
                    ),
                    new XElement("winner",
                        new XElement("Name", "Real Madrid"),
                        new XElement("Country", "Spain"),
                        new XElement("Year", 2000)
                    ),
                    new XElement("winner",
                        new XElement("Name", "Manchester Utd."),
                        new XElement("Country", "England"),
                        new XElement("Year", 1999)
                    ),
                    new XElement("winner",
                        new XElement("Name", "Real Madrid"),
                        new XElement("Country", "Spain"),
                        new XElement("Year", 1998)
                    ),
                    new XElement("winner",
                        new XElement("Name", "Borussia Dortmund"),
                        new XElement("Country", "Germany"),
                        new XElement("Year", 1997)
                    ),
                    new XElement("winner",
                        new XElement("Name", "Juventus"),
                        new XElement("Country", "Italy"),
                        new XElement("Year", 1996)
                    ),
                    new XElement("winner",
                        new XElement("Name", "AFC Ajax"),
                        new XElement("Country", "Netherlands"),
                        new XElement("Year", 1995)
                    ),
                    new XElement("winner",
                        new XElement("Name", "AC Milan"),
                        new XElement("Country", "Italy"),
                        new XElement("Year", 1994)
                    ),
                    new XElement("winner",
                        new XElement("Name", "Olympique de Marseille"),
                        new XElement("Country", "France"),
                        new XElement("Year", 1993)
                    )
                );

Dynamically Creating the Winners List

Assume that the application already has the list of winners as a sequence of Winner objects:

/// <summary>
/// Encapsulates information about a UEFA Champions League winner
/// </summary>
class Winner
{
   string _name;
   string _country;
   int _year;

   public string Name
   {
      get { return _name; }
      set { _name = value; }
   }
   public string Country
   {
      get { return _country; }
      set { _country = value; }
   }

   public int Year
   {
      get { return _year; }
      set { _year = value; }
   }

   public Winner() { }
   public Winner(string name, string country, int year)
   {
      _name = name;
      _country = country;
      _year = year;
   }
}

/// <summary>
/// utility class with a single method returning the sequence
/// of winners
/// </summary>
class UCL
{
   /// <summary>
   /// returns a sequence of all UCL winners
   /// </summary>
   /// <returns>A sequence of Winner objects, most recent first</returns>
   public static IEnumerable<Winner> GetWinners()
   {
      Winner[] winners =  {
         new Winner("Barcelona", "Spain", 2006),
         new Winner("Liverpool", "England", 2005),
         new Winner("FC Porto", "Portugal", 2004),
         new Winner("AC Milan", "Italy", 2003),
         new Winner("Real Madrid", "Spain", 2002),
         new Winner("Bayern Munchen", "Germany", 2001),
         new Winner("Real Madrid", "Spain", 2000),
         new Winner("Manchester Utd.", "England", 1999),
         new Winner("Real Madrid", "Spain", 1998),
         new Winner("Borussia Dortmund", "Germany", 1997),
         new Winner("Juventus", "Italy", 1996),
         new Winner("AFC Ajax", "Netherlands", 1995),
         new Winner("AC Milan", "Italy", 1994),
         new Winner("Olympique de Marseille", "France", 1993)
      };

      return winners;
   }
}

From this sequence of winners, you want to build an XML tree with the winners. One possible approach is to enumerate the sequence of winners and add a new element to the root for each one:

IEnumerable<Winner> winners = UCL.GetWinners();
XElement root = new XElement("winners");

foreach (Winner w in winners)
{
    root.Add(new XElement("winner",
                 new XElement("Name", w.Name),
                 new XElement("Country", w.Country),
                 new XElement("Year", w.Year)));
}

Console.WriteLine(root.ToString());

When running the code, the output is:

<winners>
   <winner>
      <Name>Barcelona</Name>
      <Country>Spain</Country>
      <Year>2006</Year>
   </winner>
   <winner>
      <Name>Liverpool</Name>
      <Country>England</Country>
      <Year>2005</Year>
   </winner>
   <winner>
      <Name>FC Porto</Name>
      <Country>Portugal</Country>
      <Year>2004</Year>
   </winner>
   <winner>
      <Name>AC Milan</Name>
      <Country>Italy</Country>
      <Year>2003</Year>
   </winner>
   <winner>
      <Name>Real Madrid</Name>
      <Country>Spain</Country>
      <Year>2002</Year>
   </winner>
   <winner>
      <Name>Bayern Munchen</Name>
      <Country>Germany</Country>
      <Year>2001</Year>
   </winner>
   <winner>
      <Name>Real Madrid</Name>
      <Country>Spain</Country>
      <Year>2000</Year>
   </winner>
   <winner>
      <Name>Manchester Utd.</Name>
      <Country>England</Country>
      <Year>1999</Year>
   </winner>
   <winner>
      <Name>Real Madrid</Name>
      <Country>Spain</Country>
      <Year>1998</Year>
   </winner>
   <winner>
      <Name>Borussia Dortmund</Name>
      <Country>Germany</Country>
      <Year>1997</Year>
   </winner>
   <winner>
      <Name>Juventus</Name>
      <Country>Italy</Country>
      <Year>1996</Year>
   </winner>
   <winner>
      <Name>AFC Ajax</Name>
      <Country>Netherlands</Country>
      <Year>1995</Year>
   </winner>
   <winner>
      <Name>AC Milan</Name>
      <Country>Italy</Country>
      <Year>1994</Year>
   </winner>
   <winner>
      <Name>Olympique de Marseille</Name>
      <Country>France</Country>
      <Year>1993</Year>
   </winner>
</winners>

Bringing Queries into Action

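The same result, however, can be achieved with a query, because the XElement constructors accept arbitrary content objects, and a content object that is a sequence (IEnumerable) is enumerated so that each of its items is added as a child: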

public XElement(XElement other);
public XElement(XName name);
public XElement(XStreamingElement other);
public XElement(XName name, object content);
public XElement(XName name, params object[] content);

Thus, you can rewrite the previous code as:

IEnumerable<Winner> winners = UCL.GetWinners();
XElement root = new XElement("winners",
                    from w in winners
                    select new XElement("winner",
                       new XElement("Name", w.Name),
                       new XElement("Country", w.Country),
                       new XElement("Year", w.Year)));
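
To confirm that the loop and the query produce the same tree, the two results can be compared with XNode.DeepEquals. The following is a minimal sketch that assumes a simplified Winner class with automatic properties and a two-item sample list; the names QueryConstructionDemo, SampleWinners, BuildWithLoop, and BuildWithQuery are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Simplified stand-in for the Winner class shown earlier.
class Winner
{
    public string Name { get; set; }
    public string Country { get; set; }
    public int Year { get; set; }
}

static class QueryConstructionDemo
{
    // A shortened sample list, used only for this comparison.
    public static IEnumerable<Winner> SampleWinners()
    {
        return new[]
        {
            new Winner { Name = "Barcelona", Country = "Spain", Year = 2006 },
            new Winner { Name = "Liverpool", Country = "England", Year = 2005 }
        };
    }

    // Builds the tree by adding one element per winner in a loop.
    public static XElement BuildWithLoop(IEnumerable<Winner> winners)
    {
        XElement root = new XElement("winners");
        foreach (Winner w in winners)
        {
            root.Add(new XElement("winner",
                         new XElement("Name", w.Name),
                         new XElement("Country", w.Country),
                         new XElement("Year", w.Year)));
        }
        return root;
    }

    // Builds the tree by passing a query as the content of the root.
    public static XElement BuildWithQuery(IEnumerable<Winner> winners)
    {
        return new XElement("winners",
                   from w in winners
                   select new XElement("winner",
                       new XElement("Name", w.Name),
                       new XElement("Country", w.Country),
                       new XElement("Year", w.Year)));
    }

    static void Main()
    {
        IEnumerable<Winner> winners = SampleWinners();
        XElement fromLoop = BuildWithLoop(winners);
        XElement fromQuery = BuildWithQuery(winners);

        Console.WriteLine(XNode.DeepEquals(fromLoop, fromQuery)); // True
        Console.WriteLine(fromQuery.Elements("winner").Count());  // 2
    }
}
```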

Saving and Loading XML Data

The Save() method of XElement has several overloads, which take a string representing a file name, a TextWriter object, or an XmlWriter object.

public void Save(string fileName);
public void Save(TextWriter textWriter);
public void Save(XmlWriter writer);
public void Save(string fileName, SaveOptions options);
public void Save(TextWriter textWriter, SaveOptions options);

Saving an XML tree to a file is as simple as:

root.Save("winners.xml");

If root is the XElement from the previous code samples, the winners.xml file will contain the tree of UEFA Champions League winners listed above.

The opposite operation, loading an XML tree, is done either with the static method Parse(), which takes a string with the XML data, or with one of the overloads of the static method Load(), which can take a string representing a file name, a TextReader object, or an XmlReader object.
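
Saving and loading can be tried without touching the file system by using the TextWriter and TextReader overloads with in-memory strings. The following is a minimal sketch; the class and method names (RoundTripDemo, RoundTrip) and the sample XML are assumptions made for illustration.

```csharp
using System;
using System.IO;
using System.Xml.Linq;

static class RoundTripDemo
{
    // Saves the element to an in-memory writer, then loads it back.
    public static XElement RoundTrip(XElement source)
    {
        using (StringWriter writer = new StringWriter())
        {
            source.Save(writer, SaveOptions.DisableFormatting);
            using (StringReader reader = new StringReader(writer.ToString()))
            {
                return XElement.Load(reader);
            }
        }
    }

    static void Main()
    {
        // Parse() builds a tree directly from a string with XML data.
        XElement parsed = XElement.Parse(
            "<winners><winner><Name>Juventus</Name><Year>1996</Year></winner></winners>");

        XElement reloaded = RoundTrip(parsed);

        Console.WriteLine(XNode.DeepEquals(parsed, reloaded)); // True
        Console.WriteLine(reloaded.ToString(SaveOptions.DisableFormatting));
    }
}
```

SaveOptions.DisableFormatting suppresses the indentation that Save() and ToString() add by default.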

Creating a sequence of Winners based on the data read from a file "winners.xml" containing the UCL winners can be done like this:

var result = from e in XElement.Load("winners.xml").Elements("winner")
             select new Winner
                {
                   Name = (string)e.Element("Name"),
                   Country = (string)e.Element("Country"),
                   Year = (int)e.Element("Year")
                };

foreach (Winner w in result)
{
   Console.WriteLine("{0} {1}, {2}",
      w.Year, w.Name, w.Country);
}

XElement.Load creates an XML tree whose root is an XElement. The Elements() method returns a sequence of the child XElement objects with the specified name ("winner"). This is used as the source of the query, which projects a sequence of Winner objects. The Element() method returns the first child element with the specified name ("Name", "Country", or "Year" in the example), or null if there is none; the explicit casts to string and int then convert the content of that element to the desired type.
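
The difference between the element returned by Element() and the value obtained by the cast matters when a child element may be missing. The following is a minimal sketch, using made-up sample XML and illustrative names (ElementCastDemo, ReadYear), that reads the year through a nullable cast so that a missing element yields null instead of an exception.

```csharp
using System;
using System.Xml.Linq;

static class ElementCastDemo
{
    // Reads the Year child as a nullable int; null when the child is missing.
    public static int? ReadYear(XElement winner)
    {
        return (int?)winner.Element("Year");
    }

    static void Main()
    {
        XElement complete = XElement.Parse(
            "<winner><Name>Juventus</Name><Year>1996</Year></winner>");
        XElement incomplete = XElement.Parse(
            "<winner><Name>Juventus</Name></winner>");

        XElement nameElement = complete.Element("Name"); // an XElement, not a string
        Console.WriteLine(nameElement.Name);             // Name
        Console.WriteLine((string)nameElement);          // Juventus
        Console.WriteLine(ReadYear(complete));           // 1996

        Console.WriteLine(incomplete.Element("Year") == null); // True
        Console.WriteLine(ReadYear(incomplete).HasValue);      // False
    }
}
```

Casting a missing element to the non-nullable int, as in (int)incomplete.Element("Year"), throws an ArgumentNullException, so the nullable cast is the safer choice when the input is not guaranteed to be complete.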

More Examples

If you want to print only the names of the winners, you can run the following query:

// extracts only the names of the winners from the file
var result = from e in XElement.Load("winners.xml").Elements("winner")
             select (string)e.Element("Name");

foreach (var w in result)
{
    Console.WriteLine(w);
}

The output in this case would be:

Barcelona
Liverpool
FC Porto
AC Milan
Real Madrid
Bayern Munchen
Real Madrid
Manchester Utd.
Real Madrid
Borussia Dortmund
Juventus
AFC Ajax
AC Milan
Olympique de Marseille

Of course, you might be interested in the distinct names only, in which case you can sort the names in the query and then apply the Distinct operator to the sequence it returns:

// extracts the names of the winners from the file, sorted alphabetically
var result = from e in XElement.Load("winners.xml").Elements("winner")
             orderby (string)e.Element("Name")
             select (string)e.Element("Name");

// creates a sequence of distinct names
var result2 = Enumerable.Distinct(result);

foreach (var w in result2)
{
   Console.WriteLine(w);
}

In this case, the new output is:

AC Milan
AFC Ajax
Barcelona
Bayern Munchen
Borussia Dortmund
FC Porto
Juventus
Liverpool
Manchester Utd.
Olympique de Marseille
Real Madrid

Note: Remember (from the first article) that, as long as you do not iterate over the result, the source is not iterated. Thus, in the absence of the foreach statement, the query would not be executed.
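
The deferred execution described in the note can be observed by changing the tree after the query is defined but before it is iterated. The following is a minimal sketch that works on an in-memory tree rather than on winners.xml; the names DeferredExecutionDemo and NamesAfterLateAdd are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class DeferredExecutionDemo
{
    // Defines a query, modifies the tree, and only then runs the query.
    public static List<string> NamesAfterLateAdd()
    {
        XElement root = new XElement("winners",
            new XElement("winner", new XElement("Name", "Juventus")));

        // Defining the query does not enumerate the children of root.
        IEnumerable<string> names = from e in root.Elements("winner")
                                    select (string)e.Element("Name");

        // This element is added after the query was defined.
        root.Add(new XElement("winner", new XElement("Name", "AFC Ajax")));

        // The query runs here, so it sees both elements.
        return names.ToList();
    }

    static void Main()
    {
        foreach (string name in NamesAfterLateAdd())
        {
            Console.WriteLine(name);
        }
        // Prints:
        // Juventus
        // AFC Ajax
    }
}
```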

Performance

As I wrote on my blog, I ran queries against a fairly large XML file (about 100 MB), extracting various sets of data. Each query completed in approximately 3.5 seconds. This led me to two conclusions:

  • LINQ is very performant: extracting 90% of the data from a 100 MB file, in three separate runs, took less than 12 seconds, which is equivalent to extracting data from a 300 MB file. I find this very swift.
  • There wasn't a noticeable difference between extracting 0.5 MB and 50 MB.
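
Measurements of this kind can be taken with the Stopwatch class. The following is only a sketch of the approach, not a reproduction of the test described above: it generates a synthetic document in memory (the element names "items", "item", "Id", and "Group", the document size, and the names TimingSketch, BuildXml, and CountInGroup are all made up), then times parsing and querying separately, so you can see which step dominates on your own data.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Xml.Linq;

static class TimingSketch
{
    // Builds a synthetic XML document with the given number of items.
    public static string BuildXml(int count)
    {
        XElement root = new XElement("items",
            from i in Enumerable.Range(0, count)
            select new XElement("item",
                new XElement("Id", i),
                new XElement("Group", i % 10)));
        return root.ToString(SaveOptions.DisableFormatting);
    }

    // Counts the items whose Group element has the given value.
    public static int CountInGroup(XElement root, int group)
    {
        return root.Elements("item")
                   .Count(e => (int)e.Element("Group") == group);
    }

    static void Main()
    {
        string xml = BuildXml(100000);

        // Time the construction of the in-memory tree.
        Stopwatch parseWatch = Stopwatch.StartNew();
        XElement root = XElement.Parse(xml);
        parseWatch.Stop();

        // Time the query over the already loaded tree.
        Stopwatch queryWatch = Stopwatch.StartNew();
        int matches = CountInGroup(root, 3);
        queryWatch.Stop();

        Console.WriteLine("parse: {0} ms", parseWatch.ElapsedMilliseconds);
        Console.WriteLine("query: {0} ms, {1} matches",
            queryWatch.ElapsedMilliseconds, matches);
    }
}
```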

About the Author

Marius Bancila

Marius Bancila is a Microsoft MVP for VC++. He works as a software developer for a Norwegian-based company. He is mainly focused on building desktop applications with MFC and VC#. He keeps a blog at www.mariusbancila.ro/blog, focused on Windows programming. He is the co-founder of codexpert.ro, a community for Romanian C++/VC++ programmers.
