Using Open XML Schema with .NET

A key tenet of service-oriented architecture is that applications should communicate in a decoupled fashion, using messaging as a communication pattern. Decoupling the various components of a system using messages rather than relying on strongly-typed objects enables your system components to evolve and scale with much less effort. In theory, small extensions to messages can be propagated throughout a system, without requiring that all components of the system be recompiled.

Visual Studio .NET simplifies the use of XML messaging in your applications. For example, it encourages you to use XML Web services by greatly reducing the number of knobs you're required to tweak to create or consume a web service. Visual Studio also makes it very easy to map .NET classes to XML schema, as well as serialize objects to and from XML.

In fact, serializing an instance of a class into an XML document using C# requires just a few lines of code. The following function will map any XML-serializable class into a MemoryStream that contains an XML document:

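// Requires: using System.IO; using System.Xml.Serialization;
Stream SerializeThingToXmlStream(object thing)
{
   MemoryStream ms          = new MemoryStream();
   // Build the serializer for the runtime type of the object.
   XmlSerializer serializer = new XmlSerializer(thing.GetType());
   serializer.Serialize(ms, thing);
   // Rewind so that callers read the document from the beginning.
   ms.Seek(0, SeekOrigin.Begin);
   return ms;
}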

Although similar code can be used to reverse this process and easily reconstitute an object from an XML document, this default usage pattern does not take advantage of the flexibility that's a core part of XML. One of the advantages of using XML is the ability to define open elements where extension can occur. By defining open elements in your XML documents and XML schemas, you can take advantage of two complementary programming models in your systems:

  • Strongly-typed programming languages, such as C#, Visual Basic, and Eiffel, inside individual applications and components
  • Dynamically-typed XML message documents that are structured as needed, and can evolve as necessary

Although strongly typed programming languages are the ideal tools for building reliable components and applications, modern systems have a need to communicate using messages that are flexible and can evolve over time. For this reason, it's desirable to decouple the system components from their messages. The decoupled architecture often promoted by SOA advocates enables a system to evolve by focusing on communicating with messages rather than tightly coupled objects.

To examine how you can manage the decoupling process, let's start with an example that doesn't use open schema elements, resulting in an inflexible relationship between XML documents and associated .NET classes.

Consider a simple XML document that has some minimal information about an entry in an order-tracking system.

<?xml version="1.0"?>
<Shipment xmlns="http://schema.servergeek.com/2004/7">
   <OrderNo>1234567</OrderNo>
   <Location>
      <Addr>One Acme Way</Addr>
      <City>BoogieTown</City>
      <State>CA</State>
   </Location>
</Shipment>

This simple document tracks the order number and some minimal (actually incomplete) address information. An example of a C# class that maps to this XML document is shown below.

[XmlRoot(Namespace=App.TargetNamespace)]
public class Shipment
{
   string   _orderNo = string.Empty;
   Location _location = new Location();
   public string OrderNo
   {
      get { return _orderNo; }
      set { _orderNo = value; }
   }
   public Location Location
   {
      get { return _location; }
      set { _location = value; }
   }
}

[XmlRoot(Namespace=App.TargetNamespace)]
public class Location
{
   string _addr  = string.Empty;
   string _city  = string.Empty;
   string _state = string.Empty;
   public string Addr
   {
      get { return _addr; }
      set { _addr = value; }
   }
   public string City
   {
      get { return _city; }
      set { _city = value; }
   }
   public string State
   {
      get { return _state; }
      set { _state = value; }
   }
}

The Shipment class above is similar to the class that the tools included with Visual Studio will create when asked to create classes that serialize into specific XML documents.

There's an interesting aspect to the relationship between XML documents and CLR classes. By default, there is no schema validation performed by the .NET Framework, and no schema is even required to exist. In fact, when deserializing an XML document into a CLR object (such as an instance of my earlier Shipment class), the .NET Framework only makes a best effort to map data from the document into the object.

This best-effort approach causes some behavior that you should be aware of. Consider an XML fragment that is very similar to my first Shipment XML document, except that the Location node includes an extra element, named Zip, as shown below:

<?xml version="1.0"?>
<Shipment xmlns:xsd="http://www.w3.org/2001/XMLSchema"
          xmlns="http://schema.servergeek.com/2004/7">
   <OrderNo>1234567</OrderNo>
   <Location>
      <Addr>One Acme Way</Addr>
      <City>BoogieTown</City>
      <State>CA</State>
      <Zip>92653</Zip>
   </Location>
</Shipment>

A common way to deserialize an XML document is to use a function like this one, which deserializes an XML file at a specified path:

object DeserializeXmlFile(string path, Type type)
{
   using(FileStream fs = new FileStream(path, FileMode.Open))
   {
      XmlSerializer serializer = new XmlSerializer(type);
      return serializer.Deserialize(fs);
   }
}

So what happens when you deserialize an XML document with extra elements? If you come from the land of strongly typed languages, you might be surprised to learn that no errors or warnings are generated at runtime, although the XmlSerializer class will generate events when unexpected elements and attributes are encountered. Code that handles the UnknownElement and UnknownAttribute events is shown in the following code snippet.

object DeserializeXmlFile(string path, Type type)
{
   using(FileStream fs = new FileStream(path, FileMode.Open))
   {
      XmlSerializer serializer = new XmlSerializer(type);
      serializer.UnknownAttribute +=
                 new XmlAttributeEventHandler(UnknownAttribute);
      serializer.UnknownElement +=
                 new XmlElementEventHandler(UnknownElement);
      return serializer.Deserialize(fs);
   }
}

void UnknownAttribute(object sender, XmlAttributeEventArgs e)
{
   Trace.WriteLine("Unknown Attribute: " + e.Attr.OuterXml);
}

void UnknownElement(object sender, XmlElementEventArgs e)
{
   Trace.WriteLine("Unknown Element: " + e.Element.Name);
}

Although this type of code is useful for logging that an unexpected element or attribute was encountered for debugging purposes, in practice it's difficult to recover this data for further use in a message processing pipeline.

Using Open Elements and Attributes

When an application encounters an XML document that contains some unexpected elements, there are multiple outcomes that you can choose from. If your only requirements are that debugging information be logged during deserialization, you can simply handle the UnknownElement and UnknownAttribute events, as shown earlier. But, what if you actually need to preserve the contents and structure of the data from the XML document so that it can be passed to another system component? The .NET Framework includes two attributes that enable you to capture unknown XML elements and attributes:

  • XmlAnyElementAttribute is used to control how unknown elements are stored in an object.
  • XmlAnyAttributeAttribute is used to control how unknown attributes are stored in an object.

XmlAnyElementAttribute is attached to a field, property, parameter, or return value and is essentially a "wild card" that is capable of converting to and from XML elements during serialization. You'll probably want to use it with an array of XmlNode objects, because each unknown element from the XML document will be placed into its own XmlNode object. If you use the attribute with a single XmlNode object, you'll lose data if multiple unknown elements are received. The XmlAnyElement attribute is typically used like this:

XmlNode[] _openElements = null;
[XmlAnyElement()]
public XmlNode[] OpenElements
{
   get { return _openElements; }
   set { _openElements = value; }
}

XmlAnyAttribute can be used with an array of XmlAttribute or XmlNode (it can't be used with single objects of either type). It's typically used much like the XmlAnyElement, as shown in the code below:

XmlNode[] _openAttributes = null;
[XmlAnyAttribute()]
public XmlNode[] OpenAttributes
{
   get { return _openAttributes; }
   set { _openAttributes = value; }
}

When you use XmlAnyElement and XmlAnyAttribute attributes, the serializer will not generate UnknownElement and UnknownAttribute events.
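
To make the round trip concrete, here is a minimal sketch of a class that captures an unexpected Zip element and writes it back out on serialization. The OpenLocation class and the OpenContentDemo helper are hypothetical names introduced for illustration, and the namespace literal stands in for App.TargetNamespace; treat this as one possible arrangement rather than the only way to structure it.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical variant of the article's Location class with an open-element
// member. The namespace literal stands in for App.TargetNamespace.
[XmlRoot("Location", Namespace = "http://schema.servergeek.com/2004/7")]
public class OpenLocation
{
    public string Addr = string.Empty;
    public string City = string.Empty;
    public string State = string.Empty;

    // Receives every child element that has no matching member.
    [XmlAnyElement]
    public XmlNode[] OpenElements;
}

public static class OpenContentDemo
{
    // Deserializes a Location document into an OpenLocation object.
    public static OpenLocation Parse(string xml)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(OpenLocation));
        using (StringReader reader = new StringReader(xml))
        {
            return (OpenLocation)serializer.Deserialize(reader);
        }
    }

    // Serializes the object, including any captured open elements.
    public static string ToXml(OpenLocation location)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(OpenLocation));
        using (StringWriter writer = new StringWriter())
        {
            serializer.Serialize(writer, location);
            return writer.ToString();
        }
    }
}
```

Passing a Location document that contains a Zip element to Parse should leave the Zip node in OpenElements, and ToXml should then emit it again alongside the known members, so a component in the middle of a pipeline can forward data it does not itself understand.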

Validating with Schema and XmlAnyElement

Now, let's consider how open elements affect validation. Many systems use XML schema to validate incoming XML documents. For a large number of applications, the default non-validated behavior is undesirable, for some of the same reasons that most developers prefer strongly typed programming languages. When messages arrive at a server application, these developers want to know whether the data arrives with unexpected formatting or content. The standard answer in this case is to use XML schema, which will reject XML documents that are presented with an unrecognized structure.

However, blindly using XML schema simply exchanges one problem for another. As discussed earlier, XML documents that are agile offer advantages over fixed structures. When XML is used as the common language for data exchange in an enterprise, XML schema can cause undesirable rigidity in the communication infrastructure: if documents don't precisely match the expected structure, they're rejected.

If a typical schema (such as one inferred from an existing XML instance document) is used to validate messages, the schema can become a brake on system evolution. Instead of avoiding schema altogether, or attempting to update all components that use a particular schema simultaneously, you can use an XML schema that defines open elements and attributes to create a schema that has just enough structure to validate specific behavior, while maintaining flexibility for extension.

Using an XML schema with open elements and attributes enables you to balance consistency with extensibility. Open elements and attributes create expansion points that enable clients to send XML documents that follow an updated structure to a server, without the need to update all edges of a system simultaneously. They also enable a client to use an XML schema that is shared among multiple applications: if a client is updated before the server, the server can simply ignore the new elements for the purposes of schema validation.

The process begins by defining open elements in your schema, using the xs:any element to mark the part of an element's content that's open for undefined elements, as shown in the following code.

<?xml version="1.0" encoding="utf-8" ?>
<xs:schema id="order"
           targetNamespace="http://schema.servergeek.com/2004/7"
           xmlns:bks="http://schema.servergeek.com/2004/7"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified"
           attributeFormDefault="qualified">
   <xs:element name="Shipment">
      <xs:complexType>
         <xs:sequence>
            <xs:element name="OrderNo" type="xs:string" />
            <xs:element name="Location">
               <xs:complexType>
                  <xs:sequence>
                     <xs:element name="Addr" type="xs:string" />
                     <xs:element name="City" type="xs:string" />
                     <xs:element name="State" type="xs:string" />
                     <xs:sequence>
                        <xs:any namespace="##any" minOccurs="0"
                                maxOccurs="unbounded"
                                processContents="skip" />
                     </xs:sequence>
                  </xs:sequence>
               </xs:complexType>
            </xs:element>
            <xs:sequence>
               <xs:any namespace="##any" minOccurs="0"
                       maxOccurs="unbounded"
                       processContents="skip" />
            </xs:sequence>
         </xs:sequence>
      </xs:complexType>
   </xs:element>
</xs:schema>

The definition of a sequence of zero or more xs:any elements in this schema enables any number of additional XML nodes to be included in the XML document inside a Location element as well as immediately after the element.

In this example, the namespace for the open elements is declared as '##any', which is actually the default value. Alternatively, you can supply a specific namespace that is allowed, or another predefined namespace token:

  • ##other: The XML must be from a namespace other than the target namespace.
  • ##local: The XML must not be in a namespace.
  • ##targetNamespace: The XML must be in the target namespace.

The processContents attribute in the xs:any element is set to 'skip' in this example, which instructs a schema validator to ignore the nodes that are part of the open element. A complete list of options follows:

  • lax: Enforce schema if a namespace is declared and the validator has access to the schema definition.
  • skip: No schema enforcement.
  • strict: Always enforce schema for this open element.

Open sets of attributes can also be defined using a similar syntax, as shown below:

<xs:schema id="cars"
           targetNamespace="http://schema.servergeek.com/2004/7"
           xmlns:bks="http://schema.servergeek.com/2004/7"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified"
           attributeFormDefault="qualified">
   <xs:element name="Car">
      <xs:complexType>
         <xs:sequence>
            <xs:element name="Color" type="xs:string" />
            ...
         </xs:sequence>
         <xs:anyAttribute namespace='##any'
                          processContents='skip' />
      </xs:complexType>
   </xs:element>
</xs:schema>

Open attributes follow the same rules as elements with regard to namespace and processContents. This schema fragment defines an element named Car that is allowed to have additional attributes without namespace restriction or schema enforcement.
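
On the .NET side, attributes admitted by xs:anyAttribute can be captured with the XmlAnyAttribute attribute described earlier. The sketch below is illustrative only: the Car class, the OpenAttributeDemo helper, and any extension attribute you feed it are hypothetical and are not part of the original schema.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical class matching the Car element, with an open attribute set.
[XmlRoot("Car", Namespace = "http://schema.servergeek.com/2004/7")]
public class Car
{
    public string Color = string.Empty;

    // Receives every attribute that has no matching member.
    [XmlAnyAttribute]
    public XmlAttribute[] OpenAttributes;
}

public static class OpenAttributeDemo
{
    // Deserializes a Car document, keeping unrecognized attributes.
    public static Car Parse(string xml)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(Car));
        using (StringReader reader = new StringReader(xml))
        {
            return (Car)serializer.Deserialize(reader);
        }
    }
}
```

A Car document that carries an extra attribute from another namespace should deserialize with that attribute available in OpenAttributes, where its LocalName, NamespaceURI, and Value can be inspected or passed along.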

Taken to an extreme, consider a gateway component that only tests a Shipment XML message for the presence of an OrderNo element. The schema used by this component could be reduced to something like this:

<?xml version="1.0" encoding="utf-8" ?>
<xs:schema id="order"
           targetNamespace="http://schema.servergeek.com/2004/7"
           xmlns:bks="http://schema.servergeek.com/2004/7"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified"
           attributeFormDefault="qualified">
   <xs:element name="Shipment">
      <xs:complexType>
         <xs:sequence>
            <xs:element name="OrderNo" type="xs:string" />
            <xs:sequence>
               <xs:any namespace="##any" minOccurs="0"
                       maxOccurs="unbounded"
                       processContents="skip" />
            </xs:sequence>
         </xs:sequence>
      </xs:complexType>
   </xs:element>
</xs:schema>

This enables documents with additional elements to pass as schema-valid until an updated schema is provided. It's also useful when performing SOA-like validation, where different actors have diverse validation needs. Components and applications that have up-to-date knowledge of message requirements can validate messages in full. Components that serve as intermediaries can perform coarser validation on only those portions of the message schema that they require.

Summary

In this article, I've discussed two complementary approaches to using XML messaging in enterprise applications. The XmlAnyElement and XmlAnyAttribute attributes are used to capture open XML content in your .NET classes. The xs:any and xs:anyAttribute schema elements are used to define expansion points and enable just enough schema validation for system components.

More Information

More information about XML Schema is available at the W3C's XML Schema page at: http://www.w3.org/XML/Schema

About the Author

Mickey Williams is a Microsoft C# MVP, and the author of Microsoft Visual C# .NET Core Reference for Microsoft Press. He works as a Principal Consultant for Neudesic, LLC in Southern California, building service-oriented applications for enterprise customers. His weblog can be found at http://www.servergeek.com/blogs/mickey.


