XSD Tutorial: XML Schemas For Beginners

XSD Tutorial, Part 1 of 5: Elements and Attributes

This article gives a basic overview of the building blocks underlying XML Schemas and how to use them. It covers:

  • Schema Overview
  • Elements
  • Cardinality
  • Simple Types
  • Complex Types
  • Compositors
  • Reuse
  • Attributes
  • Mixed Element Content

Overview

First, look at what an XML schema is. A schema formally describes what a given XML document contains, in the same way a database schema describes the data that can be contained in a database (table structure, data types). An XML schema describes the coarse shape of the XML document, what fields an element can contain, which sub elements it can contain, and so forth. It also can describe the values that can be placed into any element or attribute.

A Note About Standards

DTD was the first formalized standard, but is rarely used anymore.

XDR was an early attempt by Microsoft to provide a more comprehensive standard than DTD. This standard has pretty much been abandoned now in favor of XSD.

XSD is currently the de facto standard for describing XML documents. There are two versions in use, 1.0 and 1.1, which are largely the same. (You have to dig quite deep before you notice the difference.) An XSD schema is itself an XML document; there is even an XSD schema to describe the XSD standard.

There are also a number of other standards, but their take up has been patchy at best.

The XSD standard has evolved over a number of years, and is controlled by the W3C. It is extremely comprehensive, and as a result has become rather complex. For this reason, it is a good idea to make use of design tools when working with XSDs (see XML Studio, a free XSD development tool). Likewise, when working with XML documents programmatically, XML Data Binding is a much easier way to manipulate your documents (an object-oriented approach; see Liquid XML Data Binding).

The remainder of this tutorial guides you through the basics of the XSD standard, things you should really know even if you’re using a design tool like Liquid XML Studio.
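
Because an XSD is itself an XML document, the definitions shown in the rest of this tutorial would normally sit inside an <xs:schema> root element that binds the xs prefix to the XML Schema namespace. The following is a minimal sketch of such a wrapper; the Note element is purely illustrative and is not one of the tutorial's own examples.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The xs prefix is bound to the W3C XML Schema namespace -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

   <!-- Hypothetical global element, used here only as a placeholder -->
   <xs:element name="Note" type="xs:string"/>

</xs:schema>
```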

Elements

Elements are the main building block of any XML document; they contain the data and determine the structure of the document. An element can be defined within an XML Schema (XSD) as follows:

<xs:element name="x" type="y"/>

An element definition within the XSD must have a name property; this is the name that will appear in the XML document. The type property describes what can be contained within the element when it appears in the XML document. There are a number of predefined types, such as xs:string, xs:integer, xs:boolean or xs:date (see the XSD standard for a complete list). You also can create a user-defined type by using the <xs:simpleType> and <xs:complexType> tags, but more on these later.

If you have set the type property for an element in the XSD, the corresponding value in the XML document must be in the correct format for its given type. (Failure to do this will cause a validation error.) Examples of simple elements and their XML are below:

Sample XSD:
<xs:element name="Customer_dob"
            type="xs:date"/>
Sample XML:
<Customer_dob>2000-01-12</Customer_dob>

Sample XSD:
<xs:element name="Customer_address"
            type="xs:string"/>
Sample XML:
<Customer_address>99 London Road</Customer_address>

Sample XSD:
<xs:element name="OrderID"
            type="xs:int"/>
Sample XML:
<OrderID>5756</OrderID>

Sample XSD:
<xs:element name="Body"
            type="xs:string"/>
Sample XML:
<Body>
   (a type can be defined as
   a string but not have any
   content; this is not true
   of all data types, however).
</Body>

[Figure: the previous XSD definitions shown graphically in Liquid XML Studio]

The value the element takes in the XML document can further be affected by using the fixed and default properties.

Default means that, if no value is specified in the XML document, the application reading the document (typically an XML parser or XML Data binding Library) should use the default specified in the XSD.
Fixed means the value in the XML document can only have the value specified in the XSD.
For this reason, it does not make sense to use both default and fixed in the same element definition. (In fact, it’s illegal to do so.)

<xs:element name="Customer_name" type="xs:string" default="unknown"/>
<xs:element name="Customer_location" type="xs:string" fixed="UK"/>
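
As a rough illustration of how a schema-aware parser would treat the two declarations above, consider the following instance fragments. The name "Jane Smith" is made-up sample data, and the fragments are shown together for comparison rather than as one well-formed document. Note that an element default only takes effect when the element is present but empty; it does not make a missing element appear.

```xml
<!-- Empty element: a validating parser reports the default value "unknown" -->
<Customer_name/>

<!-- Explicit content overrides the default -->
<Customer_name>Jane Smith</Customer_name>

<!-- Empty element with a fixed value: the parser supplies the fixed value.
     If content were given, it would have to match the fixed value exactly. -->
<Customer_location/>
```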

Cardinality

Specifying how many times an element can appear is referred to as cardinality, and is specified by using the minOccurs and maxOccurs attributes. In this way, an element can be mandatory, optional, or appear many times. minOccurs can be assigned any non-negative integer value (for example: 0, 1, 2, 3… and so forth), and maxOccurs can be assigned any non-negative integer value or the string constant “unbounded”, meaning no maximum.

The default value for both minOccurs and maxOccurs is 1. So, if both attributes are absent, as in all the previous examples, the element must appear once and once only.

Sample XSD:
<xs:element name="Customer_dob"
            type="xs:date"/>
Description: If you don’t specify minOccurs or maxOccurs, the default values of 1 are used, so in this case there has to be one and only one occurrence of Customer_dob.

Sample XSD:
<xs:element name="Customer_order"
            type="xs:integer"
            minOccurs="0"
            maxOccurs="unbounded"/>
Description: Here, a customer can have any number of Customer_orders (even 0).

Sample XSD:
<xs:element name="Customer_hobbies"
            type="xs:string"
            minOccurs="2"
            maxOccurs="10"/>
Description: In this example, the element Customer_hobbies must appear at least twice, but no more than 10 times.
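
One detail worth knowing: minOccurs and maxOccurs are only permitted on element declarations nested inside a content model; they cannot be set on elements declared directly under <xs:schema>. The sketch below places the three declarations above inside a hypothetical CustomerRecord wrapper (the wrapper name is an assumption, and the <xs:complexType> and <xs:sequence> constructs it relies on are explained later in this tutorial).

```xml
<xs:element name="CustomerRecord">
   <xs:complexType>
      <xs:sequence>
         <!-- Exactly once: minOccurs and maxOccurs both default to 1 -->
         <xs:element name="Customer_dob" type="xs:date"/>
         <!-- Zero or more occurrences -->
         <xs:element name="Customer_order" type="xs:integer"
                     minOccurs="0" maxOccurs="unbounded"/>
         <!-- Between 2 and 10 occurrences -->
         <xs:element name="Customer_hobbies" type="xs:string"
                     minOccurs="2" maxOccurs="10"/>
      </xs:sequence>
   </xs:complexType>
</xs:element>
```

A conforming instance would therefore contain one Customer_dob, then any number of Customer_order elements, then two to ten Customer_hobbies elements, in that order.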

Simple Types

So far, you have touched on a few of the built-in data types: xs:string, xs:integer, and xs:date. You also can define your own types by modifying existing ones. Examples of this would be:

  • Defining an ID; this may be an integer with a max limit.
  • A PostCode or Zip code could be restricted to ensure it is the correct length and complies with a regular expression.
  • A field may have a maximum length.

Creating your own types is covered more thoroughly in the next section.
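
As a preview, the three examples listed above could be sketched with <xs:simpleType> and <xs:restriction> roughly as follows. The type names, the upper limit of 99999, the US-style ZIP pattern, and the 50-character limit are all assumptions chosen for illustration, and the unprefixed type reference assumes a schema with no target namespace.

```xml
<!-- An ID: an integer with an upper limit (99999 is an arbitrary example) -->
<xs:simpleType name="CustomerIdType">
   <xs:restriction base="xs:integer">
      <xs:minInclusive value="1"/>
      <xs:maxInclusive value="99999"/>
   </xs:restriction>
</xs:simpleType>

<!-- A US ZIP code: five digits, optionally followed by a hyphen and four more -->
<xs:simpleType name="ZipCodeType">
   <xs:restriction base="xs:string">
      <xs:pattern value="[0-9]{5}(-[0-9]{4})?"/>
   </xs:restriction>
</xs:simpleType>

<!-- A text field limited to 50 characters -->
<xs:simpleType name="ShortTextType">
   <xs:restriction base="xs:string">
      <xs:maxLength value="50"/>
   </xs:restriction>
</xs:simpleType>

<!-- Using one of the new types in an element declaration -->
<xs:element name="Customer_id" type="CustomerIdType"/>
```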

Complex Types

A complex type is a container for other element definitions; this allows you to specify which child elements an element can contain. This allows you to provide some structure within your XML documents.

Have a look at these simple elements:

<xs:element name="Customer"         type="xs:string"/>
<xs:element name="Customer_dob"     type="xs:date"/>
<xs:element name="Customer_address" type="xs:string"/>

<xs:element name="Supplier"         type="xs:string"/>
<xs:element name="Supplier_phone"   type="xs:integer"/>
<xs:element name="Supplier_address" type="xs:string"/> 

You can see that some of these elements should really be represented as child elements: “Customer_dob” and “Customer_address” belong to a parent element, “Customer”. By the same token, “Supplier_phone” and “Supplier_address” belong to a parent element, “Supplier”. You can therefore rewrite this in a more structured way:

<xs:element name="Customer">
   <xs:complexType>
      <xs:sequence>
         <xs:element name="Dob"     type="xs:date" />
         <xs:element name="Address" type="xs:string" />
      </xs:sequence>
   </xs:complexType>
</xs:element>

<xs:element name="Supplier">
   <xs:complexType>
      <xs:sequence>
         <xs:element name="Phone"   type="xs:integer"/>
         <xs:element name="Address" type="xs:string"/>
      </xs:sequence>
   </xs:complexType>
</xs:element>

Example XML

<Customer>
   <Dob>2000-01-12</Dob>
   <Address>
      34 thingy street, someplace, sometown, w1w8uu
   </Address>
</Customer>

<Supplier>
   <Phone>0123987654</Phone>
   <Address>
      22 whatever place, someplace, sometown, ss1 6gy
   </Address>
</Supplier>

What’s Changed?

Look at this in detail.

  • You created a definition for an element called “Customer”.
  • Inside the <xs:element> definition, you added a <xs:complexType>. This is a container for other <xs:element> definitions, allowing you to build a simple hierarchy of elements in the resulting XML document.
  • Note that the “Customer” and “Supplier” elements themselves do not have a type attribute. Their type is defined inline by the nested <xs:complexType>; it does not extend or restrict an existing type, but is a new definition built from scratch.
  • The <xs:complexType> element contains another new element, <xs:sequence>, but more on these in a minute.
  • The <xs:sequence> in turn contains the definitions for the two child elements “Dob” and “Address”. Note the customer/supplier prefix has been removed because it is implied from its position within the parent element “Customer” or “Supplier”.

So, in English, this is saying you can have an XML document that contains a <Customer> element that must have two child elements: <Dob> and <Address>.

Compositors

There are three types of compositors: <xs:sequence>, <xs:choice>, and <xs:all>. These compositors allow you to determine how the child elements within them appear within the XML document.

Compositor: Description
  • Sequence: The child elements in the XML document MUST appear in the order they are declared in the XSD schema.
  • Choice: Only one of the child elements described in the XSD schema can appear in the XML document.
  • All: The child elements described in the XSD schema can appear in the XML document in any order.
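
The tutorial's own examples only use <xs:sequence>, so here is a hedged sketch of the other two compositors. The Contact and Person elements and their children are hypothetical names invented for this illustration.

```xml
<!-- xs:choice: a Contact holds either an Email or a Phone, not both -->
<xs:element name="Contact">
   <xs:complexType>
      <xs:choice>
         <xs:element name="Email" type="xs:string"/>
         <xs:element name="Phone" type="xs:string"/>
      </xs:choice>
   </xs:complexType>
</xs:element>

<!-- xs:all: FirstName and LastName must both appear, in either order -->
<xs:element name="Person">
   <xs:complexType>
      <xs:all>
         <xs:element name="FirstName" type="xs:string"/>
         <xs:element name="LastName"  type="xs:string"/>
      </xs:all>
   </xs:complexType>
</xs:element>
```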

Notes

The <xs:sequence> and <xs:choice> compositors can be nested inside other compositors, and be given their own minOccurs and maxOccurs properties. This allows for quite complex combinations to be formed.
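
A minimal sketch of such nesting, assuming a hypothetical Invoice element (none of these names come from the tutorial): a sequence that starts with a date and is followed by a repeating choice.

```xml
<xs:element name="Invoice">
   <xs:complexType>
      <xs:sequence>
         <xs:element name="InvoiceDate" type="xs:date"/>
         <!-- The choice as a whole may repeat: one or more entries,
              each of which is either a Product or a Service -->
         <xs:choice minOccurs="1" maxOccurs="unbounded">
            <xs:element name="Product" type="xs:string"/>
            <xs:element name="Service" type="xs:string"/>
         </xs:choice>
      </xs:sequence>
   </xs:complexType>
</xs:element>
```

Placing the occurrence limits on the <xs:choice> rather than on the individual elements is what allows Product and Service entries to be interleaved freely.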

One step further: The definitions of “Customer->Address” and “Supplier->Address” are currently not very usable because each address is held in a single field. In the real world, it would be better to break this out into a few fields. You can do so by using the same technique shown above:

<xs:element name="Customer">
   <xs:complexType>
      <xs:sequence>
         <xs:element name="Dob" type="xs:date" />
         <xs:element name="Address">
            <xs:complexType>
               <xs:sequence>
                  <xs:element name="Line1" type="xs:string" />
                  <xs:element name="Line2" type="xs:string" />
               </xs:sequence>
            </xs:complexType>
         </xs:element>
      </xs:sequence>
   </xs:complexType>
</xs:element>

<xs:element name="Supplier">
   <xs:complexType>
      <xs:sequence>
         <xs:element name="Phone" type="xs:integer" />
         <xs:element name="Address">
            <xs:complexType>
               <xs:sequence>
                  <xs:element name="Line1" type="xs:string" />
                  <xs:element name="Line2" type="xs:string" />
               </xs:sequence>
            </xs:complexType>
         </xs:element>
      </xs:sequence>
   </xs:complexType>
</xs:element>

This is much better, but you now have two definitions for address, which are the same.

Re-Use

It would make much more sense to have one definition of “Address” that could be used by both customer and supplier. You can do this by defining a complexType independently of an element, and giving it a unique name:

<xs:complexType name="AddressType">
   <xs:sequence>
      <xs:element name="Line1" type="xs:string"/>
      <xs:element name="Line2" type="xs:string"/>
   </xs:sequence>
</xs:complexType>

You have now defined a <xs:complexType> that describes your representation of an address, so use it. Remember when you started looking at elements and I said you could define your own type instead of using one of the standard ones (xs:string, xs:integer)? Well, that’s exactly what you are doing now.

<xs:element name="Customer">
   <xs:complexType>
      <xs:sequence>
         <xs:element name="Dob"     type="xs:date"/>
         <xs:element name="Address" type="AddressType"/>
      </xs:sequence>
   </xs:complexType>
</xs:element>

<xs:element name="Supplier">
   <xs:complexType>
      <xs:sequence>
         <xs:element name="Phone"   type="xs:integer"/>
         <xs:element name="Address" type="AddressType"/>
      </xs:sequence>
   </xs:complexType>
</xs:element>

The advantage should be obvious. Instead of having to define Address twice (once for Customer and once for Supplier), you have a single definition. This makes maintenance simpler: if you decide to add “Line3” or “Postcode” elements to your address, you only have to add them in one place.

Example XML

<Customer>
   <Dob>2000-01-12</Dob>
   <Address>
      <Line1>34 thingy street, someplace</Line1>
      <Line2>sometown, w1w8uu </Line2>
   </Address>
</Customer>

<Supplier>
   <Phone>0123987654</Phone>
   <Address>
      <Line1>22 whatever place, someplace</Line1>
      <Line2>sometown, ss1 6gy </Line2>
   </Address>
</Supplier>

Note: Only complex types defined globally (as children of the <xs:schema> element) can have their own name and be re-used throughout the schema. If they are defined inline within an <xs:element>, they cannot have a name (they are anonymous) and cannot be re-used elsewhere.
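
To make the distinction concrete, here is a sketch of a complete schema document containing one global (named) type and one inline (anonymous) type. The optional Postcode element is an assumed addition, included only to show that a change made once in AddressType reaches every element that uses it.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

   <!-- Global (named) type: a direct child of xs:schema, so it can be re-used -->
   <xs:complexType name="AddressType">
      <xs:sequence>
         <xs:element name="Line1" type="xs:string"/>
         <xs:element name="Line2" type="xs:string"/>
         <!-- Assumed addition: declared once, available wherever AddressType is used -->
         <xs:element name="Postcode" type="xs:string" minOccurs="0"/>
      </xs:sequence>
   </xs:complexType>

   <xs:element name="Customer">
      <!-- Anonymous (inline) type: it has no name, so nothing else can refer to it -->
      <xs:complexType>
         <xs:sequence>
            <xs:element name="Dob"     type="xs:date"/>
            <xs:element name="Address" type="AddressType"/>
         </xs:sequence>
      </xs:complexType>
   </xs:element>

</xs:schema>
```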

Attributes

An attribute provides extra information within an element. Attributes are defined within an XSD as follows, having name and type properties.

<xs:attribute name="x" type="y"/>

An attribute can appear 0 or 1 times within a given element in the XML document. Attributes are either optional or mandatory (by default, they are optional). The “use” property in the XSD definition specifies whether the attribute is optional or mandatory.

So, the following are equivalent:

<xs:attribute name="ID" type="xs:string"/>
<xs:attribute name="ID" type="xs:string" use="optional"/>

[Figure: the previous XSD definitions shown graphically in Liquid XML Studio]

To specify that an attribute must be present, set use="required". (Note that use may also be set to “prohibited”, but you’ll come to that later.)

An attribute is typically specified within the XSD definition for an element; this ties the attribute to the element. Attributes also can be specified globally and then referenced (but more about this later).

Sample XSD:
<xs:element name="Order">
   <xs:complexType>
      <xs:attribute name="OrderID"
                    type="xs:int"/>
   </xs:complexType>
</xs:element>
Sample XML:
<Order OrderID="6"/>

or

<Order/>

Sample XSD:
<xs:element name="Order">
   <xs:complexType>
      <xs:attribute name="OrderID"
                    type="xs:int"
                    use="optional"/>
   </xs:complexType>
</xs:element>
Sample XML:
<Order OrderID="6"/>

or

<Order/>

Sample XSD:
<xs:element name="Order">
   <xs:complexType>
      <xs:attribute name="OrderID"
                    type="xs:int"
                    use="required"/>
   </xs:complexType>
</xs:element>
Sample XML:
<Order OrderID="6"/>

The default and fixed attributes can be specified within the XSD attribute specification (in the same way as they are for elements).
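
A sketch of what that might look like, combined with child elements. The Item element and the Currency and Version attributes are hypothetical names chosen for this illustration. Note that attribute declarations are placed after the compositor inside the <xs:complexType>.

```xml
<xs:element name="Order">
   <xs:complexType>
      <xs:sequence>
         <xs:element name="Item" type="xs:string" maxOccurs="unbounded"/>
      </xs:sequence>
      <!-- Attribute declarations follow the compositor -->
      <xs:attribute name="OrderID"  type="xs:int"    use="required"/>
      <!-- If Currency is omitted, the parser supplies "GBP" -->
      <xs:attribute name="Currency" type="xs:string" default="GBP"/>
      <!-- If Version is present, its value must be "1.0" -->
      <xs:attribute name="Version"  type="xs:string" fixed="1.0"/>
   </xs:complexType>
</xs:element>
```

Unlike an element default, which applies to an element that is present but empty, an attribute default applies when the attribute is missing altogether.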

Mixed Element Content

So far, you have seen how an element can contain data, other elements, or attributes. Elements also can contain a combination of all of these, including text mixed in between child elements. You can allow this in the XSD schema by setting the mixed property to true on the complex type.

<xs:element name="MarkedUpDesc">
   <xs:complexType mixed="true">
      <xs:sequence>
         <xs:element name="Bold"   type="xs:string" />
         <xs:element name="Italic" type="xs:string" />
      </xs:sequence>
   </xs:complexType>
</xs:element>

A sample XML document could look like this.

<MarkedUpDesc>
   This is an <Bold>Example</Bold> of
      <Italic>Mixed</Italic> Content.
   Note there are elements mixed in with the element's data.
</MarkedUpDesc>
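
Note that the <xs:sequence> in the schema above still requires exactly one <Bold> followed by exactly one <Italic>; mixed="true" only permits text around them. For free-form markup, one common approach is a repeating <xs:choice>. The following is a sketch under that assumption; the element name FreeFormDesc is hypothetical.

```xml
<xs:element name="FreeFormDesc">
   <xs:complexType mixed="true">
      <!-- Any number of Bold and Italic children, in any order,
           with text allowed before, between, and after them -->
      <xs:choice minOccurs="0" maxOccurs="unbounded">
         <xs:element name="Bold"   type="xs:string"/>
         <xs:element name="Italic" type="xs:string"/>
      </xs:choice>
   </xs:complexType>
</xs:element>
```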
