The Fundamentals of the SQL Server 2005 XML Datatype

Have you ever wanted flexibility in your SQL Server database without having to add additional tables? Do you store XML in your SQL Server 2000 database and yearn for an easier way to validate it against a XML Schema definition? If you're planning on upgrading to SQL Server 2005, you're in luck.

The world is moving to XML for many of its data storage needs, and SQL Sever 2005 has embraced XML in a big way. Alhough SQL Server 2000 always supported some form of XML, SQL Server 2005 ups the ante quite a bit. Microsoft has added the new XML datatype, along with a range of functions to manipulate it.

The topic of XML in SQL Server 2005 is complex enough to warrant five whitepapers. So, if you're a relational database developer, you're probably wondering how to begin piecing together the XML landscape in SQL Server 2005. Using the AdventureWorks sample database, which ships with the CTP version of SQL Server 2005, this article provides the fundamentals for working with the XML datatype and refers you to other XML resources for additional study (see the Further Reading section at the end).

The first stop is a primer on some basic XML concepts.

XML Primer, XSD, and XML Namespaces

A complete introduction to XML is beyond the scope of this article. Besides, if you've modified any of the various standard configuration files that accompany .NET products, you're probably already familiar with XML's hierarchical, HTML-like look. The following is a typical XML document:

<ns0:root xmlns:ns0="http://schemas.microsoft.com/sqlserver/2004/
    07/adventure-works/ProductModelManuInstructionsModified">
  <ns0:Location LocationID="100" SetupHours="10.4"
       MachineHours="10.4" LaborHours="10.4" LotSize="10.4" />
</ns0:root>

XML namespaces and XML Schema Definition (XSD) are XML concepts with which you may be unfamiliar, however. A namespace is simply a set of related names that must be unique across a particular context, and it is used to identify things within that context. A good namespace analogy is the names of your family members. Everyone in your family has a unique name, eliminating the confusion of identifying a particular member. A common use for a namespace is to create a unique identifier for a class in an object-oriented programming language.

A XML namespace is a namespace for a particular XSD. To understand the importance of a namespace to a Schema definition, you must understand XSD and be familiar with some XML terminology. XML documents contain two essential pieces of information: elements and attributes. In the XML document example above, "Location" is an element and "LocationID" is an attribute. The basic difference is that elements can contain attributes and other elements, whereas attributes simply are pieces of additional information attached to an element.

In the SQL programming languages, a table definition describes the structure of data in a database. Likewise, XSD describes the structure of a particular XML document. The XSD for the XML document shown above appears below:

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
  targetNamespace="http://schemas.microsoft.com/sqlserver/
                   2004/07/adventure-works/
                   ProductModelManuInstructionsModified"
            elementFormDefault="qualified">
  <xsd:element name="root">
    <xsd:complexType mixed="true">
      <xsd:complexContent mixed="true">
        <xsd:restriction base="xsd:anyType">
          <xsd:sequence>
            <xsd:element name="Location" maxOccurs="unbounded">
              <xsd:complexType mixed="true">
                <xsd:complexContent mixed="true">
                  <xsd:restriction base="xsd:anyType">
                    <xsd:attribute name="LocationID"
                                   type="xsd:integer"
                                   use="required" />
                    <xsd:attribute name="SetupHours"
                                   type="xsd:decimal" />
                    <xsd:attribute name="MachineHours"
                                   type="xsd:decimal" />
                    <xsd:attribute name="LaborHours"
                                   type="xsd:decimal" />
                    <xsd:attribute name="LotSize"
                                   type="xsd:decimal" />
                  </xsd:restriction>
                </xsd:complexContent>
              </xsd:complexType>
            </xsd:element>
          </xsd:sequence>
        </xsd:restriction>
      </xsd:complexContent>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

In the example above, "Location" is explicitly defined as an element and "LocationID" is defined as an attribute on the element Location. Like a table definition, elements and attributes in an XSD have type information.

Like records in a table, multiple XML documents modeled on the same XSD can exist in a database. As stated earlier, a XML namespace is a namespace for a particular XSD. So, including the XML namespace in the XML document you created ties the XML document to the XSD the document is modeled after. Using the XML namespace, a XML processor can quickly determine which Schema a document has utilized, retrieve the Schema based on the namespace, and validate the XML document using the appropriate Schema.

As you may have noticed, XML namespaces are typically formatted to look like a URL for a Web site. Like a class namespace, a XML namespace must be unique. To ensure that the namespace is unique, a portion of a URL is usually a registered domain name.

With this short introduction to namespaces and XML Schema Definition, you're ready for a discussion of the XML datatype in SQL Server 2005. Click here to download the accompanying source code for the examples.

Adding a New XML Datatype to a Table

Like other datatypes in SQL Server 2005, you can add a XML datatype programmatically by using Transact SQL (T-SQL) or SQL Server Management Studio. Figure 1 shows the configuration options for adding a XML datatype with SQL Server Management Studio.

Figure 1. Add a XML Column with SQL Server Management Studio

You must make three decisions when you add a XML datatype:

  1. Will your XML field be typed or untyped?
  2. Will you be storing complete XML documents or fragments of XML documents?
  3. Will you store XML documents associated with multiple Schema definitions?

Typed or Untyped XML?

Like other datatypes, the XML datatype must meet specific formatting criteria. It must conform to well-formed XML criteria (untyped) and you can optionally add additional conformance criteria by specifying a Schema collection (typed).

A Schema collection consists of a group of Schema definitions. Again, you have two options for adding Schemas to a collection, T-SQL or SQL Server Management Studio. Figure 2 shows the area you must access to maintain your Schema collection using SQL Server Management Studio.

Figure 2. View XML Schema in SQL Server Management Studio

Once you have built your Schema collection, you can assign the appropriate Schema collection to your XML datatype. Figure 3 shows a dropdown list of Schema collections in the AdventureWorks database.

Figure 3. Schema Dropdown List

Complete XML Documents or Fragments

Once you've decided on a typed or untyped XML datatype field, you must determine whether you will be storing a XML document based on a single Schema, multiple Schemas, or fragments of XML. Selecting yes for the "Is XML Document" option in the datatype configuration dialog sets the field to accept a single XML document associated with a single Schema. Setting "Is XML Document" to no configures the field for all other options, including fragments of XML data.

Below are some other facts to consider when you configure your XML datatype solution:

  • XML documents are limited to two gigabytes.
  • A XML document is UTF-16 encoded in the database.
  • XML documents can be indexed, though you must make considerations when you build the table with the contained XML datatype.
  • XML Schema Collection operations require special permissions to execute the appropriate T-SQL statements.

With knowledge of XML namespaces, Schema definitions, and XML datatype definition, you're ready to employ XQuery to manipulate the XML data in the database.

The Fundamentals of the SQL Server 2005 XML Datatype

Introduction to XQuery

SQL is the data-manipulation language for relational databases. XQuery is the SQL of the XML world. Like all XML specifications, the XQuery specification is maintained and enhanced by the World Wide Web Consortium (W3C). (All of the XQuery samples in this section utilize the Instructions field in the Production.ProductModel table of AdventureWorks.)

Mastering XQuery requires you to understand two things: XPath and FLOWR (For, Let, Order by, Where, Return) statements. XPath provides a way of expressing the location of data within a XML document and a way of performing operations on the data in an XML document. It includes functions for performing string, Boolean, and arithmetic operations on data within a XML document. XPath locations express a set of nodes within the XML document. An XPath expression looks like a file path in Windows Explorer. Keeping with the previous sample data, a typical XPath expression appears as follows:

/Inst:root/Inst:Location[1]/Inst:step

This XPath expression instructs the XML processor to give the calling program all of the "step" nodes attached to the first Location node, which is attached to the node called "root". (There are other ways to express the same set of Nodes above using XPath. All of the samples in this article follow the examples provided in the SQL Server 2005 Books Online.)

The expression above uses the XQuery qualified name (QName) with a namespace prefix and the more common short-cut syntax. The "[]" symbolizes a predicate. Predicates act like a filter, and they can contain a variety of other expressions. You'll see more predicate examples in a bit. (A complete introduction to all XPath expression syntax is beyond the scope of this article. See the Further Reading section at the end for more information.)

As stated previously, the second key to understanding XQuery is proficiency with the FLOWR statement. If you noticed earlier that the directives Where and Order By are also SQL directives and therefore assumed the statements worked like the SQL directives, you are correct. SQL was one of the inspirations for the FLOWR statement. In fact, the FLOWR statement is mechanically similar to a SQL select statement. Instead of returning rows of a table, the statement returns a set of XML nodes. The following is a typical FLOWR statement:

for $RetVal in /root/Location[1]/step[1]/tool
order by $RetVal descending
return $RetVal

This statement returns the tool nodes on the first "step" node of the first "Location" node in descending order. As you can see, the FLOWR statement utilizes XPath to identify the group of nodes on which to operate. The "For" part of the FLOWR statement works something like the "From" clause in SQL. Anything proceeded by a "$" denotes a variable in XPath.

XQuery includes many other operators and functions, including:

  • Value comparison operations
  • Aggregate functions such as Avg(), Sum(), and Count()
  • Conversion functions
  • If.. then.. else constructs

Now that you understand the basics of XQuery, look at how functions on the XML datatype use XQuery.

Functions on the XML Datatype

The following functions allow you to extract and modify XML data using stored procedures, triggers, and user-defined functions:

  • Query() allows you to get sets of nodes from a XML document.
  • Value() allows you to return a single value from an element or attribute in a document.
  • Exist() returns a boolean value true if the XQuery expression returns values and false if the XQuery expression returns nothing.
  • Modify() allows the developer to change values in a XML document.

First, look at the Query function. The data returned from a Query function are a set of nodes. The following is an example of the Query function:

SELECT Instructions.query('
  declare namespace Inst="http://schemas.microsoft.com/sqlserver/
    2004/07/adventure-works/ProductModelManuInstructions";
        /Inst:root/Inst:Location[1]/Inst:step
') as ResVal FROM Production.ProductModel

One common trait among all of the XML datatype functions is they are used within the context of a regular SQL statement. Another common trait is that the functions are part of the XML datatype rather than a separate SQL function.

As stated previously, the Value() function returns a single value from the XML document. The following are examples of the Value() function:

SELECT Instructions.value('
  declare namespace Inst="http://schemas.microsoft.com/sqlserver/
    2004/07/adventure-works/ProductModelManuInstructions";
        (/Inst:root/Inst:Location[1]/Inst:step[1]/Inst:tool)[1]
','nvarchar(50)') as ResVal FROM Production.ProductModel

SELECT Instructions.value('
  declare namespace Inst="http://schemas.microsoft.com/sqlserver/
    2004/07/adventure-works/ProductModelManuInstructions";
        (/Inst:root/Inst:Location[1]/@LocationID)[1]
','int') as ResVal FROM Production.ProductModel

The first statement returns the value of an element in the document. The second statement returns the value of an attribute in the document (the @ symbol in front of the LocationID denotes an attribute). The Value() function requires two parameters to which the XQuery expression and the SQL datatype cast the resulting value.

The Modify() function allows a developer to change data in the XML document. XQuery provides no mechanism for changing XML data, so Microsoft implemented the XML Data Modification Language (DML), which has three types of statements:

  • Insert adds new nodes in the XML document.
  • Delete removes nodes from the XML document.
  • Replace value of updates the value of a node.

Below are examples of each DML statement:

Insert

UPDATE Production.ProductModel
SET Instructions.modify('
  declare namespace Inst="http://schemas.microsoft.com/sqlserver/
    2004/07/adventure-works/ProductModelManuInstructions";
    insert element Inst:tool { "NewOne" } as last into
        (/Inst:root/Inst:Location[1]/Inst:step)[1]
')

Delete

UPDATE Production.ProductModel
SET Instructions.modify('
  declare namespace Inst="http://schemas.microsoft.com/sqlserver/
    2004/07/adventure-works/ProductModelManuInstructions";
    delete
        (/Inst:root/Inst:Location[1]/Inst:step[1]/Inst:tool)[.="NewOne"]
')

Update

UPDATE Production.ProductModel
SET Instructions.modify('
  declare namespace Inst="http://schemas.microsoft.com/sqlserver/
    2004/07/adventure-works/ProductModelManuInstructions";
    replace value of
        (/Inst:root/Inst:Location[1]/Inst:step[1]/Inst:tool[.="NewOne"])[1]
with "Old One"
')

If you are familiar with SQL, DML statements are intuitive. To use the statements properly, you must utilize some more advanced features of XQuery predicates. In the examples above, the [.="NewOne"] predicate works a lot like a where clause, narrowing the data returned by the XPath expression to nodes with specific information.

Like many other SQL datatypes, an XML datatype can be used with variables and as parameters to stored procedures. The following is an example of a stored procedure with an XML parameter:

create procedure Production.uspAddProductModel
@ModelName nvarchar(50),
@Inst xml

as
INSERT INTO [AdventureWorks].[Production].[ProductModel]
           ([Name]
           --,[CatalogDescription]
           ,[Instructions]
           ,[rowguid]
           ,[ModifiedDate])
     VALUES
           (@ModelName
           --,<CatalogDescription, ProductDescriptionSchemaCollection,>
           ,@Inst
           ,NEWID()
           ,GETDATE())

Previously, this article discussed Schema collections. If a XML datatype in your table is mapped to a Schema collection, any XML document you save to the datatype must match an XML Schema in the collection. The following error will appear for any XML that violates the Schema collection:

Msg 6905, Level 16, State 3, Line 1
XML Validation: Attribute 'LocationI' is not permitted in this context.

Finally, you can also include SQL datatypes from the SQL statement by executing the XML function inside the XQuery expression using a function called sql::column().

Other Considerations and the Future

You must address other considerations when using XML datatypes with SQL Server 2005:

  • XML views can be created from relational tables by using SQLXML mapping features. You can use XQuery against the XML views. Other XML functions facilitate adding the XML to underlying relational tables, so you need not worry about creating all of the necessary relational tables.
  • You can store XML in Text, Image, or nvarchar fields. If you store malformed XML or want to store unmodified XML, the other field may be your only option.
  • Versioning XML data is different from versioning relational data. You must consider the XSD implications, as well as accommodating existing data in the database, use of user-defined functions, XQuery changes, and so forth. You now can store in a single field what you would normally store in a set of tables.

Future native file formats of Microsoft Office applications will be XML. It will be interesting to see how SQL Server 2005 integrates with future Office versions.

A Good Start

If you've yearned for better XML support in SQL Server, you will be happy with the new SQL Server XML datatype. A wealth of information on the SQL Server 2005 XML datatype makes it difficult to know where to start. This article provides you with a foundation for further investigation.

Further Information

Volumes of information about XML technologies in Microsoft products are available. All of the following Microsoft Library articles were resources for this article and provide more information on XML in SQL Server 2005:

An article on developer.com, titled "Work with XML Data Type in SQL Server 2005 from ADO.NET 2.0," describes how to interact with the XML datatype from the client side.

The W3C Web site is the home for all things XML and contains many XML-related articles and specifications.

In addition, some useful articles in MSDN Magazine offer a general introduction to many Microsoft XML technologies. Each of the articles below also served as a resource for this article and will provide a more complete explanation of a XML technology:

  • "Addressing Infosets with XPath"
  • "Introduction to XPath 2.0"
  • "Understanding XML Namespaces"
  • "A Quick Guide to XML Schema"


About the Author

Jeffrey Juday

Jeff is a software developer specializing in enterprise application integration solutions utilizing BizTalk, SharePoint, WCF, WF, and SQL Server. Jeff has been developing software with Microsoft tools for more than 15 years in a variety of industries including: military, manufacturing, financial services, management consulting, and computer security. Jeff is a Microsoft BizTalk MVP. Jeff spends his spare time with his wife Sherrill and daughter Alexandra.

Downloads

Comments

  • There are no comments yet. Be the first to comment!

Leave a Comment
  • Your email address will not be published. All fields are required.

Top White Papers and Webcasts

  • Live Event Date: November 20, 2014 @ 2:00 p.m. ET / 11:00 a.m. PT Are you wanting to target two or more platforms such as iOS, Android, and/or Windows? You are not alone. 90% of enterprises today are targeting two or more platforms. Attend this eSeminar to discover how mobile app developers can rely on one IDE to create applications across platforms and approaches (web, native, and/or hybrid), saving time, money, and effort and introducing apps to market faster. You'll learn the trade-offs for gaining long …

  • IBM Worklight is a mobile application development platform that lets you extend your business to mobile devices. It is designed to provide an open, comprehensive platform to build, run and manage HTML5, hybrid and native mobile apps.

Most Popular Programming Stories

More for Developers

Latest Developer Headlines

RSS Feeds