SQL Server 2005 XQuery and XML-DML - Part 1

By Alex Homer

"An Overview of XML Support in SQL Server 2005"


Ever since the mid 90's, as XML has evolved to become the optimum way to share, transport and persist data, developers have sought efficient ways to store, manipulate and generally take advantage of its capabilities. Increasingly fast and easy-to-use XML parsers have been developed, transport protocols such as SOAP have been used to implement Web Services around XML, and many applications, tools and programming libraries now include features to import and export their data as XML.


However, the one area that has seen less development, particularly in the Microsoft world, is the implementation of efficient and robust technologies for persisting XML. XML is basically just text, and the most common persistence format is as a disk file in some standard text-based format such as ANSI, Unicode, etc. This is generally fine for single-user applications, but does prompt some serious questions when used in a server-based and/or multi-user environment.


This and the two subsequent articles look at how the latest version of Microsoft's enterprise-level database, SQL Server 2005, now offers great support for and close integration with XML as a data persistence format. This includes new ways to validate, store and query XML documents that are stored within the database. SQL Server 2005 provides native support for XML that can vastly improve application performance, while supporting robust and safe multi-user access to the data contained within the XML documents.

The topics we'll be covering in this article are:


  • A brief overview of the way that SQL Server 2005 stores XML documents and schemas
  • How SQL Server 2005 provides support for querying and manipulating XML documents
  • A simple test application that allows you to experiment with XQuery


In two subsequent articles, we'll see some techniques for improving the performance of applications that work with XML documents, as well as some examples of the different ways you can use XQuery:


  • Extracting data from xml columns, using parameter with XQuery and combining XQuery and XSL-T
  • Updating the contents of xml columns, and using XQuery in a managed code stored procedure

An Overview of XML Support in SQL Server 2005

SQL Server 2005 adds a raft of new features to support XML data storage and manipulation. These features make it easier to persist your XML documents within the database, while providing increased performance over the traditional techniques. We'll be comparing these traditional techniques, and seeing how you can improve the performance of your applications, in Parts 2 and 3. For the moment, however, we'll briefly explore the new features in SQL Server 2005. These include:


  • A dedicated data type named xml that can be used to store XML documents or fragments of XML
  • The ability to register XML schemas with SQL Server 2005, and store schema information within the database
  • Automatic validation of XML documents when a schema is present; and automatic shredding of the XML data to support efficient querying and updating of the content
  • An implementation of a subset of the W3C XQuery language and XML-DML to provide this querying and update facility
  • Support for hosting the .NET Common Language Runtime (CLR) within SQL Server, which allows stored procedures that manipulate XML documents to be written in managed code


You'll see how all these features come into play throughout these three articles, and how they open up new techniques for working with XML documents and XML data.

XML Schemas and the W3C Infoset Model

In recent years, it's become increasingly obvious that the major uses of XML are as a way of storing both rowset (single table) and hierarchical (multiple-table) data, rather than unstructured information such as newspaper articles. For example, a common use of Web Services in .NET applications is to expose data that represents a DataSet in a format that allows discovery and transmission across HTTP networks such as the Internet. The DataSet may contain a single table, or multiple tables that are related through primary and foreign keys, and the XML data can be used to completely reconstruct that DataSet on the client.


To specify the data type for an element or an attribute in an XML document you use a schema. This indicates, for example, whether a value such as "42"(which is stored as a text string within the XML) represents either a character string or a numeric value. The client can then reconstruct the data stored in the XML document so that it is accessible as the appropriate data types. This is at the heart of the recent moves towards the XML Information Set (Infoset) model, which effectively considers an XML document as one or more typed rowsets.



This means that you must expose an XML Schema (or the relevant schema information) for every XML document or fragment in order to take advantage of the Infoset model and data-typing of the XML content. SQL Server 2005 makes this easy by providing a schema repository that you can use to store XML schemas, and it will automatically use the appropriate schema to validate and store XML data.

The XML Schema Repository

Schemas are added to a database by executing the CREATEXMLSCHEMACOLLECTION statement, for example:

CREATE XML SCHEMA COLLECTION MyNewSchemaCol '<xsd:schema xmlns="...">

... schema content ...



You can add multiple schemas in one go by concatenating them together, use the ALTERXMLSCHEMACOLLECTION statement to add or remove individual schemas in a collection, and remove the collection using the DROPXMLSCHEMACOLLECTION statement. See the SQL Server help files for more details.


The name you assign to the collection ("MyNewSchemaCol" in the code above) is used in the ALTER and DROP statements, and is displayed in SQL Server Management Studio. However, you should include a targetNamespace attribute in the opening <schema> element to identify each schema in the collection, for example:


<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"





Then you link your XML documents to the appropriate schema by specifying this namespace:


<?xml version="1.0" encoding="utf-8"?>

<root xmlns="http://myns/mydemoschema">




You can use SQL Server Management Studio to view and manage schemas and schema collections. For example, Figure 1 shows the schema collections in the AdventureWorks sample database that you can download and install in SQL Server 2005.


The schema collections in the AdventureWorks sample database
Figure 1 - The schema collections in the AdventureWorks sample database

The New xml Native Data Type

SQL Server 2005 supports a new native data type named xml that you use in exactly the same way as any other built-in data type. You can use it to define a column type for a table, as a parameter or variable in a stored procedure, and anywhere else you would use built-in types such as nvarchar, int, etc. The xml type can store either a complete XML document, or a fragment of XML, as long as it is well-formed (you cannot use an xml type to store XML that is not well-formed).

Typed and Un-typed xml Columns

When you provide a schema for the XML documents you will store, you can create a typedxml column in a table. You specify the name of the schema collection that contains the schema you want to apply to that column, for example:

CREATE TABLE MyTable(MyKey int, MyXml xml(MyNewSchemaCol))


Now the content of the XML document you insert into that column will be shredded automatically into its individual data items, and SQL Server will store these internally in the most efficient and compact way possible. When you query the column, SQL Server automatically reconstructs the XML document into its original form. Note, however, that this will not include things like comments that are not part of the original data content of the document. What you get back is effectively a serialized rowset that represents the data you originally stored there.


It's also possible to store your XML documents without specifying a schema, in which case you create an un-typedxml column. In this case, the XML is stored as a simple character string, because SQL Server has no way of knowing the data type of each element and attribute. This is less efficient, but does maintain the complete original content of the XML document (such as comments, etc.). But remember that, even with an un-typed column, the XML you insert must be well-formed.


To create an un-typed column, you simply omit the schema collection name when you create the table:


CREATE TABLE MyTable(MyKey int, MyXml xml)


Figure 2 shows SQL Server Management Studio displaying the structure of the Sales.Store table in the AdventureWorks sample database. You can see the xml-typed column named Demographics in the left-hand tree view, and a query that extracts the rows from this table in the right-hand query window. The results of running this query are shown in the grid below this, and we've super-imposed on this the view of the XML document you get when you click on the contents of one of the columns in the grid.


column in the Sales.Store table, showing one of the XML documents
Figure 2 - The xml column in the Sales.Store table, showing one of the XML documents it contains

Inserting and Selecting on an xml Column

Once you have created your table, you insert an XML document into an xml column in the same way as you would for any other built-in data type. You can insert it as a string value, or use the CASTor CONVERT functions to specifically convert it to an xml type:


INSERT INTO MyTable(MyKey, MyXml)

VALUES (1, 'xml-document-string')


INSERT INTO MyTable(MyKey, MyXml)

VALUES (1, CAST('xml-document-string') AS xml))


INSERT INTO MyTable(MyKey, MyXml)

VALUES (1, CONVERT(xml, 'xml-document-string'))


When extracting an XML document from an xml column you can convert it to a char, varchar, ncharor nvarchartype if you want to specify the character format:


SELECT CONVERT(nvarchar(500), MyXmlDoc, 'utf-8')

FROM MyTable WHERE ... etc. ...


Querying and Manipulating XML in SQL Server 2005

The previous section of this article has described the techniques for creating tables that contain an xml column, and how you can insert and extract XML documents from such a table. However, these assume that you actually want to access the complete document. As you'll see later, this often isn't the best way to work with XML - especially if you only need to access or update specific parts of the XML document.


SQL Server 2005 supports two new XML manipulation techniques that can provide better performance and reduce the application-level code you have to write. These new techniques are XQuery and the support for XML-DML in the xml data type, and we'll briefly look at both of them next.

XQuery Support in SQL Server 2005

For a long time, XML and relational data have been considered to be two different (yet complimentary) approaches to working with data. However, as you saw earlier, the XML Infoset model and the use of XML Schemas has narrowed that divide so that developers increasingly need to work with their data in both formats, and be able to switch between them easily.


The least intuitive aspect of this is that the developer needs to master the techniques for querying XML documents, and often this involves transforming them into the required format. Until quite recently, this meant either performing complex processing of the XML as a character string, or using an XSL-T style sheet. XSL-T is a very powerful technology, but is not particularly intuitive or easy to use.


In response to this, the W3C has committed to producing a recommendation for a new query language called XML Query, or XQuery as it is more usually known. XQuery combines a syntax that is familiar to developers used to working with relational data with the power of the XPath language that is used to select individual sections or collections of elements from an XML document.



XQuery allows you to select individual values from an XML document simply by specifying an XPath that identifies the elements or attributes you want, for example:




This code calls the built-in XQuery doc function to query a document named myxmlfile.xml, and return the name element that is a child of the product element where that product element contains an attribute named id that has the value "304". The result would be something like:


<name>Large Flange Compressor</name>


Alternatively, you can use an XQuery FLWOR statement to build up an XML document that contains values from the source document. FLWOR is an acronym that describes the sections of the XQuery statement you can use - the basic syntax is:


for [$variable] in [sequence-or-context]

let [$variable] := [sequence-or-context]

where [expression]

order by [expression]

return [literal-and-nodes]


For example, you can create an XML document that has a root element named myproductlist, and contains a list of product descriptions ordered by product name where the value of the id attribute is greater than 10, using this kind of syntax:




for $p in doc("products.xml")//product

let $d := $p/description

where data($p/@id) > "10"

order by $p/name[0]

return $d




The result would be of the form:



<description>Reverse flange compressor for big things</description>

<description>Extrapolating geek extractor for Windows XP</description>

<description>Cross-compiling actuarial extender with flaps</description>



SQL Server 2005 supports a subset of the XQuery language, so you can use these kinds of queries to extract data from XML documents stored in both typed and un-typed xml columns. However, note that SQL Server 2005 does not support the let statement - in almost all cases you can create an equivalent query without requiring this.


XQuery also implements a large selection of built-in functions, including the data function used in the example above (this function extracts the contents of an element - without it you get back the complete element including the opening and closing element tags). SQL Server 2005 implements the most common functions, and in general you will find that you can accomplish all the tasks you need with the subset that are available.


XML Methods and XML-DML Support in SQL Server 2005

The second new technology supported by SQL Server 2005 is the XML Data Manipulation Language (XML-DML). The new xml data type in SQL Server 2005 is, of course, an object in its own right (a class that implements the required behaviour and persistence for an XML document). And, like any other object, it exposes methods that you can use to manipulate the data it contains. These methods include:


  • The query method, which returns a fragment of un-typed XML
  • The value method, which returns a single value from the XML and exposes it as a standard (non-xml) SQL data type
  • The exist method, which can be used to test whether a specific node/value exists in the XML data
  • The modify method, which executes an XML Data Modification Language (XML-DML) statement
  • The nodes function, which returns a single-column rowset of nodes from the XML


The query, value and exist methods are useful if you just need to access an XML document to get specific values from it or check if it contains a specific value, without having to extract the whole document. The modify method is useful when you want to update the XML content by executing an XML-DML statement. Again, you can do this without having to extract the whole document.

Using the query Method

As an example, this is a SQL statement that queries the XML in an un-typed column named MyXml in a table named MyTable, and returns the matching name element:


SELECT MyXml.query('/root/product[@id="304"]/name') FROM MyTable


The result would be something like:


<name>Large Flange Compressor</name>


As you can see, there is no need to use the XQuery doc function. To get just the value of the matching element, you can use the data function:


SELECT MyXml.query('data(/root/product[@id="304"]/name)') FROM MyTable


In this case, the result would be just:


Large Flange Compressor


If the xml column you are querying against is a typed column (in other words, there is a schema registered for this column), you must specify the namespace for the schema in your query. The easiest way is to assign the namespace to a prefix, and use this prefix with each element in the query. For example:


SELECT MyXml.query('

declare namespace s="http://myns/mydemoschema";

/s:root/s:product[@s:id="304"]/s:name') FROM MyTable


You can also use the query method to execute a FLWOR XQuery. Based on the example we looked at earlier, we can build an XML document that contains a list of product names from a table containing a column named ProductXMLusing this query:


SELECT ProductXML.query('



for $p in //product

where data($p/@id) > 10

order by $p/name[1]

return $p/description



FROM ProductList WHERE ProductGroup = 3


This assumes that each row in the ProductList table is an XML document containing a list of products in a specific category (ProductGroup is the column containing the category number). It will returns an XML document such as:



<description>Reverse flange compressor for big things</description>

<description>Extrapolating geek extractor for Windows XP</description>

<description>Cross-compiling actuarial extender with flaps</description>


Using the value Method

The value method works much like the query method, except that it takes a second parameter that is the name of one of the SQL Server built-in data types. The value is returned as an instance of that type. The other point to note is that the XPath you specify must return a single node, so you should specify the node index (when using the query method, the XPath can select more than one node and will return an XML fragment or list of these matching nodes). For example:


SELECT MyXml.value('(/root/product[@id="304"]/name)[1]', 'nvarchar(30)')

FROM MyTable


Remember that you must declare the namespace of the document if the source column is a typed xml column.

Using the exist Method

The exist method takes an XPath expression that selects a single node within the XML document, and returns either True (bit value 1) if the node exists or False (bit value 0) if it does not. If the source column is a typed xml column (in which case you must declare the namespace in your query), and the element contains null, the method returns NULL instead. So the XQuery:


SELECT MyXml.exist('(/root/product[@id="304"])[1]' FROM MyTable


will return True if there is a product with the id value "304" (a product element with the attribute id="304"), or False if not. You can also use the exist method in the WHERE clause of a SQL statement:


SELECT column1, column2, column3 FROM MyTable

WHERE MyXml.exist('(/root/product[@id="304"])[1]') = 1

Using the modify Method

While the query, value and exist methods are extremely useful, the one that really puts the icing on the cake, as far as manipulating XML in SQL Server 2005 is concerned, is the modify method. This method allows you to change the contents of the XML document stored in an xml column without having to retrieve it, modify it, and then put is back afterwards. The general syntax of the modify method is:


xml-column.modify('insert-query' | 'delete-query' | 'replace-query')


As you can see, there are three types of "query" that you can use within the modify method. Each of these is, in fact, an XML-DML statement that is executed against the XML document in that column. This means that you can insert new nodes or fragments into an XML document, delete existing modes or fragments, or replace the value or content of existing nodes with new values or XML fragments.



As an example, this code deletes the productnode that has an attribute id="304" from the document:


UPDATE MyTable SET MyXml.modify('delete /root/product[@id="304"]')


This code deletes all the description elements from the document:


UPDATE MyTable SET MyXml.modify('delete /root/product/description')


You could also achieve the same result by using an XPath that just selects all the descendant description nodes directly:


UPDATE MyTable SET MyXml.modify('delete //description')

To insert a new node or fragment of XML, you use an insert statement. The syntax for this is:


xml-column.modify('insert new-content

{as first | as last} into | before | after



where new-content is the XML you want to insert, and xpath-expression is an XPath that selects a single existing node within the document. You separate these two expressions with a combination of keywords that indicate where in relation to the target node specified by xpath-expression you want to insert the new content. You can insert it as a child of the target node using the into keyword, in which case you must also specify whether it should be the first or the last node within the child nodes collection of the target element:


MyXml.modify('insert <newelement>New element content</newelement>

as first into (/root/product[@id="304"])[1]')


If you want to insert the new content as a sibling of the existing node (rather than into the child nodes collection of the target node), you use the before or after keyword instead. This code inserts the new XML fragment into the target document so that it comes before the product node that has the attribute id="304":


MyXml.modify('insert <product id="278">...new element content...</product>

before (/root/product[@id="304"])[1]')


Finally, you can use the replace method to update a value in an existing node. However, note that the replace method can only be used in a typedxml column. Using it with an un-typed xml column fails because the method requires that the data types of the new and existing values are of the same scalar types. The general syntax of this method is:


xml-column.modify('replace value of xpath-expression with new-value')


where xpath-expression specifies a single node within the document, and new-value is the value or XML fragment you want to replace it with. As an example, this code changes the value of the name child element within the product element that has the attribute id="304":


MyXml.modify('replace value of (/root/product[@id="304"]/name)[1]

with "New reverse flange extractor"')


Of course, as with all the other methods, you must specify the namespace when querying against a typedxml column. For example, the previous query - if executed against a typed column that uses a schema with targetNamespace="http://myns/mydemoschema" - would need to be changed to:


MyXml.modify('declare namespace s="http://myns/mydemoschema";

replace value of (/s:root/s:product[@s:id="304"]/s:name)[1]

with xs:string("New reverse flange extractor")')


Notice how the new content is cast to the correct type to match the element type specified in the schema. This assumes that the schema uses the namespace prefix "xs" to refer to the data types it defines. The cast to the type xs:string therefore makes sure that the new value matches the type defined in the schema. Alternatively, you can use the syntax:



with "New reverse flange extractor" cast as xs:string ?


Notice the question mark at the end of the statement. This is required because, if there is a cast error, XQuery rules insist that the statement returns an empty node sequence.

Using the nodes Method

The previous sections have looked at the query, value, exist and modify methods that are exposed by the new xml data type in SQL Server 2005. The final method that is available is the nodes method. This is useful when extracting data from an XML document, and can be used to create new tables based on the contents of the XML document.


For example, if you think of an XML document that describes a list of products as though it were a rowset, the nodes method can be used to create a new table where each product becomes a row in that table. This code creates such a table, which will then contain one column named ProductName. There will be a row for each product element in the original document, and the ProductName column will contain an XML document that has the product element as its root and includes all of the child nodes for the product element that were in the original XML document.


MyXml.nodes('/root/product/name') as NewTable(ProductName)


In other words, each row will contain an XML document such as:


<product id="304"><name>...</name><description>...</description></product>


The nodes method is commonly used in conjunctions with the CROSSAPPLY statement to create a rowset that is used as the input for another query. For example, this code creates a temporary rowset named NewTablecontaining a column named ProductDetails, which holds all the product elements, and then returns them using the XQuery '.' (the current node) in a call to the query method.


SELECT NewTable.NewColumn.query('.') AS ProductName

FROM ProductList CROSS APPLY ProductXML.nodes('//product') AS NewTable(ProductDetails)

WHERE ProductGroup = 3


This gives a result such as:


<product id="304"><name>...</name><description>...</description></product>

<product id="732"><name>...</name><description>...</description></product>

<product id="882"><name>...</name><description>...</description></product>

Passing Parameters to an XQuery

One issue that we haven't considered so far is how you can pass variables to an XQuery that are used within the XPath that selects parts of the XML content. All the previous examples have used fixed values in the queries, for example the XPath "/root/product[@id="304"]/name". In any real-world application, you'll need to pass variables into your data access code routines, rather than using a fixed value. There are, however, several ways you can achieve this. One way is to limit the rows that are returned by using a parameter as part of the WHERE clause, for example:


SELECT ProductXML.query('//description') FROM ProductList

WHERE ProductXML.value('(//product/@id)[1]', 'int') = @product_id


You just need to add a parameter named @product_id to the command you use to execute the query, and the resulting rowset will only include the description element from the product that has the matching value for its id attribute. However, if you need to reference a parameter value within the XQuery itself, this approach is not of any use. Instead, you can take advantage of two extensions to the XQuery syntax supported by SQL Server 2005 by binding to relational data within your XQuery - as described next.

Binding to Relational Data in an XQuery

SQL Server 2005 adds two new functions to XQuery that allow you to reference values normally only available in standard Transact-SQL (T-SQL) statement from within an XQuery query. These two extensions are:


  • The sql:variable function, which allows you to reference a variable or parameter that is declared within the SQL statement or stored procedure from within an XQuery
  • The sql:column function, which allows you to reference another (non-xml) column from within an XQuery


As an example of their use, we could write an XQuery that selects description elements for the product that has an id attribute value equal to a parameter named @product_id like this:


-- note that this will probably not produce the results you expect

SELECT ProductXML.query('/root/product[id=sql:variable("@product_id")]/description')

FROM ProductList


The value of the @product_id parameter is referenced using the sql:variable function, and evaluates to the value of the parameter when the XQuery is parsed and executed. However, as you can see from the comment in the text, this will probably not give you the results you expect. It will return a row for every row in the original table, with a NULL value for rows that don't match on the product ID.


Instead, you should use a WHERE clause to limit the rows that are returned, using the XML-DML exist method to match the product ID value to the value of the parameter. This code will return only the matching row from the table:


SELECT ProductXML.query('//description') FROM ProductList

WHERE ProductXML.exist('//product[./@id = sql:variable("@product_id")]') = 1


Notice that the sql:variable function is used in the exist method so that it will return True (bit value 1) only when there is a product element with an id attribute that has the same value as the parameter named @product_id.



The sql:column function can be used in the same way. Instead of providing the value of a parameter, it takes the value from the specified column in the current row in the table. For example, this code creates an XML fragment based on a root element named result, which contains the product ID as stored in the column named ProductRowID in this row of the table; plus the name and description extracted from the XML document in the ProductXML column:


SELECT ProductXML.query('


<product_id>{ sql:column("ProductRowID") }</product_id>

<product_name>{ data(//name) }</product_name>

<product_description>{ data(//description) }</product_description>



FROM ProductList WHERE ProductRowID = 25


Playing with XQuery in SQL Server

The syntax for XQuery can, as you've seen in this article, become somewhat complex. To allow you to experiment more freely with XQuery, we provide a simple "test" application that executes an XQuery against SQL Server 2005. Figure 3 shows the application in use, and you can see that there is a selection of queries provided that you can run and modify - or you can simply type in your own XQuery and execute it. The queries we provide demonstrate most of the functions and methods described in this article.


The XQuery Tester application
Figure 3 - The XQuery Tester application that is available to download and use to experiment with XQuery


Over the last few years, XML has moved from being a way to mark up documents so that they can be parsed by special applications, into a standard way to transport, persist and manipulate rowset data. The development of the XML Schema and XML Infoset standards means that XML is now a fully-fledged and widely used data format, and is increasing supported in modern applications, tools and programming libraries.


The one issue with XML has always been storage. Unlike relational data, where established and proven technologies exist to manage storage, querying and updating, XML has long been considered to be mainly a disk-based or stream-based format. But databases such as Oracle, and now SQL Server 2005, provide great ways of managing XML.


In SQL Server 2005, the topic of this article, there is a schema repository, a new XML column type, support for a subset of the XQuery language, and an implementation of XML-DML. In combination, these features allow developers to work with XML in similar ways to relational data.


This article, the first of three, has given you an overview of these new features in SQL Server 2005. In the following articles, we move on to look at how you can improve the performance of your applications, and provide a more robust environment for working with XML data, using the techniques and features of SQL Server 2005 described here.

This article was originally published on January 10th, 2008


Most Popular Programming Stories

More for Developers

RSS Feeds

Thanks for your registration, follow us on our social networks to keep up-to-date