SQL Server 2005 XQuery and XML-DML – Part 1

By Alex Homer

"An Overview of XML Support in SQL Server 2005"


Ever since the mid 90’s, as XML has evolved to become the
optimum way to share, transport and persist data, developers have sought efficient
ways to store, manipulate and generally take advantage of its capabilities.
Increasingly fast and easy-to-use XML parsers have been developed, transport
protocols such as SOAP have been used to implement Web Services around XML, and
many applications, tools and programming libraries now include features to
import and export their data as XML.


However, the one area that has seen less development,
particularly in the Microsoft world, is the implementation of efficient and
robust technologies for persisting XML. XML is basically just text, and the
most common persistence format is as a disk file in some standard text-based
format such as ANSI, Unicode, etc. This is generally fine for single-user
applications, but does prompt some serious questions when used in a
server-based and/or multi-user environment.


This and the two subsequent articles look at how the latest
version of Microsoft’s enterprise-level database, SQL Server 2005, now
offers great support for and close integration with XML as a data persistence
format. This includes new ways to validate, store and query XML documents that
are stored within the database. SQL Server 2005 provides native support for XML
that can vastly improve application performance, while supporting robust and
safe multi-user access to the data contained within the XML documents.

The topics we’ll be covering in this article are:


  • A brief overview of the way that SQL Server 2005 stores
    XML documents and schemas
  • How SQL Server 2005 provides support for querying and
    manipulating XML documents
  • A simple test application that allows you to experiment
    with XQuery


In two subsequent articles, we’ll see some techniques for
improving the performance of applications that work with XML documents, as well
as some examples of the different ways you can use XQuery:


  • Extracting data from xml columns, using parameter with
    XQuery and combining XQuery and XSL-T
  • Updating the contents of xml columns, and using XQuery in a
    managed code stored procedure

An Overview of XML Support in SQL Server 2005

SQL Server 2005 adds a raft of new features
to support XML data storage and manipulation. These features make it easier to
persist your XML documents within the database, while providing increased
performance over the traditional techniques. We’ll be comparing these
traditional techniques, and seeing how you can improve the performance of your
applications, in Parts 2 and 3. For the moment, however, we’ll briefly explore
the new features in SQL Server 2005. These include:


  • A dedicated data type named xml that can be used to store XML
    documents or fragments of XML
  • The ability to register XML schemas with SQL Server 2005,
    and store schema information within the database
  • Automatic validation of XML documents when a schema is
    present; and automatic shredding of the XML data to support efficient
    querying and updating of the content
  • An implementation of a subset of the W3C XQuery language and
    XML-DML to provide this querying and update facility
  • Support for hosting the .NET Common Language Runtime (CLR)
    within SQL Server, which allows stored procedures that manipulate XML
    documents to be written in managed code


You’ll see how all these features come into play throughout
these three articles, and how they open up new techniques for working with XML
documents and XML data.

XML Schemas and the W3C Infoset Model

In recent years, it’s become increasingly
obvious that the major uses of XML are as a way of storing both rowset (single table) and hierarchical (multiple-table) data, rather than
unstructured information such as newspaper articles. For example, a common use
of Web Services in .NET applications is to expose data that represents a
DataSet in a format that allows discovery and transmission across HTTP
networks such as the Internet. The
DataSet may contain a
single table, or multiple tables that are related through primary and foreign
keys, and the XML data can be used to completely reconstruct that
DataSet on the client.


To specify the data type for an element or
an attribute in an XML document you use a schema. This indicates, for example, whether
a value such as
"42"(which is stored as a text string within the XML) represents either
a character string or a numeric value. The client can then reconstruct the data
stored in the XML document so that it is accessible as the appropriate data
types. This is at the heart of the recent moves towards the XML Information
(Infoset) model,
which effectively considers an XML document as one or more typed rowsets.



This means that you must expose an XML
Schema (or the relevant schema information) for every XML document or fragment in
order to take advantage of the Infoset model and data-typing of the XML content.
SQL Server 2005 makes this easy by providing a schema repository that
you can use to store XML schemas, and it will automatically use the appropriate
schema to validate and store XML data.

The XML Schema Repository

Schemas are added to a database by
executing the

‘<xsd:schema xmlns="…">

… schema content …



You can add multiple schemas in one go by
concatenating them together, use the
ALTERXMLSCHEMACOLLECTION statement to add or remove
individual schemas in a collection, and remove the collection using the DROPXMLSCHEMACOLLECTION statement. See the SQL Server help files for more details.


The name you assign to the collection
MyNewSchemaCol" in the code above) is used in the ALTER and DROP statements, and is displayed in SQL Server Management Studio.
However, you should include a
targetNamespace attribute in the opening <schema> element
to identify each schema in the collection, for example:


<xs:schema xmlns_xs="http://www.w3.org/2001/XMLSchema"




Then you link your XML documents to the
appropriate schema by specifying this namespace:


<?xml version="1.0"


<root xmlns="http://myns/mydemoschema">



You can use SQL Server Management Studio to
view and manage schemas and schema collections. For example, Figure 1 shows the
schema collections in the
AdventureWorks sample database that you
can download and install in SQL Server 2005.


The schema collections in the AdventureWorks sample database

Figure 1 – The schema
collections in the AdventureWorks sample database

The New xml Native Data Type

SQL Server 2005 supports a new native data type named xml that you use
in exactly the same way as any other built-in data type. You can use it to
define a column type for a table, as a parameter or variable in a stored
procedure, and anywhere else you would use built-in types such as nvarchar, int, etc. The xml type can store
either a complete XML document, or a fragment of XML, as long as it is
well-formed (you cannot use an xml type to store XML that is not well-formed).

Typed and Un-typed xml Columns

When you provide a schema for the XML documents you will
store, you can create a typedxml column in a table. You specify the
name of the schema collection that contains the schema you want to apply to
that column, for example:

CREATE TABLE MyTable(MyKey int, MyXml xml(MyNewSchemaCol))


Now the content of the XML document you
insert into that column will be shredded automatically into its individual data
items, and SQL Server will store these internally in the most efficient and
compact way possible. When you query the column, SQL Server automatically
reconstructs the XML document into its original form. Note, however, that this
will not include things like comments that are not part of the original data
content of the document. What you get back is effectively a serialized rowset
that represents the data you originally stored there.


It’s also possible to store your XML
documents without specifying a schema, in which case you create an un-typed
xml column. In this case, the XML is stored as a simple character
string, because SQL Server has no way of knowing the data type of each element
and attribute. This is less efficient, but does maintain the complete original
content of the XML document (such as comments, etc.). But remember that, even
with an un-typed column, the XML you insert must be well-formed.


To create an un-typed column, you simply
omit the schema collection name when you create the table:


CREATE TABLE MyTable(MyKey int, MyXml xml)


Figure 2 shows SQL Server Management Studio displaying the
structure of the Sales.Store table in the AdventureWorks sample database. You can see the xml-typed column named Demographics in the left-hand tree view,
and a query that extracts the rows from this table in the right-hand query
window. The results of running this query are shown in the grid below this, and
we’ve super-imposed on this the view of the XML document you get when you click
on the contents of one of the columns in the grid.


column in the Sales.Store table, showing one of the XML documents

Figure 2 – The xml column in the Sales.Store table, showing one of the XML documents
it contains

Inserting and Selecting on an xml Column

Once you have created your table, you
insert an XML document into an
xml column in the same
way as you would for any other built-in data type. You can insert it as a
string value, or use the
CASTor CONVERT functions to
specifically convert it to an
xml type:


INSERT INTO MyTable(MyKey, MyXml)

VALUES (1, ‘xml-document-string’)


INSERT INTO MyTable(MyKey, MyXml)

VALUES (1, CAST(‘xml-document-string’) AS


INSERT INTO MyTable(MyKey, MyXml)

VALUES (1, CONVERT(xml, ‘xml-document-string’))


When extracting an XML document from an xml column you can convert it to a char, varchar, ncharor nvarchartype if you want to specify the character format:


SELECT CONVERT(nvarchar(500), MyXmlDoc, ‘utf-8’)

FROM MyTable WHERE … etc. …


Querying and Manipulating XML in SQL Server 2005

The previous section of this article has
described the techniques for creating tables that contain an
xml column, and how you can insert and extract XML documents from such
a table. However, these assume that you actually want to access the complete
document. As you’ll see later, this often isn’t the best way to work with XML –
especially if you only need to access or update specific parts of the XML


SQL Server 2005 supports two new XML
manipulation techniques that can provide better performance and reduce the
application-level code you have to write. These new techniques are XQuery and the support for XML-DML
in the xml data type, and we’ll
briefly look at both of them next.

XQuery Support in SQL Server 2005

For a long time, XML and relational data
have been considered to be two different (yet complimentary) approaches to
working with data. However, as you saw earlier, the XML Infoset model and the
use of XML Schemas has narrowed that divide so that developers increasingly
need to work with their data in both formats, and be able to switch between
them easily.


The least intuitive aspect of this is that
the developer needs to master the techniques for querying XML documents, and
often this involves transforming them into the required format. Until quite
recently, this meant either performing complex processing of the XML as a
character string, or using an XSL-T style sheet. XSL-T is a very powerful
technology, but is not particularly intuitive or easy to use.


In response to this, the W3C has committed
to producing a recommendation for a new query language called XML Query,
or XQuery as it is more usually known. XQuery combines a syntax that is
familiar to developers used to working with relational data with the power of
the XPath language that is used to select individual sections or
collections of elements from an XML document.



XQuery allows you to select individual
values from an XML document simply by specifying an XPath that identifies the
elements or attributes you want, for example:




This code calls the built-in XQuery doc function to query a document named myxmlfile.xml, and
return the name element that is a child of the product element where that product element contains an attribute
named id that has the value "304".
The result would be something like:


Flange Compressor</name>


Alternatively, you can use an XQuery FLWOR statement to
build up an XML document that contains values from the source document. FLWOR
is an acronym that describes the sections of the XQuery statement you can use –
the basic syntax is:


for [$variable] in [sequence-or-context]

let [$variable] := [sequence-or-context]

where [expression]

order by [expression]

return [literal-and-nodes]


For example, you can create an XML document
that has a root element named
myproductlist, and
contains a list of product descriptions ordered by product name where the value
of the
id attribute is greater than 10, using this kind of syntax:




for $p in doc("products.xml")//product

let $d := $p/description

where data($p/@id) > "10"

order by $p/name[0]

return $d




The result would be of the form:



flange compressor for big things</description>

geek extractor for Windows XP</description>

actuarial extender with flaps</description>



SQL Server 2005 supports a subset of the
XQuery language, so you can use these kinds of queries to extract data from XML
documents stored in both typed and un-typed
xml columns. However,
note that SQL Server 2005 does not support the
let statement – in
almost all cases you can create an equivalent query without requiring this.


XQuery also implements a large selection of
built-in functions, including the
data function used in
the example above (this function extracts the contents of an element – without
it you get back the complete element including the opening and closing element
tags). SQL Server 2005 implements the most common functions, and in general you
will find that you can accomplish all the tasks you need with the subset that
are available.


XML Methods and XML-DML Support in SQL Server 2005

The second new technology supported by SQL
Server 2005 is the XML Data Manipulation Language (XML-DML). The
xml data type in SQL Server 2005 is, of course, an object in its own
right (a class that implements the required behaviour and persistence for an
XML document). And, like any other object, it exposes methods that you can use
to manipulate the data it contains. These methods include:


  • The query method, which returns a fragment of un-typed XML
  • The value method, which returns a single value from the XML and exposes it as a
    standard (non-
    xml) SQL data type
  • The exist method, which can be used to test whether a specific node/value exists in
    the XML data
  • The modify method, which executes an XML Data Modification Language (XML-DML)
  • The nodes function, which returns a single-column rowset of nodes from the XML


The query, value and exist methods are useful if you just need to access an XML document to
get specific values from it or check if it contains a specific value, without
having to extract the whole document. The
modify method is useful
when you want to update the XML content by executing an XML-DML statement. Again,
you can do this without having to extract the whole document.

Using the query Method

As an example, this is a SQL statement that
queries the XML in an un-typed column named
MyXml in a table named MyTable, and returns the
name element:


SELECT MyXml.query(‘/root/product[@id="304"]/name’)
FROM MyTable


The result would be something like:


Flange Compressor</name>


As you can see, there is no need to use the
doc function. To get just the value of the matching element, you can
use the
data function:


SELECT MyXml.query(‘data(/root/product[@id="304"]/name)’)
FROM MyTable


In this case, the result would be just:


Flange Compressor


If the xml column you are
querying against is a typed column (in other words, there is a schema
registered for this column), you must specify the namespace for the schema in
your query. The easiest way is to assign the namespace to a prefix, and use
this prefix with each element in the query. For example:


SELECT MyXml.query(‘

declare namespace s="http://myns/mydemoschema";

FROM MyTable


You can also use the query method to execute a FLWOR XQuery. Based on the example we looked at
earlier, we can build an XML document that contains a list of product names from
a table containing a column named
ProductXMLusing this


SELECT ProductXML.query(‘



for $p in //product

where data($p/@id) > 10

order by $p/name[1]

return $p/description



FROM ProductList WHERE ProductGroup = 3


This assumes that each row in the ProductList table is an XML document containing a list of products in a
specific category (
ProductGroup is the column containing the category number). It will returns an
XML document such as:



<description>Reverse flange compressor for big things</description>

<description>Extrapolating geek extractor for Windows

<description>Cross-compiling actuarial extender with


Using the value Method

The value method works much
like the
query method, except that it takes a second parameter that is the name of
one of the SQL Server built-in data types. The value is returned as an instance
of that type. The other point to note is that the XPath you specify must return
a single node, so you should specify the node index (when using the
query method, the XPath can select more than one node and will return an
XML fragment or list of these matching nodes). For example:


SELECT MyXml.value(‘(/root/product[@id="304"]/name)[1]’,

FROM MyTable


Remember that you must declare the
namespace of the document if the source column is a typed
xml column.

Using the exist Method

The exist method takes an
XPath expression that selects a single node within the XML document, and
returns either
True (bit value 1) if the node exists or False (bit value 0) if it does not. If the source column
is a typed
xml column (in which case you must declare the namespace in your
query), and the element contains
null, the method returns NULL instead. So the XQuery:


SELECT MyXml.exist(‘(/root/product[@id="304"])[1]’
FROM MyTable


will return True if there is a
product with the
id value "304" (a product element with the attribute id="304"), or False if not. You can also use the exist method in the WHERE clause of a SQL statement:


SELECT column1, column2, column3 FROM MyTable

WHERE MyXml.exist(‘(/root/product[@id="304"])[1]’)
= 1

Using the modify Method

While the query, value and exist methods are extremely useful, the one that really puts the icing on
the cake, as far as manipulating XML in SQL Server 2005 is concerned, is the
modify method. This method allows you to change the contents of the XML
document stored in an
xml column without having to retrieve it,
modify it, and then put is back afterwards. The general syntax of the
modify method is:


| ‘delete-query‘ | ‘replace-query‘)


As you can see, there are three types of

"query" that you can use within the modify method. Each of
these is, in fact, an XML-DML statement that is executed against the XML
document in that column. This means that you can insert new nodes or fragments
into an XML document, delete existing modes or fragments, or replace the value or
content of existing nodes with new values or XML fragments.



As an example, this code deletes the productnode that has an attribute id="304" from
the document:


UPDATE MyTable SET MyXml.modify(‘delete /root/product[@id="304"]’)


This code deletes all the description elements from the document:


UPDATE MyTable SET MyXml.modify(‘delete /root/product/description’)


You could also achieve the same result by
using an XPath that just selects all the descendant
description nodes directly:


UPDATE MyTable SET MyXml.modify(‘delete //description’)

To insert a new node or fragment of XML,
you use an
insert statement. The syntax for this is:


xml-column.modify(‘insert new-content

{as first | as last}
into | before | after



where new-content is the XML you want to insert, and xpath-expression is
an XPath that selects a single existing node within the document. You separate
these two expressions with a combination of keywords that indicate where in
relation to the target node specified by
xpath-expression you want to insert
the new content. You can insert it as a child of the target node using
the into keyword, in which case you must also specify whether it should be the first or
the last node within the child nodes collection of the target element:


MyXml.modify(‘insert <newelement>New
element content</newelement>

as first into (/root/product[@id="304"])[1]’)


If you want to insert the new content as a sibling of the existing node (rather than into the child nodes collection of the target
node), you use the
before or after keyword instead. This code inserts the new XML fragment into the
target document so that it comes before the
product node that has
the attribute


MyXml.modify(‘insert <product
id="278">…new element content…</product>

before (/root/product[@id="304"])[1]’)


Finally, you can use the replace method to update a value in an existing node. However, note that
replace method can only be used in a typedxml column. Using it with an un-typed xml column fails because
the method requires that the data types of the new and existing values are of
the same scalar types. The general syntax of this method is:


value of xpath-expression with new-value‘)


where xpath-expression specifies a single
node within the document, and new-value is the value or XML fragment you want to
replace it with. As an example, this code changes the value of the name child element
within the product element that has the attribute id="304":


MyXml.modify(‘replace value of (/root/product[@id="304"]/name)[1]

with "New reverse flange


Of course, as with all the other methods,
you must specify the namespace when querying against a typed
xml column. For example, the previous query – if executed against a
typed column that uses a schema with
targetNamespace="http://myns/mydemoschema" – would need to be changed to:


MyXml.modify(‘declare namespace

replace value of (/s:root/s:product[@s:id="304"]/s:name)[1]

with xs:string("New
reverse flange extractor")’)


Notice how the new content is cast to the
correct type to match the element type specified in the schema. This assumes
that the schema uses the namespace prefix "
xs" to refer to the
data types it defines. The cast to the type
xs:string therefore
makes sure that the new value matches the type defined in the schema. Alternatively,
you can use the syntax:


with "New reverse flange
extractor" cast as xs:string ?


Notice the question mark at the end of the
statement. This is required because, if there is a cast error, XQuery rules
insist that the statement returns an empty node sequence.

Using the nodes Method

The previous sections have looked at the query, value, exist and modify methods that are exposed by the new xml data type in SQL
Server 2005. The final method that is available is the
nodes method. This is useful when extracting data from an XML document,
and can be used to create new tables based on the contents of the XML document.


For example, if you think of an XML
document that describes a list of products as though it were a rowset, the
nodes method can be used to create a new table where each product becomes
a row in that table. This code creates such a table, which will then contain
one column named
ProductName. There will be a row for each product element in the original
document, and the
ProductName column will contain an XML document that has the product element as its root and includes all of the child nodes for the
product element that were in the original XML document.


MyXml.nodes(‘/root/product/name’) as NewTable(ProductName)


In other words, each row will contain an
XML document such as:




The nodes method is commonly
used in conjunctions with the
CROSSAPPLY statement to
create a rowset that is used as the input for another query. For example, this
code creates a temporary rowset named
NewTablecontaining a
column named
ProductDetails, which holds all the product elements, and then returns them using
the XQuery
‘.’ (the current node) in a call to the query method.


SELECT NewTable.NewColumn.query(‘.’) AS ProductName

FROM ProductList CROSS APPLY ProductXML.nodes(‘//product’)
AS NewTable(

WHERE ProductGroup = 3


This gives a result such as:





Passing Parameters to an XQuery

One issue that we haven’t considered so far
is how you can pass variables to an XQuery that are used within the XPath that
selects parts of the XML content. All the previous examples have used fixed
values in the queries, for example the XPath
In any real-world application, you’ll need to pass variables into your data
access code routines, rather than using a fixed value. There are, however,
several ways you can achieve this. One way is to limit the rows that are
returned by using a parameter as part of the WHERE clause, for example:


SELECT ProductXML.query(‘//description’) FROM

WHERE ProductXML.value(‘(//product/@id)[1]’,
‘int’) = @product_id


You just need to add a parameter named @product_id to the command you use to execute the query, and the resulting
rowset will only include the
description element from
the product that has the matching value for its
id attribute. However,
if you need to reference a parameter value within the XQuery itself, this
approach is not of any use. Instead, you can take advantage of two extensions
to the XQuery syntax supported by SQL Server 2005 by binding to relational data
within your XQuery – as described next.

Binding to Relational Data in an XQuery

SQL Server 2005 adds two new functions to
XQuery that allow you to reference values normally only available in standard
Transact-SQL (T-SQL) statement from within an XQuery query. These two
extensions are:


  • The sql:variable function, which allows you to reference a variable or parameter that is
    declared within the SQL statement or stored procedure from within an
  • The sql:column function, which allows you to reference another (non-xml) column from within an XQuery


As an example of their use, we could write
an XQuery that selects
description elements for the product that
has an
id attribute value equal to a parameter named @product_id like


— note that this will probably not produce
the results you expect

SELECT ProductXML.query(‘/root/product[id=sql:variable("@product_id")]/description’)

FROM ProductList


The value of the @product_id parameter is referenced using the sql:variable function, and evaluates to
the value of the parameter when the XQuery is parsed and executed. However, as you can see from the comment in the text, this will
probably not give you the results you expect. It will return a row for
every row in the original table, with a
NULL value for rows that
don’t match on the product ID.


Instead, you should use a WHERE clause to limit the rows that are returned, using the XML-DML exist method to match the product ID value to the value of the parameter.
This code will return only the matching row from the table:


SELECT ProductXML.query(‘//description’) FROM

WHERE ProductXML.exist(‘//product[./@id = sql:variable("@product_id")]’)
= 1


Notice that the sql:variable function is used in the exist method so that it
will return
True (bit value 1) only when there is a product element with an id attribute that has the same value as the parameter named @product_id.



The sql:column function can be used in the same way. Instead of providing the value of a
parameter, it takes the value from the specified column in the current row in
the table. For example, this code creates an XML fragment based on a root
element named result,
which contains the product ID as stored in the column named ProductRowID in
this row of the table; plus the name and description extracted from the XML
document in the ProductXML column:


SELECT ProductXML.query(‘


<product_id>{ sql:column("ProductRowID")

<product_name>{ data(//name) }</product_name>

<product_description>{ data(//description)



FROM ProductList WHERE ProductRowID = 25


Playing with XQuery in SQL Server

The syntax for XQuery can, as you’ve seen
in this article, become somewhat complex. To allow you to experiment more
freely with XQuery, we provide a simple "test" application that
executes an XQuery against SQL Server 2005. Figure 3 shows the application in
use, and you can see that there is a selection of queries provided that you can
run and modify – or you can simply type in your own XQuery and execute it. The
queries we provide demonstrate most of the functions and methods described in
this article.


The XQuery Tester application

Figure 3 – The
XQuery Tester application that is available to download and use to experiment
with XQuery


Over the last few years, XML has moved from being a way to
mark up documents so that they can be parsed by special applications, into a
standard way to transport, persist and manipulate rowset data. The development
of the XML Schema and XML Infoset standards means that XML is now a
fully-fledged and widely used data format, and is increasing supported in
modern applications, tools and programming libraries.


The one issue with XML has always been storage. Unlike
relational data, where established and proven technologies exist to manage
storage, querying and updating, XML has long been considered to be mainly a
disk-based or stream-based format. But databases such as Oracle, and now SQL
Server 2005, provide great ways of managing XML.


In SQL Server 2005, the topic of this article, there is a
schema repository, a new XML column type, support for a subset of the XQuery
language, and an implementation of XML-DML. In combination, these features
allow developers to work with XML in similar ways to relational data.


This article, the first of three, has given you an overview
of these new features in SQL Server 2005. In the following articles, we move on
to look at how you can improve the performance of your applications, and
provide a more robust environment for working with XML data, using the
techniques and features of SQL Server 2005 described here.

More by Author

Get the Free Newsletter!

Subscribe to Developer Insider for top news, trends & analysis

Must Read