XML Queries and Indexing in SQL Server 2005

SQL Server 2005 is the first version of Microsoft's flagship database to truly embrace XML. True, SQL Server 2000 has some bolt-on XML functionality, but in SQL Server 2005 XML is pervasive. In particular, you can use the new XML columns to actually store XML documents as part of a table. In this article, I'll dig into another part of the XML story: indexing those columns for faster searching.

Storing XML in a Table

XML columns allow you to store XML documents or fragments in a SQL Server database (an XML fragment is an XML document without the top-level xml element. When you add an XML column to a table, each row of the table stores an individual XML document or fragment. The individual documents cannot exceed 2 Gb each. So, for example, you might create a table of document information in a publishing-related database that contained the actual text of documents in XML format as a single column.

To create an XML column, you assign the xml data type to the column, whether you're using the CREATE TABLE statement or creating the table with SQL Server Management Studio. Here's an example:

CREATE TABLE Document (
   DocID int IDENTITY NOT NULL,
   Description nvarchar(50) NOT NULL,
   DocumentStore xml NOT NULL,
   CONSTRAINT PK_Document PRIMARY KEY CLUSTERED
   (
      DocID ASC
   )
)

This creates a table with three columns, the third of which can store XML documents. To insert data into an XML columns, you can use string constants. These will be implicitly converted to XML when you perform the insert. For example:

INSERT INTO Document (Description, DocumentStore)
VALUES('Bruce''s poem',
N'<?xml version="1.0" ?>
<Document Name="Poem">
<Author>Bruce</Author>
<Text>The cat/is flat.</Text>
</Document>')

INSERT INTO Document (Description, DocumentStore)
VALUES('Code of Hammurabi',
N'<?xml version="1.0" ?>
<Document Name="Code">
<Author>Hammurabi</Author>
<Text>An eye for an eye, a tooth for a tooth.</Text>
</Document>')

INSERT INTO Document (Description, DocumentStore)
VALUES('Nursery Rhyme',
N'<?xml version="1.0" ?>
<Document Name="Jack and Jill">
<Author>Mother Hubbard</Author>
<Text>Jack and Jill/went up the hill.</Text>
</Document>')

Writing XML Queries

There are two methods to extract data from XML columns: the .value method and the .query method. When you want to extract a single piece of information from an XML column and have it returned as a SQL data type (such as an int or a varchar), use the .value method. Its syntax is as follows:

value(XQuery, SQLType)

Here, XQuery is an XQuery expression, and SQLType is the name of the target SQL data type. For example:

SELECT DocumentStore.value('(/Document/@Name)[1]',
                           'varchar(50)' )
FROM Document

This gives the results shown in Figure 1.

As an alternative to the .value method, you can use the .query method. The .query method is used to run an XQuery expression against an XML column, returning the result as a value of the xml data type. Its syntax is simple:

query('XQuery')

For example:

SELECT DocumentStore.query('(/Document/Text)')
FROM Document

This gives the results shown in Figure 2.

Primary XML Indexes

XML columns are stored as binary large objects (BLOBs) in your database. Because of this, queries that search within an XML column can be slow. You can speed up these queries by creating a primary XML index on each XML column. A primary XML index is a special index that "shreds" the XML data to store information including:

  • Each tag name in the XML
  • The path from the root of the document to the tag
  • The value of the node
  • The type of the node
  • The corresponding primary key of the base table

To create a primary XML index, execute the CREATE PRIMARY XML INDEX statement:

CREATE PRIMARY XML INDEX index_name
   ON table_name (xml_column_name)

For the table I've been working with, this would be:

CREATE PRIMARY XML INDEX PXML_Document
   ON Document (DocumentStore)

A primary XML index will speed up most queries against an XML column, and it's a good idea to create one whenever you'll be searching XML columns that contain large amounts of data.

Secondary XML Indexes

In addition to the primary XML index, each XML column can have up to three secondary XML indexes. These are specialized XML indexes that help with particular types of XML queries, and can only be created on columns that already have a primary XML index:

  • The PATH secondary XML index helps with queries that use XML path expressions.
  • The VALUE secondary XML index helps with queries that search for values anywhere in the XML document.
  • The PROPERTY secondary XML index helps with queries that retrieve particular object properties from within an XML document.

To create a secondary XML index, you must use the CREATE XML INDEX statement:

CREATE XML INDEX index_name
   ON table_name (xml_column_name)
   [USING XML INDEX xml_index_name
      [FOR {VALUE|PATH|PROPERTY}]

Maintaining XML indexes is costly, especially if the data in the XML column changes frequently. Avoid creating secondary XML indexes unless you frequently perform the types of queries that they are specifically designed to help with.

Use Your Power Wisely

A final word of caution: Just because SQL Server 2005 can store XML columns doesn't mean that you have to put XML columns in every database. In many cases, it will make more sense to break down the data that might go into XML columns and make regular relational columns from it instead. But, in cases where SQL Server needs to interoperate with some other component that already depends on XML, storing data in XML columns can make a lot of sense. That way, you can avoid doing constant data conversions, and use a pure XML format throughout the business process. In those cases, think carefully about your querying requirements, create indexes as necessary, and test performance carefully. You'll probably find that XML columns make your life considerably easier.

About the Author

Mike Gunderloy is the author of over 20 books and numerous articles on development topics, and the Senior Technology Partner for Adaptive Strategy, a Washington State consulting firm. When he's not writing code, Mike putters in the garden on his farm in eastern Washington state.



Comments

  • There are no comments yet. Be the first to comment!

Leave a Comment
  • Your email address will not be published. All fields are required.

Top White Papers and Webcasts

  • Live Event Date: November 20, 2014 @ 2:00 p.m. ET / 11:00 a.m. PT Are you wanting to target two or more platforms such as iOS, Android, and/or Windows? You are not alone. 90% of enterprises today are targeting two or more platforms. Attend this eSeminar to discover how mobile app developers can rely on one IDE to create applications across platforms and approaches (web, native, and/or hybrid), saving time, money, and effort and introducing apps to market faster. You'll learn the trade-offs for gaining long …

  • Live Event Date: October 29, 2014 @ 11:00 a.m. ET / 8:00 a.m. PT Are you interested in building a cognitive application using the power of IBM Watson? Need a platform that provides speed and ease for rapidly deploying this application? Join Chris Madison, Watson Solution Architect, as he walks through the process of building a Watson powered application on IBM Bluemix. Chris will talk about the new Watson Services just released on IBM bluemix, but more importantly he will do a step by step cognitive …

Most Popular Programming Stories

More for Developers

Latest Developer Headlines

RSS Feeds