SQL Server 2005 Programmability Enhancements � Common Table Expressions

There is a plethora of new T-SQL features and enhancements in SQL Server 2005, and you could probably write a small book talking about all of them. However, this article will focus on one of them — specifically, Common Table Expressions (CTE), because personally they have come in handy on many occasions.

I have used a few of the other new features and enhancements, and if you have not had a chance to delve into any of them, I would highly recommend familiarizing yourself with some of them, such as the new xml data type, improved error handling, query and event notifications, new DDL (Data Definition Language) triggers, ranking functions, TOP operator enhancements, the PIVOT and UNPIVOT operators, and the new APPLY operator. There are a few others, and they all add tremendous value to T-SQL developers.

This article will introduce common table expressions and go as in-depth as space will allow, showing how CTEs can be used to improve T-SQL's capabilities and how they can help better your T-SQL.

Common Table Expressions

A common table expression is expressed as a temporary table or view defined within an executing statement, such as SELECT INSERT, UPDATE, or DELETE. Think of a CTE as somewhat like a derived table in that a CTE only lasts throughout the duration of the executing statement. When that execution of the statement finishes, the CTE is gone. However, there are a few differences between a derived table and a CTE:

A CTE can reference itself. A derived table cannot.

A CTE can be referenced multiple times in the same query.

So what is the difference between a CTE and a true temporary table? The quick and easy answer is the overhead of having to create a temporary table and the performance consequence of using a temporary table. Think about what happens when you use a true temporary table. First, you have to define and create the table then populate it, and then run queries against it. When using CTEs you can forgo the first step and the associated overhead and use the built-in CTE, which provides all of that functionality for you.

CTE Structure

The structure of CTE is fairly simple. It basically contains two main parts, the first part being the CTE followed by the second part, which is the normal execution statement. The basic syntax for a CTE is as follows:

WITH common_table_expression_name [column(s)]
AS
(
  query_definition
)
EXECUTION STATEMENT

As defined in the syntax above, a common table expression contains several arguments that are defined as follows:

common_table_expression_name: The identifier for the common table expression. This name must be different than any other table or view used within the CTE. It can, however, be named the same as a base table or view, but when the common table expression is referenced in the normal execution statement, the common table expression name must be used, rather than the base name.

column(s): The column names in the common table expression. These column names must be unique (meaning, no duplicate names allowed) and must match the same number of columns returned by the query_definition.

query_definition: The SELECT statement in which the results are used to populate the common table expression. This statement follows the same requirements that the normal execution statement follows.

A CTE expression cannot define another CTE expression.

The following example uses the syntax outlined above to create a very simple common table expression query. Open a query window in SQL Server Management Studio and execute the following query against the AdventureWorks database.

WITH CTE_ProdMod (ProductModelID, ProductModelCount) AS
(
	SELECT   ProductModelID, COUNT(*)
	FROM     Production.Product
	WHERE    ProductModelID IS NOT NULL
	GROUP BY ProductModelID
)
SELECT   ProductModelID, ProductModelCount
FROM     CTE_ProdMod
ORDER BY ProductModelID

The example above uses a common table expression to count the number of products for each product model in the Production.Product table. A portion of the results of this query when executed are shown below.

ProductModelID ProductModelCount
-------------- -----------------
1              3
2              1
3              3
4              3
5              10
6              11
7              8
8              10
9              12
10             10

A simple example, yet not that impressive because it doesn't quite showcase the true power and flexibility of what a common table expression can really do. So, the following example uses a common table expression to return the count of products for each product model. The count is returned in the CTE, then the CTE is used to filter the results even further based on what was returned in the CTE.

WITH CTE_TranHist (ProductID, OrderQuantity) AS
(
	SELECT   ProductID, COUNT(*)
	FROM     Production.TransactionHistory
	WHERE    TransactionDate > '05/01/2004'
	GROUP BY ProductID
)
SELECT cte.ProductID, cte.OrderQuantity, 
	pp.productmodelid, ProductSubcategoryID
FROM Production.Product pp
INNER JOIN CTE_TranHist AS cte ON pp.productID = cte.productid
WHERE cte.productid BETWEEN 500 AND 750
AND ProductModelID IS NOT NULL
ORDER BY cte.ProductID

The example above uses a common table expression to count the number of products for each product model in the Production.Product table. A portion of the results of this query when executed are shown below.

ProductID   OrderQuantity ProductModelID ProductSubcategoryID
----------- ------------- -------------- --------------------
680         21            6              14
706         15            6              14
707         567           33             31
708         526           33             31
711         578           33             31
712         567           2              19
713         118           11             21
714         150           11             21
715         138           11             21
716         99            11             21
717         10            6              14
718         12            6              14
722         57            9              14
725         1             9              14
726         1             9              14
729         1             9              14
730         2             9              14
736         54            9              14
737         45            9              14
738         71            9              14
739         63            5              12
742         67            5              12
743         78            5              12
746         47            5              12
747         62            5              12
748         80            5              12

Imagine now that syntax you would have had to use to get the same results prior to SQL Server 2005. It would have looked something like this:

SELECT TranHist.ProductID, TranHist.OrderQuantity, 
	pp.ProductModelID, pp.ProductSubcategoryID
FROM Production.Product pp
INNER JOIN 
	(SELECT ProductID, COUNT(*)
		FROM Production.TransactionHistory
		WHERE TransactionDate > '05/01/2004'
		GROUP BY ProductID) 
	AS TranHist (ProductID, OrderQuantity)
ON pp.productID = TranHist.productid
WHERE TranHist.productid BETWEEN 500 AND 750
AND ProductModelID IS NOT NULL

Now, which would you rather read, let alone write? I thought so. Common table expressions have a number of advantages, including better readability to make it much easier to maintain more complex queries.

Now that you have a general understanding of CTEs, the true value of CTEs appear when you have the need for recursive queries. This topic is discussed next.

SQL Server 2005 Programmability Enhancements � Common Table Expressions

Recursive Queries

If you're not excited about CTEs yet, then you will be shortly, because CTEs provide the ability to do something that SQL developers have been praying for a very long time: querying hierarchical data. Didn't you just cringe when your boss asked you to create a company employee org chart? At long last the prayers of SQL developers have been answered.

In straightforward terms, a recursive CTE is a CTE that references itself. It is executed over and over until no more results are returned. Just like a normal CTE, a recursive CTE can be used with a normal SELECT, UPDATE, INSERT, AND DELETE statement, and even a CREATE VIEW statement.

The general syntax for create a recursive CTE is as follows:

WITH common_table_expression_name [column(s)]
AS
(
  query_definition_anchor_member
  [UNION ALL | UNION | EXCEPT | INTERSECT]
  query_definition_recursive_member
)
EXECUTION STATEMENT

A recursive CTE, while similar to that of a normal of a normal CTE, contains three parts versus the one of a normal CTE. Those three parts are explained below.

common_table_expression_name: No different than that of a normal CTE, this is the identifier for the common table expression. This name must be different than any other table or view used within the CTE. It can, however, be named the same as a base table or view, but when the common table expression is referenced in the normal execution statement, the common table expression name must be used, not the base name.

query_definition_anchor_member: The anchor member is the query definition that forms the base result set. Anchor members must precede the recursive member definition joined together by a UNION ALL operator.

query_definition_recursive_member: The recursive query definition includes the query definitions joined to the anchor member by a UNION ALL operator.

A correctly coded recursive CTE must contain at least one anchor member query definition and one recursive member query definition.

A Simple Example

To understand the recursive CTE structure and they work, a simple sample is in order.

WITH CTE_Sample (Col1, Col2, Col3) AS
(
	SELECT   Column1, Column2, Column3
	FROM     Table1
	WHERE    Some_Where_Clause
 UNION ALL
	SELECT   Column1, Column2, Column3
	FROM     Table1
 INNER JOIN CTE_Sample ON Table1.Column1 = CTE_Sample.Column1
)
SELECT Col1, Col2, Col3
FROM CTE_Sample

Examining the example above you should see that a recursive CTE works as follows:

First, the anchor query definition returns a base result set. Next, the recursive query definition queries and returns information based on the anchor query definition, accomplished by joining the recursive query definition to the anchor query definition via a UNION ALL operator. The recursive query is executed again using the results returned from the first iteration as the base query (or input value to the next iteration of the recursive query). The recursive query is executed again and again until it finds no more results, at which point the entire result set is returned to the execution statement.

As a more detailed example, consider the following:

WITH CTE_Sample (MngrID, EmpID) AS
(
	SELECT   ManagerID, EmployeeID
	FROM     HumanResources.Employee
	WHERE    ManagerID IS NULL
 UNION ALL
	SELECT   ManagerID, EmployeeID
	FROM     HumanResources.Employee
 INNER JOIN CTE_Sample ON Employee.ManagerID = CTE_Sample.EmpID
)
SELECT MngrID, EmpID
FROM CTE_Sample

This example, a scaled down version of the example found in the SQL Server 2005 BOL, follows the steps outlined in the previous paragraph. The anchor query returns the base result set, in this case, the big cheese since he has no ManagerID. The recursive query is then executed, joining on the anchor query to return all employees who has a ManagerID of the person found in the anchor query (in this case, the big cheese). The recursive query is then executed again using those employees recently found in the first iteration of the recursive query, using those as the base query or input value to the current recursive query.

As stated above, a recursive CTE is executed over and over until no more results are returned. It does this by providing an internal, implicit termination check. With each iteration of the CTE, it basically asks itself, "Hey, have we retrieved all the rows?" And if the results of that check are true, then it says, "OK, we're done!" and exits the recursion.

During the 1960s, there was a popular TV show called Lost in Space in which an unnamed robot and his human counterpart, Will Robinson, traveled through space tackling one adventure after another. Part of the responsibility of the robot was to "watch the back" of Will and warn him of any potentially dangerous situations. When a situation did arise that could be or indeed was dangerous, the robot would utter the phrase "Danger, Will Robinson!" and our hero would spring into action.

Why does this have to do with recursive CTEs? Because as cool as recursive CTEs are, there exists the possibility of mistakenly creating a recursive CTE that results in an infinite loop. Not good. "Danger, Will Robinson!" This happens when a recursive member of the CTE returns the same value for both the parent and child columns. Bam, you're in an infinite loop.

Summary

OK, there you have it. CTEs in a nutshell. The intent of this article is to provide a helpful explanation and overview of common table expressions in SQL Server 2005 and show the true power and flexibility they provide with recursive query capability. We started out covering simple common table expressions and showed how much easier and clean they are to use over what you used to have to use in previous versions of SQL Server, and from there we discussed recursive CTEs and how to use them to retrieve hierarchical data. Very cool stuff. Try not to drool when you start using common table expressions. They have that affect.

Scott has been involved with databases for over ten years. He started using Microsoft Access prior to beta 1, then started using SQL Server when version 4.2 came out and has been using it heavily ever since. XML caught his attention about three years ago and most of his development since that time has been around XML and SQL Server. Scott is the author of the recently published Wrox book Professional SQL Server 2005 XML.

Copyright 2006 by WROX Publishing, Inc. All rights reserved. Reproduced here by permission of the publisher.



About the Author

Scott Klein

Scott Klein is the author of Professional SQL Server 2005 XML by (Wrox, 2006). He is passionate about anything and everything that has to do with SQL Server and XML, and related technologies, having worked with SQL Server for over 10 years and XML for over 3. He is currently building and putting together the best web site related to these technologies at http://www.sqlxml.com.

Comments

  • There are no comments yet. Be the first to comment!

Leave a Comment
  • Your email address will not be published. All fields are required.

Top White Papers and Webcasts

  • Live Event Date: October 29, 2014 @ 11:00 a.m. ET / 8:00 a.m. PT Are you interested in building a cognitive application using the power of IBM Watson? Need a platform that provides speed and ease for rapidly deploying this application? Join Chris Madison, Watson Solution Architect, as he walks through the process of building a Watson powered application on IBM Bluemix. Chris will talk about the new Watson Services just released on IBM bluemix, but more importantly he will do a step by step cognitive …

  • On-demand Event Event Date: October 23, 2014 Despite the current "virtualize everything" mentality, there are advantages to utilizing physical hardware for certain tasks. This is especially true for backups. In many cases, it is clearly in an organization's best interest to make use of physical, purpose-built backup appliances rather than relying on virtual backup software (VBA - Virtual Backup Appliances). Join us for this webcast to learn why physical appliances are preferable to virtual backup appliances, …

Most Popular Programming Stories

More for Developers

Latest Developer Headlines

RSS Feeds