Using T-SQL CROSS APPLY and OUTER APPLY

Introduction

Part of my self-imposed job has always been to learn and adopt new technologies. As part of that process, I share what I learn by writing about it. The amount of information and content that is produced each year is much, much than anyone person can master, so I choose. Generally, I focus on OO technologies and languages, UML, design patterns, refactoring, and SQL. As a general rule, these areas have worked out pretty well for me. Unfortunately, the trade offs are that I may never master Ruby or really understand the difference between Ruby and Ruby on Rails. I am at peace with this decision.

That said, even though my personal focus is pretty much OO, I still miss stuff. SQL Server 2005 came out with the CROSS APPLY and OUTER APPLY operators and I have just started learning how to use APPLY in the last month or so. When I think I have it figured out, I am fortunate enough that some of you are interested in reading about my understanding of the technology.

From the MSDN help "the APPLY operator allows you to invoke a table-valued function for each row returned by an outer table expression of a query. The table-valued function acts as the right input and the outer table acts as the left input." Hunh?! Technically, I understand table-valued function, left and right input, but this explanation doesn't really tell me about why I need this or when it's needed. The help documentation doesn't tell me what to do with APPLY. So, this article represents me processing and compartmentalizing APPLY, which will help me know when it should be pulled out of my toolbox and used.

This article is not a comprehensive definition of APPLY nor does it contain all scenarios. Really, the article represents one scenario, but one that is especially useful to me.

Understanding Single Table Hierarchies

A common problem is representing hierarchies of data in a database. A solution is to put all data in a hierarchy in a single table—like employees and managers—and have establish the relationships with keys. For example, AdventureWorks does this with the HumanResources.Employee and Person.Contact tables. Employee contains employees, which in turn includes managers and represents the hierarchical information, and Contact contains information such as the Employee's first and last name.

A classic way to represent a hierarchy is with genealogical relationships—mom and dad, grandpa and grandma, daughters and sons. For instance, any person has one set of parents, so two keys—on for mom and one for dad—and any row representing a person has two keys referring to that person's parents. Siblings would have identical parent identifiers, step-siblings would have at least one parent key in common, and aunts, uncles, cousins, and so forth could also be figured out.

To generate CROSS APPLY and OUTER APPLY, you can use the script in Listing 1 to create a simple table containing a single person with additional keys to that person's parents. Copy and paste the code into Microsoft SQL Server Management Studio (the Expression version of the same) and run the script.

Listing 1: A table named People that has columns that refer to other rows in the same table; the other rows represent "parents".

Use Genealogy
Go

--Create People table and insert values.
CREATE TABLE People
(
   PersonID   int         NOT NULL,
   MotherID   int         NULL,
   FatherID   int         NULL,
   Name       varchar(25) NOT NULL,
   CONSTRAINT PK_People PRIMARY KEY(PersonID),
)
GO

INSERT INTO People VALUES(1  , NULL, NULL, 'Jack Symons')
INSERT INTO People VALUES(2  , NULL, NULL, 'Marvel Symons')
INSERT INTO People VALUES(3  , NULL, NULL, 'Anna Kimmel')
INSERT INTO People VALUES(4  , NULL, NULL, 'Frank Kimmel')
INSERT INTO People VALUES(5  , 2, 1,       'Jacqueline Benavides')
INSERT INTO People VALUES(6  , 3, 4,       'Gerald Kimmel')
INSERT INTO People VALUES(7  , 5, 6,       'Kathy Hemenway')
INSERT INTO People VALUES(8  , 5, 6,       'Daniel Kimmel')
INSERT INTO People VALUES(9  , 5, 6,       'David Kimmel')
INSERT INTO People VALUES(10 , 5, 7,       'Robert Benavides')
INSERT INTO People VALUES(11 , 5, 7,       'Nicholas Benavides')
INSERT INTO People VALUES(12 , 5, 6,       'James Kimmel')
INSERT INTO People VALUES(13 , 5, 6,       'Paul Kimmel')
INSERT INTO People VALUES(14 , NULL, NULL, 'Lori Kimmel')
INSERT INTO People VALUES(15 , NULL, NULL, 'David Benavides')
INSERT INTO People VALUES(16 , 14, 13,     'Alex Kimmel')
INSERT INTO People VALUES(17 , 14, 13,     'Noah Kimmel')

GO

If you mapped this out, one branch of the hierarchy would show Gerald Kimmel and Jacqueline Benavides as my parents. My PersonID is 13, my MotherID is 5, and FatherID is 6. Everyone with 5 and 6 for parent IDs are my siblings.

The challenge is this: "How can you write a single query that will correctly return a hierarchy of relationships against a single table?"

Using CROSS APPLY

It is worth stating that child, mother, and father can be retrieved based on the People table by using a self join (see Listing 2). By joining People.MotherID on People.PersonID and People.FatherID on People.PersonID, you can build the hierarchy in the preceding scenario.

Listing 2: Use multiple self joins to build the hierarchy results.

select p1.Name as MyName, p2.Name AS Mother, p3.Name As Father
from people p1 left join people p2 on p1.MotherID =
   p2.PersonID left join people p3 on p1.FatherID = p3.PersonID

However, there are limitations. You can't use a table valued function passing in an outer query parameter, and you can't use a nested subquery that returns multiple rows. For example, given a function GetParents (see Listing 3) that accepts PersonID and returns mom and dad information, you cannot use the outer query PersonID to invoke the function (see Listing 4).

Listing 3: A table-value function that returns mom and dad information.

USE [Genealogy]
GO
/****** Object:  UserDefinedFunction [dbo].[GetParents]
        Script Date: 01/22/2009 14:29:00 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER FUNCTION [dbo].[GetParents](@PersonID int)
RETURNS @Parents TABLE
(
   [PersonID] [int] PRIMARY KEY NOT NULL,
   [Self] [varchar](25),
   [Mother] [varchar](25) NULL,
   [Father] [varchar](25) NULL
)
AS
BEGIN
   INSERT INTO @Parents
   SELECT
      p1.PersonID,
      p1.Name AS [Self],
      p2.[Name] AS Mother,
      p3.[Name] AS Father FROM
      People p1 INNER JOIN People p2 ON
      p1.MotherID = p2.PersonID INNER JOIN
      People p3 ON p1.FatherID = p3.PersonID
   WHERE
      p1.PersonID = @PersonID;
   RETURN;
END;

Listing 4: This code won't work because the outer query value cannot be passed to the function.

select p1.PersonID, p1.Name, GetParents(p1.PersonID)
FROM People p1

There are a lot of reasons you might prefer a function—the function already exists, code reuse, and isolation. Here is where CROSS APPLY shines. In the presence of a function, you can use CROSS APPLY to invoke the GetParents function, passing in the PersonID from the select to the function (see Listing 5).

Listing 5: CROSS APPLY returns the same data as the multiple inner joins and call the function with a value from the select statement.

select p1.PersonID, p1.Name, p2.Mother, p2.Father
FROM People p1
CROSS APPLY GetParents(p1.PersonID) p2

Listing 5 returns the hierarchy where there is a father and mother. If you change CROSS APPLY to OUTER APPLY, you will get rows without parents. In short, CROSS APPLY responds similarly to an INNER JOIN and OUTER APPLY responds similarly to a LEFT JOIN. Listing 6 contains a complete solution with CROSS APPLY and no function, and Listing 7 shows the OUTER APPLY with no function.

Listing 6: Assembling the self, mother, and father data where there are mothers and fathers using CROSS APPLY.

SELECT p1.PersonID, p1.[Name], M.Name as Mother,
   F.Name As Father FROM PEOPLE p1
CROSS APPLY
(SELECT p2.PersonID, p2.[Name] FROM PEOPLE p2
   WHERE p1.MotherID = p2.PersonID) M
CROSS APPLY
(SELECT PersonID, [Name] FROM PEOPLE p3
   WHERE p1.FatherID = p3.PersonID) F

Listing 7: Assembling the self, father, and mother data where either the father and/or mother data is null using OUTER APPLY.

SELECT p1.PersonID, p1.[Name], M.Name as Mother,
   F.Name As Father FROM PEOPLE p1
OUTER APPLY
(SELECT p2.PersonID, p2.[Name] FROM PEOPLE p2
   WHERE p1.MotherID = p2.PersonID) M
OUTER APPLY
(SELECT PersonID, [Name] FROM PEOPLE p3
   WHERE p1.FatherID = p3.PersonID) F

Because there is no function call in Listings 6 and 7, you could implement the same result set using multiple INNER JOINs in Listing 6 and multiple LEFT JOINs in Listing 7. If you introduced a function, the joins will not work; use APPLY instead.

Summary

The big picture here is that values from one query can't be used as parameters to join queries or sub-queries if those queries return more than one result. If you need data from one query as input to a function or want to allow multiple rows to return, you want to use APPLY. Use CROSS APPLY to get only rows where a correlated value exists and use OUTER APPLY if you want to permit nulls in the result set.

About the Author

Paul Kimmel is the VB Today columnist for www.codeguru.com and has written several books on object-oriented programming and .NET. Check out his upcoming book LINQ Unleashed for C#, now available on Amazon.com and at fine bookstores everywhere. Look for his upcoming book Teach Yourself the ADO.NET Entity Framework in 24 Hours. Paul is a consultant to Developer Express, Inc. You may contact him for technology questions at pkimmel@softconcepts.com.

Copyright © 2009 by Paul T. Kimmel. All Rights Reserved.