SELECT, INSERT, UPDATE with the SQL MERGE Statement

Introduction

Business Intelligence Developers call it a slowly changing dimension. Database developers call it an UPSERT. Whether it is a slowly changing dimension or an UPSERT, the basic pattern remains the same: UPDATE data if the key exists and if the key doesn't exist INSERT the data. It's tedious code to write, tricky code to maintain, and slow to execute. However, starting with SQL Server 2008, the pattern has been encapsulated in the new MERGE Statement. I'm going to demonstrate how and when to use the new TSQL MERGE statement. I'm going to begin with some background on UPSERTs and slowing changing dimensions.

UPSERTS and Slowly Changing Dimensions

As I mentioned earlier, UPSERTS and Slowly Changing Dimensions are terms for adding or alternatively updating a record in a database depending on whether the record does or does not exist. Typically, the solution algorithm works something like this:

  1. Lookup a record using a table key
  2. If the key value is missing, insert the a new record
  3. If the key value exists, update the record

For a table like the one below:

 CREATE TABLE [dbo].[ToMergeTable](
 	[KeyField] [int] NOT NULL,
 	[Field1] [nvarchar](50) NOT NULL,
 	[Field2] [nvarchar](50) NOT NULL,
 	[UpdateDateField] [datetime] NOT NULL,
  CONSTRAINT [PK_ToMergeTable] PRIMARY KEY CLUSTERED 
 (
 	[KeyField] ASC
 )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
 ) ON [PRIMARY]
 
 The TSQL code may look like the code below.
 IF EXISTS (SELECT * FROM ToMergeTable WHERE KeyField = @KeyField)
 BEGIN
 	UPDATE ToMergeTable SET UpdateDateField = GETDATE() WHERE KeyField = @KeyField
 	SELECT 'Did UPDATE'
 END
 ELSE
 BEGIN
 	INSERT INTO ToMergeTable (KeyField,Field1,Field2,UpdateDateField) VALUES (@KeyField,@Field1+'-1',@Field2+'-1',GETDATE())
 	SELECT 'Did INSERT'
 END

There is nothing technically wrong with the statement above, but there are a number of shortcomings in the solution.

First, the SELECT statement is separated from the INSERT and UPDATE statement. Databases usually have concurrent users. In the time between executing the SELECT and, for example, the INSERT statement, another user may have added a new record a fraction of a second after the SELECT statement executes making one of the INSERTS fail.

Also, the code above requires two round trips to the database; one trip to find the record, the SELECT statement and a second trip to execute the appropriate DML (UPDATE or INSERT). In addition the code only works on one record at a time.

I could make some coding improvements, but no matter what improvements I make two statements must execute. For example, the code can be modified to use a temporary table and a correlated subquery to INSERT if the temporary table data is missing and UPDATE WHERE the temporary table data matches something in the existing table.

Again, the coding improvements are technically correct and more flexible. However, coding a subquery often sacrifices readability, plus, as I mentioned earlier, there is no avoiding executing at least two statements and coding more than one statement means two WHERE clauses to maintain if the key happens to change.

With a non-MERGE example and a MERGE scenario, I'm going to explain how the solution improves with the MERGE statement.

MERGE Overview

Below is the solution algorithm above implemented using the MERGE statement.

  MERGE INTO ToMergeTable AS T
  USING (VALUES
 	(@KeyField,@Field1+'-1',@Field2+'-1',GETDATE())
 	,(@KeyField+1,@Field1+'-2',@Field2+'-2',GETDATE())
 	,(@KeyField+2,@Field1+'-3',@Field2+'-3',GETDATE())
  ) AS S (KeyField,Field1,Field2,UpdateDate)
  ON T.KeyField = S.KeyField
  WHEN MATCHED THEN --UPDATE
 	UPDATE SET UpdateDateField = UpdateDate	
  WHEN NOT MATCHED THEN--INSERT
 	INSERT (KeyField,Field1,Field2,UpdateDateField) VALUES (S.KeyField,S.Field1,S.Field2,UpdateDate)	
  OUTPUT $action AS ChangesMade;

Later in the article I'll explain what each part of the statement does, first some general thoughts. The entire solution is encapsulated in one statement. So, there is no separation between statements and no need to run more than one statement. Readability and maintainability improves. Moreover, MERGE exists at a slightly higher level of abstraction than the SELECT, INSERT, UPDATE. It's no mystery that higher levels of abstraction often lead to speedier development and better code maintenance. A single statement means the code executes in the implicit transaction. Two separate statements may have required a BEGIN/COMMIT/ROLLBACK transaction.

Now I'm going to show how to compose the statement above and explain variations on each section of the statement.

SELECT, INSERT, UPDATE with the SQL MERGE Statement

INTO, USING, ON

INTO and USING define the destination and the source of the MERGE data. Code displaying this section of the statement appears below.

 MERGE INTO ToMergeTable AS T
  USING (VALUES
 	(@KeyField,@Field1+'-1',@Field2+'-1',GETDATE())
 	,(@KeyField+1,@Field1+'-2',@Field2+'-2',GETDATE())
 	,(@KeyField+2,@Field1+'-3',@Field2+'-3',GETDATE())
  ) AS S (KeyField,Field1,Field2,UpdateDate)
  ON T.KeyField = S.KeyField

INTO defines the MERGE statement destination or rather, the target table. In the sample code above the target table is ToMergeTable. To simplify some later code I assigned an alias "T" to the target table.

USING defines the MERGE source. Sources can be any table data, including views and functions returning a table. In the example, I utilized the new SQL 2008 VALUES statement. VALUES allows multiple sets of data. The sample code above contains three sets of data.

ON defines the database key relationship between the target and the source. As you'll see later in the article ON directs the Lookup, working a lot like the SELECT statement I used in the non-MERGE sample code.

WHEN MATCHED

WHEN MATCHED typically handles what the UPDATE statement and the IF statement handled in the non-MERGE example above. The WHEN MATCHED section of the statement appears below.

  WHEN MATCHED THEN --UPDATE
 	UPDATE SET UpdateDateField = UpdateDate

MERGE will execute the UPDATE statement inside of WHEN MATCHED if the key relationship defined in the ON section is true.

UPDATE statements and DELETE statements can appear in the WHEN MATCHED section of the MERGE statement. Only two WHEN MATCHED statements can appear in a MERGE statement. Also, WHEN MATCHED can include other search conditions. For example, a developer may want to look at the threshold of a numeric value and perform an UPDATE given one condition and a DELETE if the additional condition is not met.

WHEN NOT MATCHED

Typically, the WHEN NOT MATCHED section handles what the ELSE and the

INSERT

handled in the non-MERGE example above. The WHEN NOT MATCHED section of the code appears below.

  WHEN NOT MATCHED THEN--INSERT
 	INSERT (KeyField,Field1,Field2,UpdateDateField) VALUES (S.KeyField,S.Field1,S.Field2,UpdateDate)

If the BY keyword is not included in the WHEN NOT MATCHED section, BY TARGET is assumed. Only one WHEN NOT MATCHED BY TARGET is allowed. INSERTS can only appear in the WHEN NOT MATCHED BY TARGET section of the MERGE statement.

A BY SOURCE qualifier allows UPDATES and DELETES to appear in the WHEN NOT MATCHED section.

OUTPUT

OUTPUT defines a result set returned by the MERGE statement. The example simply returns whether an UPDATE or INSERT was executed. Because a developer can include multiple records in the MERGE statement, it might be helpful to know whether some existing data was an UPDATE or new data was added (INSERT). For example, in tables containing an IDENTITY key, a developer may want to cache the IDENTITY value of an INSERTED record.

Conclusion

Handling UPSERTs and slowly changing dimensions has been made easier with the advent of the new TSQL MERGE statement. The new MERGE statement improves readability, maintainability, and most of all executes faster than an equivalent implementation using SELECT, IF, UPDATE, INSERT statements.

Resources

UPSERT
SQL Server Books Online



About the Author

Jeffrey Juday

Jeff is a software developer specializing in enterprise application integration solutions utilizing BizTalk, SharePoint, WCF, WF, and SQL Server. Jeff has been developing software with Microsoft tools for more than 15 years in a variety of industries including: military, manufacturing, financial services, management consulting, and computer security. Jeff is a Microsoft BizTalk MVP. Jeff spends his spare time with his wife Sherrill and daughter Alexandra.

Comments

  • There are no comments yet. Be the first to comment!

Leave a Comment
  • Your email address will not be published. All fields are required.

Top White Papers and Webcasts

  • Live Event Date: December 11, 2014 @ 1:00 p.m. ET / 10:00 a.m. PT Market pressures to move more quickly and develop innovative applications are forcing organizations to rethink how they develop and release applications. The combination of public clouds and physical back-end infrastructures are a means to get applications out faster. However, these hybrid solutions complicate DevOps adoption, with application delivery pipelines that span across complex hybrid cloud and non-cloud environments. Check out this …

  • On-demand Event Event Date: October 29, 2014 It's well understood how critical version control is for code. However, its importance to DevOps isn't always recognized. The 2014 DevOps Survey of Practice shows that one of the key predictors of DevOps success is putting all production environment artifacts into version control. In this webcast, Gene Kim discusses these survey findings and shares woeful tales of artifact management gone wrong! Gene also shares examples of how high-performing DevOps …

Most Popular Programming Stories

More for Developers

RSS Feeds