SELECT, INSERT, UPDATE with the SQL MERGE Statement

Introduction

Business Intelligence Developers call it a slowly
changing dimension. Database developers call it an
UPSERT. Whether it is a slowly changing
dimension or an UPSERT, the basic pattern
remains the same: UPDATE data if the key exists
and if the key doesn’t exist INSERT the data.
It’s tedious code to write, tricky code to maintain, and
slow to execute. However, starting with SQL Server 2008, the
pattern has been encapsulated in the new MERGE
Statement. I’m going to demonstrate how and when to use the
new TSQL MERGE statement. I’m going to begin
with some background on UPSERTs and slowing
changing dimensions.

UPSERTS and Slowly Changing Dimensions

As I mentioned earlier, UPSERTS and Slowly
Changing Dimensions are terms for adding or alternatively
updating a record in a database depending on whether the
record does or does not exist. Typically, the solution
algorithm works something like this:


  1. Lookup a record using a table key

  2. If the key value is missing, insert the a new record

  3. If the key value exists, update the record

For a table like the one below:


CREATE TABLE [dbo].[ToMergeTable](
[KeyField] [int] NOT NULL,
[Field1] [nvarchar](50) NOT NULL,
[Field2] [nvarchar](50) NOT NULL,
[UpdateDateField] [datetime] NOT NULL,
CONSTRAINT [PK_ToMergeTable] PRIMARY KEY CLUSTERED
(
[KeyField] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

The TSQL code may look like the code below.
IF EXISTS (SELECT * FROM ToMergeTable WHERE KeyField = @KeyField)
BEGIN
UPDATE ToMergeTable SET UpdateDateField = GETDATE() WHERE KeyField = @KeyField
SELECT ‘Did UPDATE’
END
ELSE
BEGIN
INSERT INTO ToMergeTable (KeyField,Field1,Field2,UpdateDateField) VALUES (@KeyField,@Field1+’-1′,@Field2+’-1′,GETDATE())
SELECT ‘Did INSERT’
END


There is nothing technically wrong with the statement
above, but there are a number of shortcomings in the
solution.

First, the SELECT statement is separated
from the INSERT and UPDATE
statement. Databases usually have concurrent users. In the
time between executing the SELECT and, for
example, the INSERT statement, another user may
have added a new record a fraction of a second after the
SELECT statement executes making one of the
INSERTS fail.

Also, the code above requires two round trips to the
database; one trip to find the record, the
SELECT statement and a second trip to execute
the appropriate DML (UPDATE or
INSERT). In addition the code only works on one
record at a time.

I could make some coding improvements, but no matter what
improvements I make two statements must execute. For
example, the code can be modified to use a temporary table
and a correlated subquery to INSERT if the
temporary table data is missing and UPDATE
WHERE
the temporary table data matches something in
the existing table.

Again, the coding improvements are technically correct
and more flexible. However, coding a subquery often
sacrifices readability, plus, as I mentioned earlier, there
is no avoiding executing at least two statements and coding
more than one statement means two WHERE clauses
to maintain if the key happens to change.

With a non-MERGE example and a MERGE scenario, I’m going
to explain how the solution improves with the MERGE
statement.

MERGE Overview

Below is the solution algorithm above implemented using
the MERGE statement.


MERGE INTO ToMergeTable AS T
USING (VALUES
(@KeyField,@Field1+’-1′,@Field2+’-1′,GETDATE())
,(@KeyField+1,@Field1+’-2′,@Field2+’-2′,GETDATE())
,(@KeyField+2,@Field1+’-3′,@Field2+’-3′,GETDATE())
) AS S (KeyField,Field1,Field2,UpdateDate)
ON T.KeyField = S.KeyField
WHEN MATCHED THEN –UPDATE
UPDATE SET UpdateDateField = UpdateDate
WHEN NOT MATCHED THEN–INSERT
INSERT (KeyField,Field1,Field2,UpdateDateField) VALUES (S.KeyField,S.Field1,S.Field2,UpdateDate)
OUTPUT $action AS ChangesMade;

Later in the article I’ll explain what each part of the
statement does, first some general thoughts. The entire
solution is encapsulated in one statement. So, there is no
separation between statements and no need to run more than
one statement. Readability and maintainability improves.
Moreover, MERGE exists at a slightly higher
level of abstraction than the SELECT,
INSERT, UPDATE. It’s no mystery
that higher levels of abstraction often lead to speedier
development and better code maintenance. A single statement
means the code executes in the implicit transaction. Two
separate statements may have required a
BEGIN/COMMIT/ROLLBACK transaction.

Now I’m going to show how to compose the statement above
and explain variations on each section of the statement.

More by Author

Get the Free Newsletter!

Subscribe to Data Insider for top news, trends & analysis

Must Read