Solutions for INSERT OR UPDATE on SQL Server

User1

Don't forget about transactions. Performance is good, but the simple (IF EXISTS ...) approach is very dangerous. When multiple threads try to perform an insert-or-update, you can easily get a primary key violation.

The solutions provided by @Beau Crawford and @Esteban show the general idea but are error-prone.

To avoid deadlocks and PK violations you can use something like this:

begin tran
if exists (select * from table with (updlock,serializable) where key = @key)
begin
   update table set ...
   where key = @key
end
else
begin
   insert into table (key, ...)
   values (@key, ...)
end
commit tran

or

begin tran
   update table with (serializable) set ...
   where key = @key

   if @@rowcount = 0
   begin
      insert into table (key, ...) values (@key,..)
   end
commit tran
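
As a concrete illustration of the second pattern, here is a minimal sketch. The table dbo.Settings, its columns, and the variable values are hypothetical names chosen for this example; they are not from the question.

```sql
-- Hypothetical demo table; all names here are assumptions for illustration only.
CREATE TABLE dbo.Settings
(
    SettingKey   varchar(50)  NOT NULL PRIMARY KEY,
    SettingValue varchar(200) NOT NULL
);
GO

DECLARE @key varchar(50) = 'theme', @value varchar(200) = 'dark';

BEGIN TRAN;

    -- With SERIALIZABLE, the key range that was searched stays locked until
    -- the transaction ends, so a concurrent session should not be able to
    -- insert the same key between this UPDATE and the INSERT below.
    UPDATE dbo.Settings WITH (SERIALIZABLE)
       SET SettingValue = @value
     WHERE SettingKey = @key;

    -- No row was updated, so the key does not exist yet: insert it.
    IF @@ROWCOUNT = 0
    BEGIN
        INSERT INTO dbo.Settings (SettingKey, SettingValue)
        VALUES (@key, @value);
    END

COMMIT TRAN;
```

Running the batch twice with different values should leave a single row for 'theme' holding the second value.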

The question asked for the most performant solution rather than the safest. Whilst a transaction adds security to the process, it also adds overhead.

Both these methods can still fail. If two concurrent threads do the same on the same row, the first one will succeed, but the second insert will fail because of a primary key violation. A transaction does not guarantee that the insert will succeed even if the update failed because the record existed. To guarantee that any number of concurrent transactions will succeed you MUST use a lock.

@aku any reason you used table hints ("with(xxxx)") as opposed to "SET TRANSACTION ISOLATION LEVEL SERIALIZABLE" just before your BEGIN TRAN ?

@CashCow, the last wins, this is what INSERT or UPDATE is supposed to do: the first one inserts, the second updates the record. Adding a lock allows this to happen in a very short time-frame, preventing an error.

I always thought using locking hints was bad, and that we should let Microsoft's internal engine dictate locks. Is this the apparent exception to the rule?

Community

See my detailed answer to a very similar previous question

@Beau Crawford's is a good way in SQL 2005 and below, though if you're granting rep it should go to the first guy to SO it. The only problem is that for inserts it's still two IO operations.

SQL Server 2008 introduces MERGE from the SQL:2003 standard:

merge tablename with(HOLDLOCK) as target
using (values ('new value', 'different value'))
    as source (field1, field2)
    on target.idfield = 7
when matched then
    update
    set field1 = source.field1,
        field2 = source.field2,
        ...
when not matched then
    insert ( idfield, field1, field2, ... )
    values ( 7,  source.field1, source.field2, ... );

Now it's really just one IO operation, but awful code :-(

@Ian Boyd - yeah, that's the SQL:2003 standard's syntax, not the upsert that just about all the other DB providers decided to support instead. The upsert syntax is a far nicer way to do this, so at the very least MS should have supported it too - it's not like it's the only non standard keyword in T-SQL

any comment on the lock hint in other answers? (will find out soon, but if it's the recommended way, I recommend adding it on the answer)

See here weblogs.sqlteam.com/dang/archive/2009/01/31/… for answer on how to prevent race conditions from causing errors that can occur even when using MERGE syntax.

@Seph that's a real surprise - somewhat of a fail by Microsoft there :-S I guess that means you need a HOLDLOCK for merge operations in high concurrency situations.

This answer really needs to be updated to account for the comment by Seph about it not being thread-safe without a HOLDLOCK. According to the linked post, MERGE implicitly takes out an update lock, but releases it before inserting rows, which can cause a race condition and primary key violations on insert. By using HOLDLOCK, the locks are kept until after the insert occurs.

Beau Crawford

Do an UPSERT:

UPDATE MyTable SET FieldA=@FieldA WHERE Key=@Key

IF @@ROWCOUNT = 0
   INSERT INTO MyTable (FieldA) VALUES (@FieldA)

http://en.wikipedia.org/wiki/Upsert

Primary key violations should not occur if you have the proper unique index constraints applied. The whole point of the constraint is to prevent duplicate rows from ever happening. It doesn't matter how many threads are trying to insert, the database will serialize as necessary to enforce the constraint... and if it doesn't, then the engine is worthless. Of course, wrapping this in a serialized transaction would make this more correct and less susceptible to deadlocks or failed inserts.

@Triynko, I think @Sam Saffron meant that if two+ threads interleave in the right sequence then sql server will throw an error indicating a primary key violation would have occurred. Wrapping it in a serializable transaction is the correct way to prevent errors in the above set of statements.

Even if you have a primary key that is an auto-increment, your concern will then be any unique constraints that might be on the table.

The database should take care of primary key issues. What you are saying is that if the update fails and another process gets there first with an insert, your insert will fail. In that case you have a race condition anyway. Locking won't change the fact that the post-condition will be that one of the processes that tries writing will get the value.

Aaron Bertrand

Many people will suggest you use MERGE, but I caution you against it. By default, it doesn't protect you from concurrency and race conditions any more than multiple statements, and it introduces other dangers:

Use Caution with SQL Server's MERGE Statement

So, you want to use MERGE, eh?

Even with this "simpler" syntax available, I still prefer this approach (error handling omitted for brevity):

BEGIN TRANSACTION;

UPDATE dbo.table WITH (UPDLOCK, SERIALIZABLE) 
  SET ... WHERE PK = @PK;

IF @@ROWCOUNT = 0
BEGIN
  INSERT dbo.table(PK, ...) SELECT @PK, ...;
END

COMMIT TRANSACTION;

Please stop using this UPSERT anti-pattern
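
Since error handling is omitted above, here is a minimal sketch of how the preferred pattern might look with it added. The procedure name dbo.Widget_Upsert, the table dbo.Widgets (assumed to already exist with PK as its primary key), and the Name column are hypothetical. THROW requires SQL Server 2012 or later.

```sql
-- Sketch only: dbo.Widget_Upsert, dbo.Widgets and the Name column are
-- hypothetical names; dbo.Widgets is assumed to exist with PK as its key.
CREATE PROCEDURE dbo.Widget_Upsert
    @PK   int,
    @Name nvarchar(100)
AS
BEGIN
    SET NOCOUNT, XACT_ABORT ON;

    BEGIN TRY
        BEGIN TRANSACTION;

        -- Locate the row once, holding locks so that no other session can
        -- insert the same key before this transaction finishes.
        UPDATE dbo.Widgets WITH (UPDLOCK, SERIALIZABLE)
           SET Name = @Name
         WHERE PK = @PK;

        IF @@ROWCOUNT = 0
        BEGIN
            INSERT dbo.Widgets (PK, Name) SELECT @PK, @Name;
        END

        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        -- Undo any partial work, then re-raise the original error to the caller.
        IF @@TRANCOUNT > 0
            ROLLBACK TRANSACTION;
        THROW;
    END CATCH
END
```

A caller would run something like EXEC dbo.Widget_Upsert @PK = 1, @Name = N'first'; the same call inserts the first time and updates afterwards.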

A lot of folks will suggest this way:

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

BEGIN TRANSACTION;

IF EXISTS (SELECT 1 FROM dbo.table WHERE PK = @PK)
BEGIN
  UPDATE ...
END
ELSE
BEGIN
  INSERT ...
END
COMMIT TRANSACTION;

But all this accomplishes is ensuring you may need to read the table twice to locate the row(s) to be updated. In the first sample, you will only ever need to locate the row(s) once. (In both cases, if no rows are found from the initial read, an insert occurs.)

Others will suggest this way:

BEGIN TRY
  INSERT ...
END TRY
BEGIN CATCH
  IF ERROR_NUMBER() = 2627
    UPDATE ...
END CATCH

However, this is problematic if for no other reason than letting SQL Server catch exceptions that you could have prevented in the first place is much more expensive, except in the rare scenario where almost every insert fails. I prove as much here:

Checking for potential constraint violations before entering TRY/CATCH

Performance impact of different error handling techniques

What about inserting/updating from a temp table, which inserts/updates many records?

@user960567 Well,

UPDATE target SET col = tmp.col FROM target INNER JOIN #tmp AS tmp ON <key clause>;
INSERT target(...) SELECT ... FROM #tmp AS t WHERE NOT EXISTS (SELECT 1 FROM target WHERE key = t.key);

nice replied after more than 2 years :)

@user960567 Sorry, I don't always catch comment notifications in real time.

@iokevins No difference that I can think of. I’m actually torn in terms of preference: while I prefer having the hint at the query level, I prefer the opposite when we’re talking about, say, applying NOLOCK hints to every table in the query (in that case I much prefer a single SET statement to fix later).

Mitch Wheat

IF EXISTS (SELECT * FROM [Table] WHERE ID = rowID)
UPDATE [Table] SET propertyOne = propOne, property2 . . .
ELSE
INSERT INTO [Table] (propOne, propTwo . . .)

Edit:

Alas, even to my own detriment, I must admit the solutions that do this without a select seem to be better since they accomplish the task with one less step.

I still like this one better. The upsert seems more like programming by side effect, and I have never seen the piddly little clustered index seek of that initial select to cause performance problems in a real database.

@EricZBeard It's not about performance (though it's not always a seek that you're performing redundantly, depending on what you're checking to indicate a duplicate). The real problem is the opportunity the additional operation opens up for race conditions and deadlocks (I explain why in this post).

Eric Weilnau

If you want to UPSERT more than one record at a time you can use the ANSI SQL:2003 DML statement MERGE.

MERGE INTO table_name WITH (HOLDLOCK) USING table_name ON (condition)
WHEN MATCHED THEN UPDATE SET column1 = value1 [, column2 = value2 ...]
WHEN NOT MATCHED THEN INSERT (column1 [, column2 ...]) VALUES (value1 [, value2 ...])

Check out Mimicking MERGE Statement in SQL Server 2005.

In Oracle, issuing a MERGE statement I think locks the table. Does the same happen in SQL Server?

MERGE is susceptible to race conditions (see weblogs.sqlteam.com/dang/archive/2009/01/31/…) unless you make it hold certain locks. Also, take a look at MERGE's performance in SQL Profiler ... I find that it is typically slower and generates more reads than alternative solutions.

@EBarr - Thanks for the link on the locks. I have updated my answer to include the suggested locking hint.

Also check out mssqltips.com/sqlservertip/3074/…

user243131

Although it's pretty late to comment on this, I want to add a more complete example using MERGE.

Such Insert+Update statements are usually called "Upsert" statements and can be implemented using MERGE in SQL Server.

A very good example is given here: http://weblogs.sqlteam.com/dang/archive/2009/01/31/UPSERT-Race-Condition-With-MERGE.aspx

The above explains locking and concurrency scenarios as well.

I will be quoting the same for reference:

ALTER PROCEDURE dbo.Merge_Foo2
      @ID int
AS

SET NOCOUNT, XACT_ABORT ON;

MERGE dbo.Foo2 WITH (HOLDLOCK) AS f
USING (SELECT @ID AS ID) AS new_foo
      ON f.ID = new_foo.ID
WHEN MATCHED THEN
    UPDATE
            SET f.UpdateSpid = @@SPID,
            UpdateTime = SYSDATETIME()
WHEN NOT MATCHED THEN
    INSERT
      (
            ID,
            InsertSpid,
            InsertTime
      )
    VALUES
      (
            new_foo.ID,
            @@SPID,
            SYSDATETIME()
      );

RETURN @@ERROR;

There are other things to worry about with MERGE: mssqltips.com/sqlservertip/3074/…

Denver

/*
CREATE TABLE ApplicationsDesSocietes (
   id                   INT IDENTITY(0,1)    NOT NULL,
   applicationId        INT                  NOT NULL,
   societeId            INT                  NOT NULL,
   suppression          BIT                  NULL,
   CONSTRAINT PK_APPLICATIONSDESSOCIETES PRIMARY KEY (id)
)
GO
--*/

DECLARE @applicationId INT = 81, @societeId INT = 43, @suppression BIT = 0

MERGE dbo.ApplicationsDesSocietes WITH (HOLDLOCK) AS target
--set the SOURCE table one row
USING (VALUES (@applicationId, @societeId, @suppression))
    AS source (applicationId, societeId, suppression)
    --here goes the ON join condition
    ON target.applicationId = source.applicationId and target.societeId = source.societeId
WHEN MATCHED THEN
    UPDATE
    --place your list of SET here
    SET target.suppression = source.suppression
WHEN NOT MATCHED THEN
    --insert a new line with the SOURCE table one row
    INSERT (applicationId, societeId, suppression)
    VALUES (source.applicationId, source.societeId, source.suppression);
GO

Replace the table and field names with whatever you need. Take care with the USING and ON conditions. Then set the appropriate value (and type) for the variables on the DECLARE line.

Cheers.

Saleh Najar

That depends on the usage pattern. One has to look at the usage big picture without getting lost in the details. For example, if the usage pattern is 99% updates after the record has been created, then the 'UPSERT' is the best solution.

After the first insert (hit), it will be all single statement updates, no ifs or buts. The 'where' condition on the insert is necessary otherwise it will insert duplicates, and you don't want to deal with locking.

UPDATE <tableName> SET <field>=@field WHERE key=@key;

IF @@ROWCOUNT = 0
BEGIN
   INSERT INTO <tableName> (field)
   SELECT @field
   WHERE NOT EXISTS (select * from tableName where key = @key);
END

RamenChef

You can use the MERGE statement. It inserts data if it does not exist, or updates it if it does.

MERGE INTO Employee AS e
using EmployeeUpdate AS eu
ON e.EmployeeID = eu.EmployeeID

@RamenChef I don't understand. Where are the WHEN MATCHED clauses?

@likejudo I didn't write this; I only revised it. Ask the user that wrote the post.

Kristen

If going the UPDATE if-no-rows-updated then INSERT route, consider doing the INSERT first to prevent a race condition (assuming no intervening DELETE)

INSERT INTO MyTable (Key, FieldA)
   SELECT @Key, @FieldA
   WHERE NOT EXISTS
   (
       SELECT *
       FROM  MyTable
       WHERE Key = @Key
   )
IF @@ROWCOUNT = 0
BEGIN
   UPDATE MyTable
   SET FieldA=@FieldA
   WHERE Key=@Key
   IF @@ROWCOUNT = 0
   ... record was deleted, consider looping to re-run the INSERT, or RAISERROR ...
END

Apart from avoiding a race condition, if in most cases the record will already exist then this will cause the INSERT to fail, wasting CPU.

Using MERGE is probably preferable for SQL Server 2008 onwards.

Interesting idea, but incorrect syntax. The SELECT needs a FROM <table_source>, and a TOP 1 (unless the chosen table_source has only 1 row).

Thanks. I've changed it to a NOT EXISTS. There will only ever be one matching row because of the test for "key" as per O/P (although that may need to be a multi-part key :) )

bjorsig

MS SQL Server 2008 introduces the MERGE statement, which I believe is part of the SQL:2003 standard. As many have shown, it is not a big deal to handle one-row cases, but when dealing with large datasets, one needs a cursor, with all the performance problems that come along. The MERGE statement will be a much welcomed addition when dealing with large datasets.

I have never needed to use a cursor to do this with large datasets. You just need an update that updates the records that match and an insert with a select instead of a values clause that left joins to the table.
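
The set-based approach described in this comment could be sketched roughly as follows. The names dbo.Target, #Staging, KeyCol and Col are hypothetical, and the sketch assumes KeyCol is a non-nullable key of dbo.Target that appears at most once in #Staging.

```sql
-- Sketch only: dbo.Target, #Staging, KeyCol and Col are hypothetical names.
-- Assumes KeyCol is NOT NULL, is the key of dbo.Target, and is unique in #Staging.
BEGIN TRANSACTION;

    -- 1) Update the rows that already exist in the target.
    UPDATE t
       SET t.Col = s.Col
      FROM dbo.Target AS t
      INNER JOIN #Staging AS s
              ON s.KeyCol = t.KeyCol;

    -- 2) Insert the rows that have no match. The LEFT JOIN leaves t.KeyCol
    --    NULL for staging rows that are missing from the target.
    INSERT INTO dbo.Target (KeyCol, Col)
    SELECT s.KeyCol, s.Col
      FROM #Staging AS s
      LEFT JOIN dbo.Target AS t
             ON t.KeyCol = s.KeyCol
     WHERE t.KeyCol IS NULL;

COMMIT TRANSACTION;
```

This sketch adds no locking hints, so the concurrency concerns raised in the other answers still apply if several sessions load the same keys at once.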

Bo Persson

Do the race conditions really matter if you first try an update followed by an insert? Let's say you have two threads that want to set a value for the key "key":

Thread 1: value = 1
Thread 2: value = 2

Example race condition scenario

key is not defined
Thread 1 fails with update
Thread 2 fails with update
Exactly one of thread 1 or thread 2 succeeds with insert, e.g. thread 1
The other thread fails with insert (with error duplicate key): thread 2

Result: The "first" of the two threads to insert decides the value.

Wanted result: The last of the two threads to write data (update or insert) should decide the value.

But in a multithreaded environment, the OS scheduler decides on the order of the thread execution. In the above scenario, where we have this race condition, it was the OS that decided on the sequence of execution. I.e., it is wrong to say that "thread 1" or "thread 2" was "first" from a system viewpoint.

When the time of execution is so close for thread 1 and thread 2, the outcome of the race condition doesn't matter. The only requirement should be that one of the threads should define the resulting value.

For the implementation: If update followed by insert results in error "duplicate key", this should be treated as success.

Also, one should of course never assume that value in the database is the same as the value you wrote last.
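
A minimal sketch of "treat duplicate key as success" might look like this. The table dbo.KeyValue and its columns are hypothetical, and THROW requires SQL Server 2012 or later.

```sql
-- Sketch only: dbo.KeyValue, KeyCol and Value are hypothetical names.
DECLARE @key int = 1, @value int = 2;

UPDATE dbo.KeyValue SET Value = @value WHERE KeyCol = @key;

IF @@ROWCOUNT = 0
BEGIN
    BEGIN TRY
        INSERT INTO dbo.KeyValue (KeyCol, Value) VALUES (@key, @value);
    END TRY
    BEGIN CATCH
        -- 2627 = PRIMARY KEY / UNIQUE constraint violation,
        -- 2601 = duplicate key row in a unique index.
        -- Either one means another thread inserted the key first, which is
        -- treated as success here. Any other error is re-raised.
        IF ERROR_NUMBER() NOT IN (2627, 2601)
            THROW;
    END CATCH
END
```

The stored value is then whichever thread's insert won, which is the outcome this answer argues is acceptable.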

ZXX

Before everyone jumps to HOLDLOCK-s out of fear from these nefarious users running your sprocs directly :-) let me point out that you have to guarantee uniqueness of new PK-s by design (identity keys, sequence generators in Oracle, unique indexes for external ID-s, queries covered by indexes). That's the alpha and omega of the issue. If you don't have that, no HOLDLOCK-s of the universe are going to save you and if you do have that then you don't need anything beyond UPDLOCK on the first select (or to use update first).

Sprocs normally run under very controlled conditions and with the assumption of a trusted caller (mid tier). Meaning that if a simple upsert pattern (update+insert or merge) ever sees duplicate PK that means a bug in your mid-tier or table design and it's good that SQL will yell a fault in such case and reject the record. Placing a HOLDLOCK in this case equals eating exceptions and taking in potentially faulty data, besides reducing your perf.

Having said that, using MERGE, or UPDATE then INSERT, is easier on your server and less error prone since you don't have to remember to add (UPDLOCK) to the first select. Also, if you are doing inserts/updates in small batches you need to know your data in order to decide whether a transaction is appropriate or not. If it's just a collection of unrelated records then an additional "enveloping" transaction will be detrimental.

If you just do an update then insert without any locking or elevated isolation, then two users could try to pass the same data back (I wouldn't consider it a bug in the middle tier if two users tried to submit the exact same information at the same time - depends a lot on context, doesn't it?). They both enter the update, which returns 0 rows for both, then they both try to insert. One wins, the other gets an exception. This is what people are usually trying to avoid.

Mike Chamberlain

I tried the solution below and it works for me when concurrent requests for the insert statement occur.

begin tran
if exists (select * from table with (updlock,serializable) where key = @key)
begin
   update table set ...
   where key = @key
end
else
begin
   insert table (key, ...)
   values (@key, ...)
end
commit tran

Victor Sanchez

You can use this query. It works in all SQL Server editions. It's simple and clear, but you need to use 2 queries. You can use it if you can't use MERGE.

    BEGIN TRAN

    UPDATE table
    SET Id = @ID, Description = @Description
    WHERE Id = @Id

    INSERT INTO table(Id, Description)
    SELECT @Id, @Description
    WHERE NOT EXISTS (SELECT NULL FROM table WHERE Id = @Id)

    COMMIT TRAN

NOTE: Please explain the downvotes.

I am guessing lack of locking?

There's no lack of locking... I use "TRAN". By default, SQL Server transactions have locking.

Nenad

Assuming that you want to insert/update a single row, the optimal approach is to use SQL Server's REPEATABLE READ transaction isolation level:

SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN TRANSACTION

    IF EXISTS (SELECT * FROM myTable WHERE key=@key)
        UPDATE myTable SET ...
        WHERE key=@key
    ELSE
        INSERT INTO myTable (key, ...)
        VALUES (@key, ...)

COMMIT TRANSACTION

This isolation level will prevent/block subsequent repeatable read transactions from accessing the same row (WHERE key=@key) while the currently running transaction is open. On the other hand, operations on another row won't be blocked (WHERE key=@key2).

Eugene Kaurov

MySQL (and subsequently SQLite) also support the REPLACE INTO syntax:

REPLACE INTO MyTable (KEY, datafield1, datafield2) VALUES (5, '123', 'overwrite');

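This automatically identifies the primary key (or a unique index) and looks for a matching row. If one is found it is replaced (MySQL deletes the old row and inserts the new one), and if none is found a new row is inserted.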

Documentation: https://dev.mysql.com/doc/refman/8.0/en/replace.html

marc_s

In SQL Server 2008 you can use the MERGE statement

this is a comment. in the absence of any actual example code this is just like many other comments on the site.

Very old, but an example would be nice.

Jay

You can use:

INSERT INTO tableName (...) VALUES (...) 
ON DUPLICATE KEY 
UPDATE ...

Using this, if there is already an entry for the particular key, then it will UPDATE, else, it will INSERT.

Luke Bennett

Doing an if exists ... else ... involves doing two requests minimum (one to check, one to take action). The following approach requires only one where the record exists, two if an insert is required:

DECLARE @RowExists bit
SET @RowExists = 0
UPDATE MyTable SET DataField1 = 'xxx', @RowExists = 1 WHERE Key = 123
IF @RowExists = 0
  INSERT INTO MyTable (Key, DataField1) VALUES (123, 'xxx')

Micky McQuade

I usually do what several of the other posters have said with regard to checking for it existing first and then doing whatever the correct path is. One thing you should remember when doing this is that the execution plan cached by sql could be nonoptimal for one path or the other. I believe the best way to do this is to call two different stored procedures.

FirstSP:
If Exists
   Call SecondSP (UpdateProc)
Else
   Call ThirdSP (InsertProc)

Now, I don't follow my own advice very often, so take it with a grain of salt.

This may have been relevant in ancient versions of SQL Server, but modern versions have statement-level compilation. Forks etc. are not an issue, and using separate procedures for these things does not solve any of the issues inherent in making the choice between an update and an insert anyway...

nruessmann

If you use ADO.NET, the DataAdapter handles this.

If you want to handle it yourself, this is the way:

Make sure there is a primary key constraint on your key column.

Then you:

Do the update.
If the update affects no rows because no record with the key exists, do the insert. If the update does find the record, you are finished.

You can also do it the other way round, i.e. do the insert first, and do the update if the insert fails. Normally the first way is better, because updates are done more often than inserts.

...and doing the insert first (knowing that it will fail sometimes) is expensive for SQL Server. sqlperformance.com/2012/08/t-sql-queries/error-handling

Clint Ecker

Do a select, if you get a result, update it, if not, create it.

That's two calls to the database.

I don't see a problem with that.

It's two calls to the DB that's the problem: you end up doubling the number of roundtrips to the DB. If the app hits the DB with lots of inserts/updates it'll hurt performance. UPSERT is a better strategy.

It also creates a race condition, no?
