☑ This topic is solved.
val_nikolajevs
Obsidian | Level 7

Hello, I would like to ask how to efficiently implement an update-on-duplicate-key while inserting into an MSSQL table.

 

The MSSQL table has a unique index on ID and DateTime.

TimeIn (set when writing to the db) and var1-var4 are variables that may change.

 

SQLLIB.table1 - SQLLIB is a preassigned library, configured on the metadata server, that connects to the MSSQL server/database.

 

EXP_1 contains both data that is not yet in the SQL table and data that should be updated.

 

Writing to the SQL table without duplicates using the following code does not produce any errors; however, rows with duplicate ID and DateTime values are not getting updated. The SQL index property to ignore duplicate values is set to True, which should process all records except the duplicates.

Writing to SQL:

proc sql;
  insert into SQLLIB.table1
  select ID, DateTime, TimeIn, var1, var2, var3, var4
  from EXP_1;
quit;
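(For reference, that index property corresponds to the T-SQL IGNORE_DUP_KEY option sketched below; with it ON, duplicate inserts are silently discarded with a "Duplicate key was ignored." warning rather than updated. The index and object names here are placeholders, not the actual objects.)

/* Sketch only: what the "ignore duplicate values" index property
   looks like in T-SQL (names are placeholders) */
proc sql;
  connect using SQLLIB;
  execute (create unique index ix_table1_id_datetime
             on [DB].[schema].[table1] (ID, DateTime)
             with (ignore_dup_key = on)
          ) by SQLLIB;
  disconnect from SQLLIB;
quit;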

 

The table is large. Is there a way to update or delete records that match an index on multiple columns?

I have isolated the records that should be updated in the exp_2 table and tried to delete them from the SQL table, but was not successful--too slow.

Using the following:

proc sql;
  delete from SQLLIB.table1
  where exists (select distinct ID, DateTime from exp_2);
quit;
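(Note: as written, that EXISTS subquery has no correlation to SQLLIB.table1, so it evaluates the same way for every row. A correlated sketch follows; even this form tends to stay slow, because mixing a SAS WORK table with a remote table keeps SAS from pushing the delete down to the server.)

/* Correlated version (a sketch only) - matches rows on the two index
   columns, but still crosses the SAS/remote library boundary */
proc sql;
  delete from SQLLIB.table1 as t1
  where exists (select 1
                from exp_2 as t2
                where t2.ID = t1.ID
                  and t2.DateTime = t1.DateTime);
quit;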

 

I have also tried the solution that was suggested here:

https://communities.sas.com/t5/SAS-Enterprise-Guide/Delete-with-an-exists-statement-really-slow/td-p...

using CATX on the index columns; still no result--very slow. On the development server, updating just 2 records took 14 minutes. Moreover, during execution I tried to run a very basic SELECT TOP 1000 in MSSQL Management Studio, and it took 2 minutes when it should take a fraction of a second--using the same user login.

This implies that the operation puts real load on the MSSQL server, which in a production deployment may create operational issues.

 

Please let me know how I can configure and/or update the code so duplicates are updated in the MS SQL table.

Thank you.

 


5 REPLIES
SASKiwi
PROC Star

In my experience the default settings in SAS for updating relational database tables are not optimal and often result in poor performance, so I understand your frustration. I can confirm though that once tuned, such processes can perform efficiently.

 

Just to clarify, what SAS database engine are you using? Is it SQL Server or ODBC? How many update rows and how many insert rows do you have in a typical SAS update job? If the number of update rows is small, then deleting them and reinserting them will be very efficient and simple to implement. If we are talking about millions of update rows, then updating in place is probably better.

 

You will get better control over the SQL Server update process by using SQL Passthru, rather than the implicit SQL you are currently using. Once you have answered the above questions I will have a better idea of the best processing options to suggest to you.

 

 

val_nikolajevs
Obsidian | Level 7

Thank you for the reply.

Right now it is a preassigned library using the ODBC engine. I am trying to use the metadata configuration so that I do not expose DB access credentials inside open code that will be maintained by regular users.

The code doing the inserting and updating runs under a service domain account (which should not be visible to other users).

Insert rows: about 30,000-50,000 each day.

Update rows: really depends on upstream data sources--the minimum would be zero and the maximum approximately 30,000 (one year of data for a unique ID).

How do I set up pass-through?

Do I need to set up a DSN, and what type of authentication should I use for SQL Server inside the code?

Once again thank you.

SASKiwi
PROC Star

Passthru is easy to set up. You can use your existing metadata-defined data connection like this (I've named it MySQLCon - just replace that with your LIBREF):

proc sql;
  connect using MySQLCon;
  execute (
    < put your MS SQL Server-specific code here >
  ) by MySQLCon;
quit;

With up to 50,000 rows to update, I suggest you upload your SAS table first to an MS SQL Server temporary table, then run your deletes and inserts as above using Passthru. There are special settings for creating and keeping temporary SQL Server tables. I will post the details when I get back to work tomorrow. 
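(For reference, the settings in question are typically CONNECTION=GLOBAL and DBMSTEMP=YES on the LIBNAME statement; a sketch under that assumption, with the connection details as placeholders:)

/* Sketch, assuming an ODBC libref: DBMSTEMP=YES makes tables created
   through the libref SQL Server temporary tables, and CONNECTION=GLOBAL
   keeps them on one connection so later pass-through steps still see them */
libname SQLTMP odbc noprompt="<your connection string>"
        connection=global dbmstemp=yes;

data SQLTMP.exp_2;   /* created as a temporary table in SQL Server */
  set work.exp_2;
run;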

SASKiwi
PROC Star (Accepted Solution)

Here's how you can do an SQL Passthru delete in SQL Server:

proc sql;
 connect using MySQLCon; 
   execute (delete from [MySQLDatabase].[MySQLSchema].[MySQLTable]
            where < where conditions >
           ) by MySQLCon;
   disconnect from MySQLCon;
quit;

You will also most likely need to add these options on your SQL Server data connection: DBCOMMIT=nnnnn and INSERTBUFF=nnnnn. Tuning the numbers on these options is a matter of trial and error, but values between 2,000 and 10,000 are typical ones to try. These can make a huge difference in performance.
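(A sketch of where these options go, assuming an ODBC LIBNAME; the connection details and the numbers are placeholders to tune:)

/* Sketch: bulk-tune inserts on the data connection */
libname SQLLIB odbc noprompt="<your connection string>"
        dbcommit=5000     /* commit after every 5,000 rows */
        insertbuff=5000;  /* send 5,000 rows per insert buffer */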

val_nikolajevs
Obsidian | Level 7

Thank you very much. It worked: deleting/updating about 30,000 records took less than a minute.

I ended up using the following:

*created records to be updated/deleted;

*number of obs to be updated, if any;
data _null_;
  if 0 then set exp_2 nobs=n;
  call symputx('obs', n);
  stop;
run;
%put &obs;

%if &obs > 0 %then %do;

proc sql;
  insert into SQLLIB.table2
  select ID, DateTime, TimeIn, var1, var2, var3, var4
  from EXP_2;
quit;

%end;

 

*delete records using pass-through;

proc sql;
  connect using SQLLIB;
  execute (delete t1
           from [DB].[schema].[table1] as t1
           join [DB].[schema].[table2] as t2
             on t2.TimeIn = t1.TimeIn
            and t1.ID = t2.ID
            and t1.DateTime = t2.DateTime
          ) by SQLLIB;
  disconnect from SQLLIB;
quit;

Word of caution: please test with a SELECT before running the DELETE on the actual database table--I ended up wiping out the whole table1 in the SQL DB while testing.
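(A sketch of such a pre-check, running the same join as a SELECT through pass-through before any DELETE; same placeholder names as above:)

/* Sketch: count the rows the delete join would touch before deleting */
proc sql;
  connect using SQLLIB;
  select * from connection to SQLLIB
    (select count(*) as rows_to_delete
     from [DB].[schema].[table1] as t1
     join [DB].[schema].[table2] as t2
       on t2.TimeIn = t1.TimeIn
      and t1.ID = t2.ID
      and t1.DateTime = t2.DateTime);
  disconnect from SQLLIB;
quit;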
