BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
LL5
Pyrite | Level 9 LL5
Pyrite | Level 9

Hi, I need to load several tables from SAS (on unix) to a database on MS SQL Server. The upload time is very long which takes more than 4 hours to load about 870 thousand rows. Some of my tables have more than 3 million rows which takes 18 hours. 

 

I have two questions: 1) Why it takes so long to load data from SAS (on unix) to MS SQL server? More importantly 2) What would be the ways to improve the load time?  

 

I realized bulkload is not available in unix and I tried compress = Yes in the data step but it seems to not work well either. I need to improve the processing time and appericated for any advices. Below are the codes and the logs. 

 

LIBNAME db ODBC UID=&sqluid PWD=&sqlpwd DSN=WH SCHEMA=Inv_info ; /* this is the database on SQL server*/

LIBNAME me '/Shared/Inv1/users/tom/monthly/'; /* my own user folder which contains all the tables to be loaded into SQL server*/

 

DATA DB.wh_inv1;
SET ME.wh_inv1_test;
RUN;

 

 

NOTE: Libref ME was successfully assigned as follows:
Engine: V9
Physical Name: /Shared/Inv1/users/tom/monthly
24 LIBNAME db ODBC UID=&sqluid PWD=&sqlpwd DSN=WH SCHEMA=Inv_info ;
NOTE: Libref DB was successfully assigned as follows:
Engine: ODBC
Physical Name: WH


NOTE: SAS variable labels, formats, and lengths are not written to DBMS tables.
NOTE: There were 871607 observations read from the data set ME.wh_inv1_test.
NOTE: The data set DB.wh_inv1 has 871607 observations and 5 variables.
NOTE: DATA statement used (Total process time):
real time 4:25:42.61
cpu time 2:05.47

1 ACCEPTED SOLUTION

Accepted Solutions
Quentin
Super User

From UNIX I thought the SQLSRV engine was the standard to use.  It could be faster than the generic ODBC engine.

 

Also could could play with the INSERTBUFF option.

 

It's possible that it's faster to insert records into an existing table.  I usually create a table with 0 records, then use PROC APPEND to add to it.  

 

Also consider the networking involved.  That is, if you database server is far from the SAS server, you can spend a lot of time just passing data over the wire.

BASUG is hosting free webinars Next up: Mark Keintz presenting History Carried Forward, Future Carried Back: Mixing Time Series of Differing Frequencies on May 8. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.

View solution in original post

13 REPLIES 13
Quentin
Super User

From UNIX I thought the SQLSRV engine was the standard to use.  It could be faster than the generic ODBC engine.

 

Also could could play with the INSERTBUFF option.

 

It's possible that it's faster to insert records into an existing table.  I usually create a table with 0 records, then use PROC APPEND to add to it.  

 

Also consider the networking involved.  That is, if you database server is far from the SAS server, you can spend a lot of time just passing data over the wire.

BASUG is hosting free webinars Next up: Mark Keintz presenting History Carried Forward, Future Carried Back: Mixing Time Series of Differing Frequencies on May 8. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
LL5
Pyrite | Level 9 LL5
Pyrite | Level 9

Thanks Quentin, creating an empty table then use proc append, is it something like below?

I am going to try this method and I am curious why doing so will improve performance? 

Thanks again for your advices.

 

data spdslib.cars;

set somelib.cars(obs=0);

run;

 

PROC APPEND

base=spdslib.cars data=somelib.cars;

run;

Quentin
Super User

Yes, that's what I often do.  I can't promise it will be faster.   I do more reading from SQL server than writing to it.  But I thought SAS might have some better bulk loading options when you append to an existing table.   Definitely play with insertbuff as well, e.g.

 

data spdslib.cars;
  set somelib.cars(obs=0);
run;
 
PROC APPEND base=spdslib.cars (insertbuff=1000) data=somelib.cars;
run;

If you really want to speed it up, ideally you'll have a DBA who wants to work with you and can look at what's happening on the database side, and also look at network traffic.  I've had some luck in finding DBA's who want to do that sort of investigation for a me as a SAS user, but it's rare. : )

 

BASUG is hosting free webinars Next up: Mark Keintz presenting History Carried Forward, Future Carried Back: Mixing Time Series of Differing Frequencies on May 8. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
LL5
Pyrite | Level 9 LL5
Pyrite | Level 9

Thanks Quentin, that sounds great. I tried the insertbuff=10000 and amazingly the load time improved to 1 minutes from 4 hours. 

Ksharp
Super User
proc sql;
create table spdslib.cars like somelib.cars ;
quit;
LL5
Pyrite | Level 9 LL5
Pyrite | Level 9

Thanks Ksharp, I wonder why this approach could reduce the load time? It is like create a table from an existing table by copying the existing table's column, and the new table will be populated with the records from the existing table? 

Ksharp
Super User
I don't know. But insertbuff=10000 definitely will enhance efficient .
SASKiwi
PROC Star

Try adding this to your ODBC LIBNAME statement: insertbuff = 10000

 

If this helps improve loading performance, then try higher or lower numbers to get the optimum value.

LL5
Pyrite | Level 9 LL5
Pyrite | Level 9

Thanks SASKiwi, I am trying this method now.

 

I did some researches and learned that by default SAS is inserting 1 row at a time into a third party relational database, by adding the inserbuff option will improve ODBC libraries dramatically. 

 

I wonder how does the number for insertbuff (i.e 10000 vs any higher or lower number) matter? Does higher number means faster and vice versa?

Quentin
Super User

You really have to play with insertbuff to try to find an optimal setting.  My understanding is there are trade-offs in memory usage, etc.  So it's not always the case that the biggest insertbuff is optimal.  But definitely insertbuff=1 is slow.  I think some drivers might allow insertbuff=0 for SAS to try to guess at a good value.

BASUG is hosting free webinars Next up: Mark Keintz presenting History Carried Forward, Future Carried Back: Mixing Time Series of Differing Frequencies on May 8. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
LL5
Pyrite | Level 9 LL5
Pyrite | Level 9

Thanks. It worked! Sorry that I have one more question which relates to create an empty table and append row. Previously I had syntax error shown as below saying the ODBC table already exist, but in fact, that is a table never exist in the DB on Sql Server. I fixed this by deleting the table and it worked, but I don't understand why it says the table already exist (which is not true) and why do we need to delete a table before loading that table into a db on SQL Server?

Thanks again for your advices. 

 

ERROR: The ODBC table INVENTORY2 has been opened for OUTPUT. This table already exists, or there is a name conflict with an existing
object. This table will not be replaced. This engine does not support the REPLACE option.
NOTE: DATA statement used (Total process time):
real time 0.05 seconds
cpu time 0.01 seconds

Quentin
Super User

From 4 hours to 1 minute is a nice improvement! : )

 

In SAS, we are used to over-writing datasets, so:

data new;
  set old;
run;

will happily overwrite work.new without mentioning anything.

 

Databases don't do a lot of table creation and re-creation.  They're generally about reading and writing data to existing tables.  If a table exists, and you try to create a new table with the same name, they don't like it.  I think it's a defense against accidentally losing an entire table of data.  

 

If you use:

data sql.new;
  set old;
run;

And the SQL libref points to a sql server database that already has a table named NEW, the code will error.  It will not delete the table for you automatically.

 

You can delete the table yourself.  Generally I find when I want to delete a SQL server table from SAS, it's fastest to use explicit pass-through to execute the drop table statement, e.g.:

 

proc sql;
  connect using sql;  *libref SQL points to sql server;
  execute by sql 
  (
   drop table new ;
  );
quit;

You can also  use the delete command to empty all records from a table (while leaving the empty table in place), which is useful for testing load times, etc.

 

proc sql;
  connect using sql;
  execute by sql
  (
   delete from foo;
  );
quit;
BASUG is hosting free webinars Next up: Mark Keintz presenting History Carried Forward, Future Carried Back: Mixing Time Series of Differing Frequencies on May 8. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
LL5
Pyrite | Level 9 LL5
Pyrite | Level 9

Ah I see. Thanks Quentin for all these useful information. 🙂 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 3395 views
  • 5 likes
  • 4 in conversation