We use SAS 9.4 on a Linux platform. All Studies share the same MS SQL DB Tables and I only want to overwrite the data for the Study currently being processed. So, I wrote a macro that:
Everything works beautifully. However, since we cannot create an MS SQL Server table from a SAS dataset using explicit pass-through SQL, I have to create the temporary DB table using implicit pass-through.
PROBLEM: Creating the temporary DB table using implicit pass-through takes over 2 hours to load 5 million records. Is there a better way of doing this? Below is my macro.
Any help much appreciated 🙂
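For context, the slow step is the implicit pass-through load itself, i.e. writing the SAS dataset to the DBMS through a SAS/ACCESS LIBNAME engine rather than via explicit CONNECT TO ... EXECUTE. A rough sketch of that kind of load (the actual macro is not shown here, so the libref, DSN, and table names are placeholders) looks like this:
libname str odbc dsn=sqlsrv_dsn schema=dbo;   /* connection details are placeholders */

proc sql;
   /* implicit pass-through: with default settings SAS ships rows to the DBMS one at a time */
   create table str.TEMP as
      select *
      from work.final;
quit;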
In addition to @Reeza's suggestion, adding the options INSERTBUFF= and DBCOMMIT= to your STR LIBNAME statement can make a HUGE difference in data loading performance. The default values for these options are not very efficient in my experience. You will probably need to experiment with different values to get the best load times; I suggest you try values like 1,000, 5,000, 10,000 and so on, but test on a subset of your 5 million rows to speed up the testing process.
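For example, something along these lines (the DSN and schema are placeholders; only the two tuning options matter here):
libname str odbc dsn=sqlsrv_dsn schema=dbo
        dbcommit=10000     /* commit after every 10,000 rows rather than the default */
        insertbuff=10000;  /* send 10,000 rows per insert rather than one at a time */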
Rather than a PROC SQL SELECT, try a PROC DATASETS COPY.
proc datasets lib=atr;
   * copy FINAL over to the ATR library;
   copy in=work out=atr;
      select final;
   * rename the copied data set to TEMP;
   change final=TEMP;
run;
quit;
The SQL step reads and inserts one row at a time, whereas COPY moves the data in bulk.
If you have the correct SAS/ACCESS license, you can also try PROC DBLOAD.
https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/acreldb/n0o4e1x3ggqlxan12sokc2lbgxbh.htm
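PROC DBLOAD is the older SAS/ACCESS loading procedure; as a rough, untested sketch for an ODBC target (the DSN is a placeholder and the exact statements vary by engine, so check the documentation linked above), it would look something like this:
proc dbload dbms=odbc data=work.final;
   dsn=sqlsrv_dsn;    /* ODBC data source name - placeholder */
   table=TEMP;        /* DBMS table to create and load */
   commit=10000;      /* commit interval, analogous to DBCOMMIT= */
   limit=0;           /* 0 = load all observations */
   load;              /* request the load */
run;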
Thanks! I tested it on a million records and it provides a slight improvement over PROC SQL if paired with DBCOMMIT=10000 INSERTBUFF=10000 as suggested in the other reply.
NOTE: PROCEDURE DATASETS used (Total process time):
real time 8:56.77
cpu time 18.00 seconds
NOTE: PROCEDURE SQL used (Total process time):
real time 9:15.28
cpu time 17.61 seconds
Thanks! I tried all three values on a million records and it appears DBCOMMIT=10000 INSERTBUFF=10000 provides the most improvement.
@kc_dn - Woohoo! That's a massive improvement! Can't complain about loading a million rows in 9 minutes. Bulk loading might improve things further but if you are happy with the current load times I wouldn't bother.
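For completeness, bulk loading is typically requested with the BULKLOAD=YES data set option; whether it is available depends on the SAS/ACCESS engine and drivers in use, so this is only an untested sketch:
/* BULKLOAD= availability depends on the engine (the dedicated      */
/* SAS/ACCESS Interface to Microsoft SQL Server supports it);       */
/* libref and table names are placeholders.                         */
data str.TEMP (bulkload=yes);
   set work.final;
run;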