I have a process where I am:
- reading a large table from SQL Server into SAS
- applying a number of data-fixing transformations in SAS
- writing the result back to SQL Server
The read operation is taking approx. 5 minutes; the write operation is taking approx. 5 hours. So, yeah 😬😱😠
Here are some excerpts from the log:
Libname statement:
27 %libname_sqlsvr(libref=PROD,server=MYSERVER,port=,database=MYDB,schema=prod)
=================================================================================
LIBREF: PROD
CONNECT: NOPROMPT="Driver={SQL Server Native Client
10.0};Server=MYSERVER;Database=MYDB;Trusted_Connection=yes;"
INTERNAL OPTIONS: schema=prod bulkload=yes dbcommit=100000 direct_exe=delete preserve_names=yes
USER OPTIONS:
=================================================================================
ODBC: AUTOCOMMIT is NO for connection 3
MPRINT(LIBNAME_SQLSVR): libname PROD odbc NOPROMPT="Driver={SQL Server Native Client
10.0};Server=MYSERVER;Database=MYDB;Trusted_Connection=yes;" schema=prod bulkload=yes dbcommit=100000
direct_exe=delete preserve_names=yes ;
NOTE: Libref PROD was successfully assigned as follows:
Engine: ODBC
Physical Name:
Read data from SQL Server to SAS:
27 * Copy data from SQL Server ;
ODBC: AUTOCOMMIT is NO for connection 6
ODBC_28: Prepared: on connection 6
SELECT * FROM "prod"."RLDX_DEMOGRAPHIC_ORIG"
28 data workspde.RLDX_DEMOGRAPHIC;
29 set prod.RLDX_DEMOGRAPHIC_ORIG;
30 run;
ODBC_29: Executed: on connection 6
Prepared statement ODBC_28
NOTE: There were 67,766,170 observations read from the data set PROD.RLDX_DEMOGRAPHIC_ORIG.
NOTE: The data set WORKSPDE.RLDX_DEMOGRAPHIC has 67,766,170 observations and 34 variables.
NOTE: Compressing data set WORKSPDE.RLDX_DEMOGRAPHIC decreased size by 75.73 percent.
NOTE: DATA statement used (Total process time):
real time 5:05.05
user cpu time 6:15.26
system cpu time 57.09 seconds
memory 1401.65k
OS Memory 22828.00k
Timestamp 25/03/2020 07:17:50 AM
Write data from SAS to SQL Server:
27 proc append
ODBC_24: Prepared: on connection 6
SELECT * FROM "prod"."RLDX_DEMOGRAPHIC" WHERE 0=1
ODBC: AUTOCOMMIT is NO for connection 7
ODBC: COMMIT performed on connection 7.
ODBC_25: Prepared: on connection 7
SELECT * FROM "prod"."RLDX_DEMOGRAPHIC"
28 base=prod.RLDX_DEMOGRAPHIC (
29 bulkload=yes bl_options=tablock
30 )
31 data=workspde.RLDX_DEMOGRAPHIC6 (
32 keep=
33 rldx_record_key
34 rldx_audit_key
35 recnum
36 stay_number
37 episode_sequence_number
38 hospital_type
39 source
40 hoscode
41 tfrhosp
42 transfrom
43 mrn
44 surname
45 givnames
46 sex
47 dob
48 age
49 address
50 locality
51 pcode
52 cob
53 admdate
54 sepdate
55 dthdate
56 notifdate
57 patientnum
58 supi
59 moh_auid
60 AHS_hosp
61 dc_extract_date
62 cob_sacc
63 state_of_usual_residence
64 rldx_valid_from
65 rldx_valid_to
66 rldx_current_ind
67 );
68 run;
NOTE: Appending WORKSPDE.RLDX_DEMOGRAPHIC6 to PROD.RLDX_DEMOGRAPHIC.
NOTE: There were 67,766,170 observations read from the data set WORKSPDE.RLDX_DEMOGRAPHIC6.
NOTE: 67766170 observations added.
NOTE: The data set PROD.RLDX_DEMOGRAPHIC has . observations and 34 variables.
NOTE: PROCEDURE APPEND used (Total process time):
real time 5:34:15.35
user cpu time 18:04.33
system cpu time 37.90 seconds
memory 634.87k
OS Memory 21804.00k
Timestamp 25/03/2020 03:34:13 AM
The write operation is a TRUNCATE TABLE (not shown) followed by PROC APPEND.
I note this previous thread: https://communities.sas.com/t5/General-SAS-Programming/Get-data-into-sql-server-more-efficiently/td-...
I will continue playing with the libname and/or dataset options, but at 5 hours a pop for performance metrics, I thought I'd post to see if someone has further ideas. For example, is it possible that bulk loading needs to be tweaked on SQL Server itself? (Not that I'd know how to diagnose that, and I'm not the DBA.)
It may help to be a bit more specific about the types of data changes you are making.
For instance, does every single row require a change? If not, can you extract just those rows and then update the source table?
The type of transformation may be amenable to in-database code, cutting out the download/upload steps, which I suspect account for a fair amount of your time.
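If you go that route, explicit pass-through lets the fix run entirely in SQL Server without moving any rows. A minimal sketch, reusing the NOPROMPT connection string from your libname; the UPDATE itself is only a placeholder, not your actual transformation:

/* Pass-through sketch - connection string copied from the PROD libname; */
/* the UPDATE below is a placeholder fix, not the real transformation.   */
proc sql;
  connect to odbc as sqlsvr
    (noprompt="Driver={SQL Server Native Client 10.0};Server=MYSERVER;Database=MYDB;Trusted_Connection=yes;");

  execute (
    update prod.RLDX_DEMOGRAPHIC_ORIG
       set hospital_type = 'A'        /* placeholder value */
     where hospital_type is null
  ) by sqlsvr;

  %put NOTE: &sqlxrc &sqlxmsg;        /* pass-through return code and message */

  disconnect from sqlsvr;
quit;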
The code is doing about 7 distinct transformations to fix data errors from historic job runs.
The more complex transformations require knowing data from the previous row (e.g., LAG) and FIRST./LAST. processing. While doable in SQL Server, they are just easier in SAS.
"For instance, does every single row require a change? If not, can you extract just those rows and then update the source table?"
No, not every row requires a change, but every row might require one. I don't know which rows need changing until I do the SAS processing, especially for the transformations that need logic across rows, as described above.
I could review my code, determine which columns are actually updated, and refactor my code to only download those columns (and supporting columns), then upload that narrower table and run a series of UPDATE statements on SQL Server.
But programmer time is often more critical than machine time.
For the purpose of this post, I'm just interested in improving the write performance from SAS to SQL Server. I've built a test suite covering the dataset options bulkload=yes, bl_options=tablock, dbcommit=, and insertbuff=, and comparing PROC APPEND to an existing table vs. a data step writing a new table. I won't have those metrics until the end of the day, if not overnight, but early results on a small subset of the data (10M rows) seem to indicate that increasing dbcommit= looks promising.
I'll post a summary of those results once I have them.
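For reference, the harness is roughly along these lines (table names are illustrative and the TRUNCATE between runs is omitted); it loads the same 10M-row subset with different settings so the real times in the log can be compared:

%macro test_load(bulk=yes, dbcommit=0, insertbuff=10000);
  proc append
    base=prod.RLDX_DEMOGRAPHIC_TEST (                    /* illustrative test table */
      bulkload=&bulk
      %if &bulk=yes %then %do; bl_options=tablock %end;
      %else %do; insertbuff=&insertbuff %end;
      dbcommit=&dbcommit
    )
    data=workspde.RLDX_DEMOGRAPHIC_10M;                  /* 10M-row subset */
  run;
%mend test_load;

%test_load(dbcommit=100000)                         /* current setting      */
%test_load(dbcommit=0)                              /* single final commit  */
%test_load(bulk=no, dbcommit=0, insertbuff=10000)   /* conventional inserts */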
These aren't the final results of my test suite, but preliminary results showed promise from increasing dbcommit=.
I re-ran my transformation processing, increasing dbcommit to 1B (1,000,000,000) from the previous value of 100K (100,000). 1B is much larger than the row count of the source dataset. AFAIK this change reduces transaction log overhead on SQL Server.
The load time dropped from 5 hours to 30 minutes.
Excerpt from the log:
671 proc append
ODBC: AUTOCOMMIT is NO for connection 5
ODBC_6: Prepared: on connection 5
SELECT * FROM "prod"."RLDX_DEMOGRAPHIC" WHERE 0=1
ODBC: AUTOCOMMIT is NO for connection 6
ODBC: COMMIT performed on connection 6.
ODBC_7: Prepared: on connection 6
SELECT * FROM "prod"."RLDX_DEMOGRAPHIC"
672 base=prod.RLDX_DEMOGRAPHIC (
673 bulkload=yes bl_options=tablock dbcommit=1000000000
674 )
675 data=workspde.RLDX_DEMOGRAPHIC6 (
676 keep=
677 rldx_record_key
678 rldx_audit_key
679 recnum
680 stay_number
681 episode_sequence_number
682 hospital_type
683 source
684 hoscode
685 tfrhosp
686 transfrom
687 mrn
688 surname
689 givnames
690 sex
691 dob
692 age
693 address
694 locality
695 pcode
696 cob
697 admdate
698 sepdate
699 dthdate
700 notifdate
701 patientnum
702 supi
703 moh_auid
704 AHS_hosp
705 dc_extract_date
706 cob_sacc
707 state_of_usual_residence
708 rldx_valid_from
709 rldx_valid_to
710 rldx_current_ind
711 );
712 run;
NOTE: Appending WORKSPDE.RLDX_DEMOGRAPHIC6 to PROD.RLDX_DEMOGRAPHIC.
NOTE: There were 67,766,170 observations read from the data set WORKSPDE.RLDX_DEMOGRAPHIC6.
NOTE: 67766170 observations added.
NOTE: The data set PROD.RLDX_DEMOGRAPHIC has . observations and 34 variables.
NOTE: PROCEDURE APPEND used (Total process time):
real time 30:38.34
user cpu time 18:12.94
system cpu time 41.20 seconds
memory 685.25k
OS Memory 22836.00k
Timestamp 25/03/2020 01:11:54 PM
I very often set dbcommit to zero, meaning there is only a single commit after all the data is loaded. I do this not only for performance but also because I prefer an all-or-nothing approach and thus never have to deal with partially loaded tables.
If going for more than one commit, then disabling indexes during the load process should also speed up processing.
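A sketch of what that combination can look like, assuming a nonclustered index named IX_RLDX_DEMOGRAPHIC_MRN exists on the target (the index name is made up, and you need rights to disable/rebuild indexes):

/* Disable the (illustrative) index before the load. */
proc sql;
  connect to odbc as sqlsvr
    (noprompt="Driver={SQL Server Native Client 10.0};Server=MYSERVER;Database=MYDB;Trusted_Connection=yes;");
  execute (alter index IX_RLDX_DEMOGRAPHIC_MRN on prod.RLDX_DEMOGRAPHIC disable) by sqlsvr;
  disconnect from sqlsvr;
quit;

/* Bulk load with a single commit at the end (dbcommit=0). */
proc append
  base=prod.RLDX_DEMOGRAPHIC (bulkload=yes bl_options=tablock dbcommit=0)
  data=workspde.RLDX_DEMOGRAPHIC6;
run;

/* Rebuild the index once the load completes. */
proc sql;
  connect to odbc as sqlsvr
    (noprompt="Driver={SQL Server Native Client 10.0};Server=MYSERVER;Database=MYDB;Trusted_Connection=yes;");
  execute (alter index IX_RLDX_DEMOGRAPHIC_MRN on prod.RLDX_DEMOGRAPHIC rebuild) by sqlsvr;
  disconnect from sqlsvr;
quit;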
I don't know if this is applicable for your case, but in our project there was one libname option that really made a difference:
CONNECTION=UNIQUE.
Not sure if this has an effect on bulk-load scenarios though.
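For what it's worth, on the libname it is just one extra option (a sketch only, mirroring the libname from the original post); CONNECTION=UNIQUE makes each operation that needs a connection open its own instead of sharing one:

libname PROD odbc
  noprompt="Driver={SQL Server Native Client 10.0};Server=MYSERVER;Database=MYDB;Trusted_Connection=yes;"
  schema=prod bulkload=yes dbcommit=100000 direct_exe=delete preserve_names=yes
  connection=unique;   /* separate connection per operation */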
You could ask the SQL Server DBA if they see anything interesting in their logs.
Hi Scott - did you figure out the performance issue pushing updates from SAS to a SQL table with nearly identical values, roughly 30K rows and 100+ columns? I am having exactly the same issue. The update works when I run it against a specific list of rows (it takes time, but it completes), but when I run it for the complete data set it has already been going for more than an hour with no end in sight. Any hints, clues, or a better alternative?
In my experience you need to change the INSERTBUFF and DBCOMMIT settings to improve table loading performance with SQL Server. Start with INSERTBUFF = 10000 and DBCOMMIT = 0, then try higher values of INSERTBUFF to see if it loads even faster.
As @SASKiwi suggested, try dbcommit=0 and set insertbuff= to something significantly higher than the default of 1.
Which insertbuff value performs best depends on your environment, so test a few different ones to find the sweet spot for your environment and table.
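Concretely, since the PROD libname sets bulkload=yes, you would override that at the data set level (a sketch; as I understand it, INSERTBUFF applies to conventional inserts rather than to the bulk loader, and data set options override libname options):

/* Sketch: bulkload=no overrides the libname-level bulkload=yes so that   */
/* insertbuff applies; the values are starting points to tune, not rules. */
proc append
  base=prod.RLDX_DEMOGRAPHIC (bulkload=no insertbuff=10000 dbcommit=0)
  data=workspde.RLDX_DEMOGRAPHIC6;
run;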