Hello,
we have a quite large dataset which we generate in SAS but need to store as MS SQL table. The dataset has 39 columns and around 140 million rows. Generating it in SAS takes around 40 minutes, which is long but manageable. Loading it to SQL via ODBC bulkload, however, takes close to 3 hours. We are currently dropping the SQL table and creating it fresh via proc sql drop table/create table. Since most of the data does not change from run to run, could performance be gained by doing an incremental update? If so, how one would approach this? Running proc compare is unlikely an option.
Hi,
It may be quicker to create your delta on SAS and then apply it to the MS SQL table instead of always dropping and rebuilding the entire table.
You can see an example of update/insert code here Solved: SQL Server pass through (merge, update, insert) - SAS Support Communities
Regards
There are several things that can affect you performance:
- SQL Server resources
- SAS server resources
- Bandwidth
- Method of detecting changes. If you have/can create an update timestamp or similar, I would upload the subset to SQL Server, and perform UPDATE/INSERT or MERGE there. Make sure you have relevant indexes for you key.
Is it possible to just do the entire operation through your ODBC connection to SQL Server, just using execute() statements to submit the native T-SQL? If so, that will almost certainly be your fastest option, as it avoids any moving of data back and forth.
Can you please answer below questions. We must know in order to propose something realistic.
1. How much memory is available to your SAS session? Please share the log output of below.
proc options group=memory;
run;
2. Do you already know the delta (update/insert) on the SAS side or do you have to work it out by comparing with the SQL table?
3. Are there also deletes? And do you know on the SAS side or do you have to work it out by comparing to the SQL table?
4. Is your SAS table already sorted by a unique key?
5. What's the size of the unique key? If multiple columns just the sum of the lengths of the SAS variables that constitute the key.
6. Are you allowed to store SAS tables permanently on disk or must all permanent tables reside on the SQL server side?
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.