06-16-2015 06:45 AM
I have in the past few months started to look at the optimisation and performance of our SAS DI Studio created processes. Our source / destination data tables are SQL server OLEDB connections.
I'm currently looking at a Table Loader, that is loading 67,000,000 records via the Replace load Style > All Rows Using Truncate > Append (Proc Append) (Please see attached screen shot Load Technique.jpg)
This took 7 hours to complete on our test environment (which has about 50% less resources than production).
Even though 67,000,000 records is a large volume, I was expecting this to run quicker than 7 hours (maybe 3-4hours) - as some smaller tables (3-4 million records) complete within 10mins.
The big observation I have about this is that the destination table contains 4 indexes and no keys.
Just wondering if anybody has any ideas whether they think the 7 hour runtime is to be expected? And if not is there any improvements we could make to the table loader / table / library to improve performance?
06-17-2015 03:09 PM
It may be a good idea to make sure that the TRUNCATE statement is being passed to MS SQL. I am not entirely sure that TRUNCATE can be passed to MS SQL via a SAS option. You can put the statement into an explicit pass-through statement. If TRUNCATE isn't used a DELETE statement is used. This means it is logged by the database and will be slow.
Is the BULKLOAD= option set to "yes" on your LIBNAME statement? If it isn't, that is where I would start. Once you have that set start focusing on indexes. Loading tables with indexes defined on them can be an issue with MS SQL. You may want drop the indexes prior to running your load.
I hope this helps.
06-17-2015 05:24 PM
Perhaps you should try creating a flat file of your test file, and testing the loading of that file to SQL Server. This would give you a chance to i) get an idea how long SQL Server needs natively to do the load, and ii) experiment with dropping and recreating indexes, etc.
Once you have an optimized load scenario, you can work on producing the same scenario with SAS. As says, using BULKLOAD is probably critical. SAS transfers to DBMS tables without it tend to be excruciatingly slow.
06-17-2015 05:40 PM
"Our source / destination data tables are SQL server OLEDB connections."
Does above statement mean both source and target table reside on SQL Server? If so and based on the elapsed times for your smaller 3-4M tables the very first thing I would be looking into is that processing happens fully within the data base and that you're not transferring data from the database to SAS and back.
Do you know below options?
OPTIONS SASTRACE=',,,d' SASTRACELOC=SASLOG NOSTSUFFIX;
If you really need to pull data into SAS and then load it from there then using bulk load is certainly a good option.
Without bulk load and using implicit SQL: There are a few libname options you might want to look into like "dbcommit" and "insertbuff"
06-18-2015 09:11 AM
If your source and destination tables are in MS SQL, specifically the same MS SQL server, then your strategy will change. Pulling data out of the database then putting it back in should be avoided if at all possible. Take a look at the DBIDIRECTEXEC system option or the DIRECT_EXE= LIBNAME statement option. These options push CREATE TABLE AS and DELETE processing to the database.
Having source and destination tables in a database means that you have to optimize both sides of the process - a SQL query and a load (or insert). In your case, this means that you have a third thing to worry about - TRUNCATE. If SAS performs the DELETE it will be slow. It will pull data into SAS then issue the DELETE one row at a time. If you use DBIDIRECTEXEC the DELETE will be pushed to MS SQL. This is better, but the DELETE statements are still logged. The best way to do this is TRUNCATE.
The key here is to know what is actually happening in the database. The SASTRACE statement that mentioned is a great way to see the SQL that SAS is passing to the database. I don't know how this works in DI Studio. You may be able to have your DBA monitor the database and tell you if TRUNCATE is being executed. If it isn't you may want to consider issuing the command via explicit pass-thru.
You may find this paper useful. It doesn't specifically cover loading, but you may find it useful.