06-26-2015 06:25 AM
Hello SAS Experts,
I am running a simple update query, shown below, after removing all the indexes from the table. No WHERE clause is required, as every value in this column needs to be updated.
update tablename set columnname='xxxxxxxxxxxxxxxxxxx';
For a table with 45 million records and 153 columns, the CPU time taken by the query is 1 hour 51 minutes (real time ranges from 7 to 10 hours).
Would the DATA step below be a better choice than the query above for reducing CPU time?
data libnam.table1;
  set libnam.table1;
  columnname='xxxxxxxxxxxxxxxxxxx';
run;
If not, are there any techniques to reduce the CPU time for this kind of update?
06-26-2015 06:47 AM
Simple answer: yes. As you may gather from other posts, SQL in SAS is not well optimised for large-dataset processing. Use the DATA step method, and preferably combine it with any other processing you have in the same step to reduce time and resources. Also consider whether it is worthwhile adding a default value to every row at all: you are really just increasing the size of the data without adding any information. I.e., is it really necessary?
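As a rough sketch of the "combine it with other processing" suggestion, a single sequential pass can set the constant and perform any other pending derivations at the same time (the variable names other than the OP's columnname are hypothetical):

```sas
/* One pass over the data: the constant assignment plus any other work.  */
/* amount and amount_usd are placeholder names, not from the OP's table. */
data libnam.table1;
  set libnam.table1;
  columnname = 'xxxxxxxxxxxxxxxxxxx';  /* the constant update           */
  amount_usd = amount * 1.1;           /* example of piggybacked work   */
run;
```

The point is that the expensive part is reading and rewriting 45 million rows, so anything else you fold into the same step comes almost for free.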
06-26-2015 06:50 AM
5 points that will help you:
#1 through #4 : Tune your storage (SSD, multiple disks in RAID to spread I/O load, multiple host buses and adapters)
#5: Use separate input and output libraries:

data lib2.want;
  set lib1.have;
  columnname = 'xxxxxxxxxxxxx';
run;

where lib1 and lib2 reside on physically separated paths (different disks or RAID groups). This prevents concurrent reads and writes on the same disk(s). The SQL equivalent is:

proc sql;
  create table lib2.want as
    select a.*, 'xxxxxxxxxxx' as columnname
    from lib1.have a;
quit;
For further tuning, set up your storage so that you get:
- WORK on a separate location
- UTILLOC on a separate location
- permanent libraries in separate locations (see #5)
- all this separated from your system/software disk(s)
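The WORK/UTILLOC separation above is configured at SAS invocation (in sasv9.cfg or on the command line), while the permanent libraries are separated via LIBNAME statements; the paths below are purely illustrative:

```sas
/* Invocation-time options (sasv9.cfg or command line; example paths): */
/*   -work    /saswork     WORK on its own disk                        */
/*   -utilloc /sasutil     utility files on another disk               */

/* Permanent libraries on separate physical paths (hypothetical):      */
libname lib1 '/disk1/sasdata';   /* source library */
libname lib2 '/disk2/sasdata';   /* target library */
```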
06-26-2015 07:52 AM
I would be a bit surprised to see big differences between the two techniques. Both are just full table scans/writes.
You got some good general optimizing tips, even though I don't think they apply here. Table scans use neither WORK nor UTILLOC; the data just streams through RAM sequentially with a small memory footprint. I/O is the crucial factor.
Having input and output on separate disks is beneficial for I/O, but if the original task was to update a table, one could suspect that the requirement is to keep the data in the same location as before the update. Separate paths apply better to movement of data between different levels of a DW load.
06-26-2015 08:30 AM
What I wanted to point out is that updating a table in place (or within the same library/physical location) will always carry a big performance hit, and that one should consider playing "ping-pong" with the data if that is a feasible option, as it speeds things up nicely.
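A minimal sketch of that ping-pong pattern, assuming lib1 and lib2 are libraries on physically separate disks (all paths and step contents here are illustrative):

```sas
/* "Ping-pong": each step reads from one disk and writes to the other, */
/* so reads and writes never compete for the same spindles.            */
libname lib1 '/disk1/sasdata';   /* hypothetical path */
libname lib2 '/disk2/sasdata';   /* hypothetical path */

data lib2.table1;                /* step 1: lib1 -> lib2 */
  set lib1.table1;
  columnname = 'xxxxxxxxxxxxxxxxxxx';
run;

data lib1.table1;                /* step 2: lib2 -> lib1 */
  set lib2.table1;
  /* next transformation goes here */
run;
```

If the table must end up in its original library, the last bounce simply writes back to lib1, so the requirement to keep the data in place is still met.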
And while one is improving the storage setup, one might as well go the whole way and include the WORK/UTILLOC considerations, because sorting and SQL's utility file are always part of the SAS game.