kds
Calcite | Level 5

Hello SAS Experts,

I am running a simple UPDATE query (shown below) after removing all indexes from the table. No WHERE clause is needed, as every value in this column must be updated.

proc sql;
  update tablename set columnname='xxxxxxxxxxxxxxxxxxx';
quit;

For a table with 45 million records and 153 columns, the CPU time taken by the query is 1 hour 51 minutes (real time ranges from 7 to 10 hours).

Will the DATA step below be a better choice than the query above for reducing the CPU time?

data libnam.table1;
  set libnam.table1;
  columnname='xxxxxxxxxxxxxxxxxxx';
run;

If not, are there any techniques to reduce the CPU time for this kind of update?

Thanks...

4 REPLIES
RW9
Diamond | Level 26

Simple answer: Yes.  As you may gather from other posts, SQL in SAS is not well optimised for large-dataset processing.  Use the DATA step method, preferably combining it with any other processing you have, to reduce time and resources.  Also consider whether it is really worthwhile writing a default value to every row of the data; you are just increasing the size without adding anything. I.e., is it really necessary?
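Folding the assignment into a DATA step that already does other per-row work means the table is read and rewritten only once. A minimal sketch, where the extra derivations (total_amt, flag, amt1, amt2) are purely hypothetical stand-ins for whatever other processing you run on this table:

```sas
data libnam.table1;
  set libnam.table1;
  columnname = 'xxxxxxxxxxxxxxxxxxx';  /* the constant update             */
  /* hypothetical extra work done in the same single pass: */
  total_amt  = sum(amt1, amt2);        /* derive a summary column         */
  flag       = (total_amt > 0);        /* derive an indicator in one pass */
run;
```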

Kurt_Bremser
Super User

5 points that will help you:

#1 through #4 : Tune your storage (SSD, multiple disks in RAID to spread I/O load, multiple host buses and adapters)

#5: Write the updated table to a different physical location:

data lib2.want;
  set lib1.have;
  columnname = 'xxxxxxxxxxxxx';
run;

where lib1 and lib2 reside on physically separated paths (different disks or RAID groups). This prevents concurrent reads and writes on the same disk(s). The SQL equivalent:

proc sql;
  create table lib2.want as
  select a.*, 'xxxxxxxxxxx' as columnname
  from lib1.have a;
quit;

For further tuning, set up your storage so that you get:

- WORK on a separate location

- UTILLOC on a separate location

- permanent libraries in separate locations (see #5)

- all this separated from your system/software disk(s)
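The separation above might be set up as sketched below; every path is a hypothetical placeholder for a mount point on a physically distinct disk:

```sas
/* sasv9.cfg (or SAS invocation options) - hypothetical paths: */
-WORK    /disk3/saswork      /* WORK library on its own disk     */
-UTILLOC /disk4/sasutil      /* utility files on another disk    */

/* in the SAS session: permanent libraries on separate disks */
libname lib1 '/disk1/sasdata';
libname lib2 '/disk2/sasdata';
```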

LinusH
Tourmaline | Level 20

I would be a bit surprised to see big differences between the two techniques. Both are just table scans/writes.

Kurt_Bremser has some good general optimizing tips, even though I don't think they apply here. Table scans use neither WORK nor UTILLOC; the data just passes through RAM sequentially with a small memory footprint. I/O is more crucial.

Having input and output on separate disks is beneficial for I/O, but if the original task was to update a table, one could suspect that the requirement is to keep the data in the same location as before the update. Different paths apply better to movement of data between different levels in a DW load.

Data never sleeps
Kurt_Bremser
Super User

What I wanted to point out is that a table update in place (or in the same library/physical location) will always take a big performance hit, and that one should consider playing "ping-pong" with the data if that is a feasible option, as it speeds things up nicely.
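The "ping-pong" idea can be sketched as follows, assuming two hypothetical libraries lib1 and lib2 on separate disks: each step reads from one disk and writes to the other, so reads and writes never compete for the same spindles.

```sas
/* step 1: disk A -> disk B, applying the column update */
data lib2.step1;
  set lib1.have;
  columnname = 'xxxxxxxxxxxxx';
run;

/* step 2: disk B -> disk A, for any further transformation */
data lib1.step2;
  set lib2.step1;
  /* next transformation goes here */
run;
```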

And when one is already improving the storage setup, one might want to go the whole way and include the WORK/UTILLOC considerations as well, because sorting and SQL's utility files are always part of the SAS game.


Discussion stats
  • 4 replies
  • 521 views
  • 6 likes
  • 4 in conversation