when i use proc update to update 12 fields in a 5.5M data set, it takes a long time.
Any suggestions to replace proc update?
Do you mean PROC SQL with the UPDATE statement? You might want to share some more details about your data scheme and how you're having to update records within it.
yes. It is a proc sql.
my data has 475 variables and 5.5 Million observations. I am interested in updating 11 variables for only 65,000 observations
I don't have anything to test this on, but basic datastep has an update statement as well:
data want; update <master_dataset> <update_dateset>; by <id_variables>; run;
This should run far quicker than SQL - the SQL in SAS tends to be quite resource intensive.
http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000202975.htm
I created test data:
Then I let a data step with UPDATE statement, as suggested by @RW9, and an equivalent PROC SQL step (using a LEFT JOIN and the COALESCE function, not an UPDATE statement) run three times each to create the updated dataset WANT (or WANT1, resp.).
Run times (in seconds):
Data step: 5.09, 5.52, 5.64
PROC SQL: 9.03, 9.47, 9.41
The difference is highly significant (t test, p<.0001).
For another comparison, I updated BIG directly ("in place") using PROC SQL's UPDATE statement and the WHERE clause
where id in (select id from ta);With (hard-coded) constant transaction data ('#') this was faster (average run time 4.4 s), but with a subquery pulling the real transaction data from dataset TA for only one of the 11 variables to be updated, the run time went up to about 87 s and would probably have been much higher, had this been done for the remaining 10 character variables as well.
Finally, I tried the MODIFY statement of the data step to update the (recreated) dataset BIG in place:
data big;
modify big ta;
by id;
run;This step took so long that I terminated it after about 15 minutes. According to what it had accomplished up to this time, it would have taken about 8400 s to finish.
So, bottom line is (as expected): Run time depends heavily on the update method used.
@Maicemuallem: How does your PROC SQL step look like (not in detail, just the structure)?
That was very helpful, thanks
If you are updating all records of the table, you may find recreating the table will be faster.
This with a data step, or a SQL create table as select from...
Is it a SAS data set/table?
Any indexes/constraints in the equation?
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Check out this tutorial series to learn how to build your own steps in SAS Studio.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
