01-21-2016 09:12 AM
Do you mean PROC SQL with the UPDATE statement? You might want to share some more details about your data scheme and how you're having to update records within it.
01-21-2016 09:23 AM
yes. It is a proc sql.
my data has 475 variables and 5.5 Million observations. I am interested in updating 11 variables for only 65,000 observations
01-21-2016 10:36 AM
I don't have anything to test this on, but basic datastep has an update statement as well:
data want; update <master_dataset> <update_dateset>; by <id_variables>; run;
This should run far quicker than SQL - the SQL in SAS tends to be quite resource intensive.
01-21-2016 01:48 PM - edited 01-21-2016 02:21 PM
I created test data:
Then I let a data step with UPDATE statement, as suggested by @RW9, and an equivalent PROC SQL step (using a LEFT JOIN and the COALESCE function, not an UPDATE statement) run three times each to create the updated dataset WANT (or WANT1, resp.).
Run times (in seconds):
Data step: 5.09, 5.52, 5.64
PROC SQL: 9.03, 9.47, 9.41
The difference is highly significant (t test, p<.0001).
For another comparison, I updated BIG directly ("in place") using PROC SQL's UPDATE statement and the WHERE clause
where id in (select id from ta);
With (hard-coded) constant transaction data ('#') this was faster (average run time 4.4 s), but with a subquery pulling the real transaction data from dataset TA for only one of the 11 variables to be updated, the run time went up to about 87 s and would probably have been much higher, had this been done for the remaining 10 character variables as well.
Finally, I tried the MODIFY statement of the data step to update the (recreated) dataset BIG in place:
data big; modify big ta; by id; run;
This step took so long that I terminated it after about 15 minutes. According to what it had accomplished up to this time, it would have taken about 8400 s to finish.
So, bottom line is (as expected): Run time depends heavily on the update method used.
@Maicemuallem: How does your PROC SQL step look like (not in detail, just the structure)?
01-21-2016 09:17 AM
If you are updating all records of the table, you may find recreating the table will be faster.
This with a data step, or a SQL create table as select from...
Is it a SAS data set/table?
Any indexes/constraints in the equation?