BookmarkSubscribeRSS Feed
Maicemuallem
Calcite | Level 5

when i use proc update to update 12 fields in a 5.5M data set, it takes a long time.

Any suggestions to replace proc update?

 

6 REPLIES 6
ChrisHemedinger
Community Manager

Do you mean PROC SQL with the UPDATE statement?  You might want to share some more details about your data scheme and how you're having to update records within it.

It's time to register for SAS Innovate! Join your SAS user peers in Las Vegas on April 16-19 2024.
Maicemuallem
Calcite | Level 5

yes. It is a proc sql.

my data has 475 variables and 5.5 Million observations. I am interested in updating 11 variables for only 65,000 observations

RW9
Diamond | Level 26 RW9
Diamond | Level 26

I don't have anything to test this on, but basic datastep has an update statement as well:

data want;
  update <master_dataset> <update_dateset>;
  by <id_variables>;
run;

This should run far quicker than SQL - the SQL in SAS tends to be quite resource intensive.  

http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000202975.htm

FreelanceReinh
Jade | Level 19

I created test data:

  • dataset BIG with 475 variables (one numeric ID with length 8 and 474 character variables of length 1, populated with random letters) and 5.5 million observations (sorted by ID, file size 2.5 GB)
  • dataset TA (for transaction) with 65000 obs. and 12 variables: ID with randomly selected values from BIG and 11 randomly selected character variables from BIG, populated with '#' characters and about 2% missing values, also sorted by ID.

Then I let a data step with UPDATE statement, as suggested by @RW9, and an equivalent PROC SQL step (using a LEFT JOIN and the COALESCE function, not an UPDATE statement) run three times each to create the updated dataset WANT (or WANT1, resp.).

 

Run times (in seconds):

Data step: 5.09, 5.52, 5.64

PROC SQL: 9.03, 9.47, 9.41

 

The difference is highly significant (t test, p<.0001). 

 

For another comparison, I updated BIG directly ("in place") using PROC SQL's UPDATE statement and the WHERE clause

where id in (select id from ta);

With (hard-coded) constant transaction data ('#') this was faster (average run time 4.4 s), but with a subquery pulling the real transaction data from dataset TA for only one of the 11 variables to be updated, the run time went up to about 87 s and would probably have been much higher, had this been done for the remaining 10 character variables as well.

 

Finally, I tried the MODIFY statement of the data step to update the (recreated) dataset BIG in place:

data big;
modify big ta;
by id;
run;

This step took so long that I terminated it after about 15 minutes. According to what it had accomplished up to this time, it would have taken about 8400 s to finish.

 

So, bottom line is (as expected): Run time depends heavily on the update method used.

 

@Maicemuallem: How does your PROC SQL step look like (not in detail, just the structure)?

Maicemuallem
Calcite | Level 5

That was very helpful, thanks

LinusH
Tourmaline | Level 20

If you are updating all records of the table, you may find recreating the table will be faster.

This with a data step, or a SQL create table as select from...

Is it a SAS data set/table?

Any indexes/constraints in the equation?

Data never sleeps

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1899 views
  • 3 likes
  • 5 in conversation