The SAS Output Delivery System and reporting techniques

why my code is running very slow (Modify statement to update master dataset )

Accepted Solution Solved
Reply
Contributor
Posts: 40
Accepted Solution

why my code is running very slow (Modify statement to update master dataset )

Hi have a master dataset, to update this master dataset with transaction dataset I am using following code.

 

 creating empty dataset (one time):

  proc sql;

  create table ma.master

  (  ID                 num          informat=8. format=8. label="Iabel_invoiceid",

     Created_DT num           informat=mdyampm. format=datetime20. label="create_dt",

     COMMENTS char(100)  informat=100.  format=100. label="comments"

  );quit;

 

Now I want to update this master dataset with transaction dataset  every day, I using following code

 

data ma.master;

modify ma.master  tr.tras_data

by id;

IF _iorc_ eq 1230013 then

OUTPUT;

else do;

REPLACE;

end;

run;

 

I have 85000 records first time and every day i will have 30000 records in transaction dataset.

Its taking more than 2 hours to complete. Why its running slow Please suggest me if anything I need to modify my cody.

 

Thank you in advance

 

  


Accepted Solutions
Solution
‎11-11-2015 03:43 PM
Valued Guide
Posts: 3,208

Re: why my code is running very slow (Modify statement to update master dataset )

"When you use a BY statement with the MODIFY statement, the DATA step uses dynamic WHERE processing to find observations in the master data set. Neither the master data set nor the transaction data set needs to be sorted. For large data sets, however, sorting the data before you modify it can enhance performance significantly."


http://support.sas.com/documentation/cdl/en/basess/58133/HTML/default/viewer.htm#a001772743.htm
Where is your replace statement

Optimizing the dynamic where processing. Think of trying indexes (random IO)

---->-- ja karman --<-----

View solution in original post


All Replies
Super User
Super User
Posts: 7,720

Re: why my code is running very slow (Modify statement to update master dataset )

Why do you have all the additional if statements checking some automatic variable?

IF _iorc_ eq 1230013 then
OUTPUT;
else do;
REPLACE;
end;

It should just be an update by certain variables.  You should perhaps read the manual section on updating datasets:

http://support.sas.com/documentation/cdl/en/basess/58133/HTML/default/viewer.htm#a001304308.htm

Solution
‎11-11-2015 03:43 PM
Valued Guide
Posts: 3,208

Re: why my code is running very slow (Modify statement to update master dataset )

"When you use a BY statement with the MODIFY statement, the DATA step uses dynamic WHERE processing to find observations in the master data set. Neither the master data set nor the transaction data set needs to be sorted. For large data sets, however, sorting the data before you modify it can enhance performance significantly."


http://support.sas.com/documentation/cdl/en/basess/58133/HTML/default/viewer.htm#a001772743.htm
Where is your replace statement

Optimizing the dynamic where processing. Think of trying indexes (random IO)

---->-- ja karman --<-----
Contributor
Posts: 40

Re: why my code is running very slow (Modify statement to update master dataset )

Hi jakarman,

 

How can I do  indexing on master table.

Thank you

 

Valued Guide
Posts: 3,208

Re: why my code is running very slow (Modify statement to update master dataset )

[ Edited ]

Updating in place is very usefull when:

a/  having a big dataset and small number of updates
b/ Using it with SAS/share allowing views/updates form other places at the same moment
You can see the usage wiht Eguide / Base updating a dataset interactively.


The Creation of indexes can be done by:

 

1/ on a existing dataset using "proc datasets"
 http://support.sas.com/documentation/cdl/en/proc/65145/HTML/default/viewer.htm#n0mfav25learpan1lerk7...

 You can even define contraints and a audit dataset.

2/ as a datasetoption creating the dataset

http://support.sas.com/documentation/cdl/en/ledsoptsref/68025/HTML/default/viewer.htm#n1aaucjme18e43...

Sometimes is is better to remove all options and tor recreate them after having refilled all records.
Adding records on existing dataset wiht indexes will require all updates in the index on each record.

Performance on processing is knowing/understanding what will happen inside the egine room
(*typos... corrected)       

---->-- ja karman --<-----
Super User
Super User
Posts: 6,845

Re: why my code is running very slow (Modify statement to update master dataset )

For a table with only 85,000 observations it is not worth the headache to implement trying to modify the table in place.

Just replace the table.  Look into using the regular old UPDATE statement for applying your transactions.

 

data ma.master;
  attrib ID length=8 label='Invoice ID';
  attrib Created_DT length=8 informat=mdyampm. format=datetime20. label='Datetime Created' ;
  attrib COMMENTS length=$100 label='Invoice Comments';
  stop;
  call missing(of _all_);
run;

data ma.master;
  update ma.master tr.tras_data ;
  by id;
run;

 

 

 

Contributor
Posts: 40

Re: why my code is running very slow (Modify statement to update master dataset )

Tom,

Thank you for your reply. I will test update statement also. I checked the article update and modify are doing same function.

so is there any advantage using modify statement over update statement?

 

I am using Unix platform forgot to mentionin original question:

Super User
Super User
Posts: 6,845

Re: why my code is running very slow (Modify statement to update master dataset )

The advantages of modify include much more flexiblity and the ability to modify a data set in place.

Of course those are also the disadvantages since the flexibility makes it more complex to understand and modifying a dataset in place is also a more complex operation and potentially dangerous.  So only use MODIFY if you need to.

Note also that UPDATE is very strange tool and you need to understand how it works. The intent is to allow you to make changes just to specified variables.  So only the non missing values in the transaction records cause changes in the master records. (To change a value to missing you need to use the special missing ._ in the transaction dataset).  If your intent in the transactions is to totally replace the master record then just use MERGE instead of UPDATE.

Valued Guide
Posts: 3,208

Re: why my code is running very slow (Modify statement to update master dataset )

Tom, The update approach modify in place is not weird. In contrary it is the basic reason for the RDBMS and SQL standard approach.
The SAS specific way handling data using SAS datasets in not commonly known and accepted. Using the PDV and automatic advancing is often bashed and asked to be replaced by SQL. Using the now buzzing words NO-SQL could help, but you are aksing fo a complete different mindset of somebody just starting to use SAS.

---->-- ja karman --<-----
Super User
Posts: 7,431

Re: why my code is running very slow (Modify statement to update master dataset )

Sort the data sets and use a data step with UPDATE. Then check the performance. With these small record counts, the operation should be finished in seconds.

What you run into is the fact that without sorting, the step scans all records of the master data set for every record of the update data set, basically causing 85K * 30K I/O operations.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 1936 views
  • 1 like
  • 5 in conversation