Hi, I have a master dataset. To update this master dataset with a transaction dataset, I am using the following code.
Creating the empty dataset (one time):
proc sql;
  create table ma.master
    ( ID         num       informat=8.       format=8.          label="label_invoiceid",
      Created_DT num       informat=mdyampm. format=datetime20. label="create_dt",
      COMMENTS   char(100) informat=$100.    format=$100.       label="comments"
    );
quit;
Now I want to update this master dataset with the transaction dataset every day. I am using the following code:
data ma.master;
  modify ma.master tr.tras_data;
  by id;
  if _iorc_ eq 1230013 then  /* 1230013 = %sysrc(_DSENOM): no matching master obs */
    output;
  else do;
    replace;
  end;
run;
I have 85,000 records the first time, and every day I will have about 30,000 records in the transaction dataset.
It is taking more than 2 hours to complete. Why is it running so slowly? Please suggest anything I need to change in my code.
Thank you in advance
"When you use a BY statement with the MODIFY statement, the DATA step uses dynamic WHERE processing to find observations in the master data set. Neither the master data set nor the transaction data set needs to be sorted. For large data sets, however, sorting the data before you modify it can enhance performance significantly."
http://support.sas.com/documentation/cdl/en/basess/58133/HTML/default/viewer.htm#a001772743.htm
Where is your REPLACE statement?
Optimize the dynamic WHERE processing: think of trying indexes (random I/O).
Why do you have all the additional IF statements checking an automatic variable?
if _iorc_ eq 1230013 then
  output;
else do;
  replace;
end;
It should just be an UPDATE by certain variables. You should perhaps read the manual section on updating datasets:
http://support.sas.com/documentation/cdl/en/basess/58133/HTML/default/viewer.htm#a001304308.htm
Hi jakarman,
How can I create an index on the master table?
Thank you
Updating in place is very useful when:
a/ you have a big dataset and a small number of updates
b/ using it with SAS/SHARE, allowing views/updates from other places at the same moment
You can see this usage with Enterprise Guide / Base when updating a dataset interactively.
The creation of indexes can be done:
1/ on an existing dataset, using PROC DATASETS
http://support.sas.com/documentation/cdl/en/proc/65145/HTML/default/viewer.htm#n0mfav25learpan1lerk7...
You can even define constraints and an audit dataset.
2/ as a dataset option when creating the dataset
http://support.sas.com/documentation/cdl/en/ledsoptsref/68025/HTML/default/viewer.htm#n1aaucjme18e43...
Sometimes it is better to remove all indexes and to recreate them after having refilled all records.
Adding records to an existing dataset with indexes will require an update of the index for each record.
Performance in processing comes from knowing/understanding what will happen inside the engine room.
(*typos... corrected)
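A sketch of both routes (library MA and key variable ID taken from this thread; adjust for your data, and drop UNIQUE if ID can repeat):

/* 1/ add an index to an existing dataset */
proc datasets library=ma nolist;
  modify master;
  index create id / unique;
quit;

/* 2/ declare the index while creating the dataset */
data ma.master(index=(id /unique));
  set work.first_load;  /* hypothetical initial load */
run;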
For a table with only 85,000 observations it is not worth the headache of trying to modify the table in place.
Just replace the table. Look into using the regular old UPDATE statement for applying your transactions.
data ma.master;
  attrib ID         length=8    label='Invoice ID';
  attrib Created_DT length=8    informat=mdyampm. format=datetime20. label='Datetime Created';
  attrib COMMENTS   length=$100 label='Invoice Comments';
  stop;
  call missing(of _all_);
run;
data ma.master;
  update ma.master tr.tras_data;
  by id;
run;
Tom,
Thank you for your reply. I will test the UPDATE statement also. I checked the article; UPDATE and MODIFY perform the same function,
so is there any advantage to using the MODIFY statement over the UPDATE statement?
I am using the Unix platform; I forgot to mention that in the original question.
The advantages of MODIFY include much more flexibility and the ability to modify a data set in place.
Of course, those are also the disadvantages, since the flexibility makes it more complex to understand, and modifying a dataset in place is also a more complex operation and potentially dangerous. So only use MODIFY if you need to.
Note also that UPDATE is a very strange tool and you need to understand how it works. The intent is to allow you to make changes just to specified variables, so only the non-missing values in the transaction records cause changes in the master records. (To change a value to missing you need to use the special missing value ._ in the transaction dataset.) If your intent in the transactions is to totally replace the master record, then just use MERGE instead of UPDATE.
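A small demonstration of that missing-value behavior, using made-up WORK datasets:

data master;
  input id amount comment $;
datalines;
1 100 old
2 200 keep
;
data trans;
  input id amount comment $;  /* a lone . reads in as a missing value */
datalines;
1 150 .
;
data updated;
  update master trans;
  by id;
run;
/* id=1 ends with amount=150 but comment still 'old':
   the missing COMMENT in trans does not overwrite the master value */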
Tom, the update approach, modifying in place, is not weird. On the contrary, it is the basic reason for the RDBMS and SQL standard approach.
The SAS-specific way of handling data using SAS datasets is not commonly known and accepted. Using the PDV and automatic advancing is often bashed, with requests to replace it with SQL. Using the now-buzzing word NoSQL could help, but you are asking for a completely different mindset from somebody just starting to use SAS.
Sort the data sets and use a DATA step with UPDATE, then check the performance. With these small record counts, the operation should finish in seconds.
What you run into is the fact that without sorting, the step scans all records of the master data set for every record of the update data set, basically causing 85K * 30K I/O operations.
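For example, with the dataset names from this thread:

proc sort data=ma.master;
  by id;
run;
proc sort data=tr.tras_data;
  by id;
run;
data ma.master;
  update ma.master tr.tras_data;
  by id;
run;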