nbonda
Obsidian | Level 7

Hi, I have a master dataset. To update this master dataset with a transaction dataset, I am using the following code.

 

Creating the empty dataset (one time):

  proc sql;
  create table ma.master
  ( ID         num       informat=8.       format=8.          label="label_invoiceid",
    Created_DT num       informat=mdyampm. format=datetime20. label="create_dt",
    COMMENTS   char(100) informat=$100.    format=$100.       label="comments"
  );
  quit;

 

Now I want to update this master dataset with the transaction dataset every day. I am using the following code:

 

data ma.master;
  modify ma.master tr.tras_data;
  by id;
  /* %sysrc(_DSENMR) = 1230013: the transaction observation was not found in the master */
  if _iorc_ eq %sysrc(_DSENMR) then do;
    _error_ = 0;  /* clear the error flag set when the key is not found */
    output;
  end;
  else replace;
run;

 

I have 85,000 records the first time, and every day I will have 30,000 records in the transaction dataset.

It is taking more than 2 hours to complete. Why is it running so slowly? Please suggest anything I should modify in my code.

 

Thank you in advance

 

  

1 ACCEPTED SOLUTION

Accepted Solutions
jakarman
Barite | Level 11

"When you use a BY statement with the MODIFY statement, the DATA step uses dynamic WHERE processing to find observations in the master data set. Neither the master data set nor the transaction data set needs to be sorted. For large data sets, however, sorting the data before you modify it can enhance performance significantly."


http://support.sas.com/documentation/cdl/en/basess/58133/HTML/default/viewer.htm#a001772743.htm
Where is your REPLACE statement?

To optimize the dynamic WHERE processing, think of trying indexes (random I/O).

---->-- ja karman --<-----


9 REPLIES
RW9
Diamond | Level 26

Why do you have all the additional if statements checking some automatic variable?

IF _iorc_ eq 1230013 then
OUTPUT;
else do;
REPLACE;
end;

It should just be an update by certain variables.  You should perhaps read the manual section on updating datasets:

http://support.sas.com/documentation/cdl/en/basess/58133/HTML/default/viewer.htm#a001304308.htm

nbonda
Obsidian | Level 7

Hi jakarman,

 

How can I create an index on the master table?

Thank you

 

jakarman
Barite | Level 11

Updating in place is very useful when:

a/ you have a big dataset and a small number of updates
b/ you use it with SAS/SHARE, allowing views/updates from other places at the same moment.
You can see this usage with Enterprise Guide / Base SAS when updating a dataset interactively.


The creation of indexes can be done:

1/ on an existing dataset using "proc datasets"
 http://support.sas.com/documentation/cdl/en/proc/65145/HTML/default/viewer.htm#n0mfav25learpan1lerk7...

 You can even define constraints and an audit dataset.

2/ as a dataset option when creating the dataset

http://support.sas.com/documentation/cdl/en/ledsoptsref/68025/HTML/default/viewer.htm#n1aaucjme18e43...

Sometimes it is better to remove all indexes and to recreate them after having refilled all records.
Adding records to an existing dataset with indexes will require an update of the index for each record.
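A minimal sketch of both approaches, using the library and dataset names from the original post (which are assumptions on my part):

```sas
/* 1. Add an index to an existing dataset with PROC DATASETS */
proc datasets library=ma nolist;
   modify master;
   index create id;
quit;

/* 2. Create the index up front with the INDEX= dataset option */
data ma.master(index=(id));
   set ma.master;
run;
```

With the index on ID in place, the dynamic WHERE lookups done by MODIFY with BY become indexed (random I/O) reads instead of full scans.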

Performance in processing comes from knowing/understanding what will happen inside the engine room.

---->-- ja karman --<-----
Tom
Super User

For a table with only 85,000 observations, it is not worth the headache of trying to modify the table in place.

Just replace the table.  Look into using the regular old UPDATE statement for applying your transactions.

 

data ma.master;
  attrib ID length=8 label='Invoice ID';
  attrib Created_DT length=8 informat=mdyampm. format=datetime20. label='Datetime Created' ;
  attrib COMMENTS length=$100 label='Invoice Comments';
  stop;
  call missing(of _all_);
run;

data ma.master;
  update ma.master tr.tras_data ;
  by id;
run;

 

 

 

nbonda
Obsidian | Level 7

Tom,

Thank you for your reply. I will test the UPDATE statement as well. I checked the article; UPDATE and MODIFY perform the same function.

So is there any advantage to using the MODIFY statement over the UPDATE statement?

 

I am using the Unix platform; I forgot to mention it in the original question.

Tom
Super User

The advantages of MODIFY include much more flexibility and the ability to modify a data set in place.

Of course, those are also the disadvantages, since the flexibility makes it more complex to understand, and modifying a dataset in place is a more complex and potentially dangerous operation. So only use MODIFY if you need to.

Note also that UPDATE is a peculiar tool, and you need to understand how it works. The intent is to allow you to make changes just to specified variables, so only the non-missing values in the transaction records cause changes in the master records. (To change a value to missing, you need to use the special missing value ._ in the transaction dataset.) If your intent is for each transaction to totally replace the master record, then just use MERGE instead of UPDATE.
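A minimal sketch of that difference, with made-up work datasets purely for illustration:

```sas
data master;
   id=1; amount=100; note='original'; output;
run;

data trans;
   id=1; amount=.; note='changed'; output;
run;

/* UPDATE: the missing AMOUNT in the transaction is ignored,
   so the result keeps amount=100 and only NOTE changes */
data via_update;
   update master trans;
   by id;
run;

/* MERGE: the transaction record overwrites the master record
   variable by variable, so AMOUNT becomes missing */
data via_merge;
   merge master trans;
   by id;
run;
```

Running both steps and comparing VIA_UPDATE with VIA_MERGE shows the two behaviors side by side.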

jakarman
Barite | Level 11

Tom, the update-in-place approach of MODIFY is not weird. On the contrary, it is the basic reason for the RDBMS and SQL standard approach.
The SAS-specific way of handling data using SAS datasets is not commonly known and accepted. Using the PDV and automatic advancing is often bashed, with requests to replace it with SQL. Using the now-buzzing word "NoSQL" could help, but that is asking for a completely different mindset from somebody just starting to use SAS.

---->-- ja karman --<-----
Kurt_Bremser
Super User

Sort the data sets and use a data step with UPDATE. Then check the performance. With these small record counts, the operation should be finished in seconds.

What you run into is the fact that without sorting, the step scans all records of the master data set for every record of the update data set, basically causing 85K * 30K I/O operations.
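The sort-then-update approach can be sketched as follows, using the library and dataset names from the original post (assumptions on my part):

```sas
/* Sort the transaction data by the key; the master is already
   in ID order once it has been maintained with UPDATE ... BY */
proc sort data=tr.tras_data;
   by id;
run;

/* Apply the transactions with a sequential interleaved read
   instead of one lookup per transaction record */
data ma.master;
   update ma.master tr.tras_data;
   by id;
run;
```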


Discussion stats
  • 9 replies
  • 5239 views
  • 1 like
  • 5 in conversation