BookmarkSubscribeRSS Feed
rohithverma
Obsidian | Level 7

I had master, transaction and required output tables as mentioned below.

1ST CONDITION: For any ID, If any data is present in the master and if blank value is present in the transaction data for the same id then  the value in the master data should not be updated with the  blank value from Transaction data.(eg:ID- 1 GENDER:female)

2ND CONDITION: For any ID, If any value   is present in the master and if any  value  is present in the transaction data also for the same id then  the value in the master data should  be updated with the  value  from Transaction data.(eg:ID- 2 GENDER:male)

 

Please help me to implement the following requirement using  HASH programming only for updating the data bcoz as the master dataset has  millions of records and  using of UPDATE  statement in SAS programming is  time consuming .So please help me ..

 

Thanks in Advance.

 

MASTER        
ID# Date Gender Code Class
1 . female AB LOW
2 . female CD LOW
3 1/1/2021 M CD  

 

TRANSACTION        
ID# Date Gender Code Class
1 7/1/2020   EF  
2 7/2/2020 male    
3   male    

 

output        
ID# Date Gender Code Class
1 7/1/2020 female EF LOW
2 7/2/2020 male CD LOW
3 1/1/2021 male CD  
5 REPLIES 5
mkeintz
PROC Star

This is tailor-made for the UPDATE statement, as in:

 

data want;
  update master transact;
  by id;
run;

Unlike the the MERGE statement, UPDATE will not overwrite non-missing values in MASTER with missing values in the matching record/variable in TRANSACT.  Also UPDATE will collapse all transact records into a single MASTER record.  And UPDATE will expect exactly one record per BY value in MASTER.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
rohithverma
Obsidian | Level 7

Thanks for the reply. But update statement in sas programming  is a time consuming technique. So this wont work in my scenario as millions of records exist in master dataset. 

mkeintz
PROC Star

@rohithverma wrote:

Thanks for the reply. But update statement in sas programming  is a time consuming technique. So this wont work in my scenario as millions of records exist in master dataset. 


 

Assuming your data are sorted by ID, update is probably the least time-consuming way of creating a new updated master file from the old master file.

 

And really, millions of records is not that much these days.

 

How many transactions do you have?  If only a few, then you might further improve performance by modifying the master data set in place, using the MODIFY statement in a data step.  See the documentation on MODIFY STATEMENT.   Example 3 will give you some guidance if your data are sorted by ID.   I imagine (untested) that you could do something like:

 

data master;
  modify master transact (in=int);
  by id;
  if int then replace;
run;

Of course, like my other response, this assumes both datasets are sorted by ID.  

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
ChrisNZ
Tourmaline | Level 20

 

1ST CONDITION: For any ID, If any data is present in the master and if blank value is present in the transaction data for the same id then  the value in the master data should not be updated with the  blank value from Transaction data.(eg:ID- 1 GENDER:female)

2ND CONDITION: For any ID, If any value   is present in the master and if any  value  is present in the transaction data also for the same id then  the value in the master data should  be updated with the  value  from Transaction data.(eg:ID- 2 GENDER:male)

This can be written as: For any ID present in the master, overwrite the master data with any non-missing transaction data. Right?

 

The UPDATE statement should run in seconds for a few million records.

data MASTER TRAN;
  do ID=1 to 1e6;
    A=ifn(ranuni(1)>.8, ., ID);
    B=ifn(ranuni(1)>.8, ., ID);
    output MASTER;
    A=ifn(ranuni(1)>.8, ., ID);
    B=ifn(ranuni(1)>.8, ., ID);
    output TRAN;
  end;
run;
data MASTER;
 update MASTER TRAN;
 by ID;
run;

NOTE: There were 1000000 observations read from the data set WORK.MASTER.
NOTE: There were 1000000 observations read from the data set WORK.TRAN.
NOTE: The data set WORK.MASTERhas 1000000 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 1.08 seconds

 

 

 

 

ChrisNZ
Tourmaline | Level 20

> The UPDATE statement should run in seconds for a few million records.

Unless your data is not in SAS of course, but then surely you'd have told us that.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 768 views
  • 0 likes
  • 3 in conversation