I had master, transaction and required output tables as mentioned below.
1ST CONDITION: For any ID, If any data is present in the master and if blank value is present in the transaction data for the same id then the value in the master data should not be updated with the blank value from Transaction data.(eg:ID- 1 GENDER:female)
2ND CONDITION: For any ID, If any value is present in the master and if any value is present in the transaction data also for the same id then the value in the master data should be updated with the value from Transaction data.(eg:ID- 2 GENDER:male)
Please help me to implement the following requirement using HASH programming only for updating the data bcoz as the master dataset has millions of records and using of UPDATE statement in SAS programming is time consuming .So please help me ..
Thanks in Advance.
MASTER | ||||
ID# | Date | Gender | Code | Class |
1 | . | female | AB | LOW |
2 | . | female | CD | LOW |
3 | 1/1/2021 | M | CD |
TRANSACTION | ||||
ID# | Date | Gender | Code | Class |
1 | 7/1/2020 | EF | ||
2 | 7/2/2020 | male | ||
3 | male |
output | ||||
ID# | Date | Gender | Code | Class |
1 | 7/1/2020 | female | EF | LOW |
2 | 7/2/2020 | male | CD | LOW |
3 | 1/1/2021 | male | CD |
This is tailor-made for the UPDATE statement, as in:
data want;
update master transact;
by id;
run;
Unlike the the MERGE statement, UPDATE will not overwrite non-missing values in MASTER with missing values in the matching record/variable in TRANSACT. Also UPDATE will collapse all transact records into a single MASTER record. And UPDATE will expect exactly one record per BY value in MASTER.
Thanks for the reply. But update statement in sas programming is a time consuming technique. So this wont work in my scenario as millions of records exist in master dataset.
@rohithverma wrote:
Thanks for the reply. But update statement in sas programming is a time consuming technique. So this wont work in my scenario as millions of records exist in master dataset.
Assuming your data are sorted by ID, update is probably the least time-consuming way of creating a new updated master file from the old master file.
And really, millions of records is not that much these days.
How many transactions do you have? If only a few, then you might further improve performance by modifying the master data set in place, using the MODIFY statement in a data step. See the documentation on MODIFY STATEMENT. Example 3 will give you some guidance if your data are sorted by ID. I imagine (untested) that you could do something like:
data master;
modify master transact (in=int);
by id;
if int then replace;
run;
Of course, like my other response, this assumes both datasets are sorted by ID.
1ST CONDITION: For any ID, If any data is present in the master and if blank value is present in the transaction data for the same id then the value in the master data should not be updated with the blank value from Transaction data.(eg:ID- 1 GENDER:female)
2ND CONDITION: For any ID, If any value is present in the master and if any value is present in the transaction data also for the same id then the value in the master data should be updated with the value from Transaction data.(eg:ID- 2 GENDER:male)
This can be written as: For any ID present in the master, overwrite the master data with any non-missing transaction data. Right?
The UPDATE statement should run in seconds for a few million records.
data MASTER TRAN;
do ID=1 to 1e6;
A=ifn(ranuni(1)>.8, ., ID);
B=ifn(ranuni(1)>.8, ., ID);
output MASTER;
A=ifn(ranuni(1)>.8, ., ID);
B=ifn(ranuni(1)>.8, ., ID);
output TRAN;
end;
run;
data MASTER;
update MASTER TRAN;
by ID;
run;
NOTE: There were 1000000 observations read from the data set WORK.MASTER.
NOTE: There were 1000000 observations read from the data set WORK.TRAN.
NOTE: The data set WORK.MASTERhas 1000000 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 1.08 seconds
> The UPDATE statement should run in seconds for a few million records.
Unless your data is not in SAS of course, but then surely you'd have told us that.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.