update master table with transaction table

rohithverma · Posted 07-07-2021 03:34 PM

I had master, transaction and required output tables as mentioned below.

1ST CONDITION: For any ID, If any data is present in the master and if blank value is present in the transaction data for the same id then the value in the master data should not be updated with the blank value from Transaction data.(eg:ID- 1 GENDER:female)

2ND CONDITION: For any ID, If any value is present in the master and if any value is present in the transaction data also for the same id then the value in the master data should be updated with the value from Transaction data.(eg:ID- 2 GENDER:male)

Please help me to implement the following requirement using HASH programming only for updating the data bcoz as the master dataset has millions of records and using of UPDATE statement in SAS programming is time consuming .So please help me ..

Thanks in Advance.

MASTER
ID#	Date	Gender	Code	Class
1	.	female	AB	LOW
2	.	female	CD	LOW
3	1/1/2021	M	CD

TRANSACTION
ID#	Date	Gender	Code	Class
1	7/1/2020		EF
2	7/2/2020	male
3		male

output
ID#	Date	Gender	Code	Class
1	7/1/2020	female	EF	LOW
2	7/2/2020	male	CD	LOW
3	1/1/2021	male	CD

mkeintz · Posted 07-07-2021 08:23 PM

This is tailor-made for the UPDATE statement, as in:

data want;
  update master transact;
  by id;
run;

Unlike the the MERGE statement, UPDATE will not overwrite non-missing values in MASTER with missing values in the matching record/variable in TRANSACT. Also UPDATE will collapse all transact records into a single MASTER record. And UPDATE will expect exactly one record per BY value in MASTER.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

rohithverma · Posted 07-07-2021 10:09 PM

Thanks for the reply. But update statement in sas programming is a time consuming technique. So this wont work in my scenario as millions of records exist in master dataset.

mkeintz · Posted 07-07-2021 11:39 PM

@rohithverma wrote:

Thanks for the reply. But update statement in sas programming is a time consuming technique. So this wont work in my scenario as millions of records exist in master dataset.

Assuming your data are sorted by ID, update is probably the least time-consuming way of creating a new updated master file from the old master file.

And really, millions of records is not that much these days.

How many transactions do you have? If only a few, then you might further improve performance by modifying the master data set in place, using the MODIFY statement in a data step. See the documentation on MODIFY STATEMENT. Example 3 will give you some guidance if your data are sorted by ID. I imagine (untested) that you could do something like:

data master;
  modify master transact (in=int);
  by id;
  if int then replace;
run;

Of course, like my other response, this assumes both datasets are sorted by ID.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

ChrisNZ · Posted 07-08-2021 02:32 AM

1ST CONDITION: For any ID, If any data is present in the master and if blank value is present in the transaction data for the same id then the value in the master data should not be updated with the blank value from Transaction data.(eg:ID- 1 GENDER:female)

2ND CONDITION: For any ID, If any value is present in the master and if any value is present in the transaction data also for the same id then the value in the master data should be updated with the value from Transaction data.(eg:ID- 2 GENDER:male)

This can be written as: For any ID present in the master, overwrite the master data with any non-missing transaction data. Right?

The UPDATE statement should run in seconds for a few million records.

data MASTER TRAN;
  do ID=1 to 1e6;
    A=ifn(ranuni(1)>.8, ., ID);
    B=ifn(ranuni(1)>.8, ., ID);
    output MASTER;
    A=ifn(ranuni(1)>.8, ., ID);
    B=ifn(ranuni(1)>.8, ., ID);
    output TRAN;
  end;
run;
data MASTER;
 update MASTER TRAN;
 by ID;
run;

NOTE: There were 1000000 observations read from the data set WORK.MASTER.
NOTE: There were 1000000 observations read from the data set WORK.TRAN.
NOTE: The data set WORK.MASTERhas 1000000 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 1.08 seconds

High-Performance SAS Coding - Third Edition