Hi,
I need to update only one column in master based on transaction table. However, during proc sql update, its taking huge time.
Below is Master table
Account_ID | APPL_LOAN_PCT_GROSS_INC | APPL_LVR | APPL_NO_SPLIT | APPLICATION_ID | BASEL_RETAIL_CORP_CODE | BROKER_FLG |
RAM3000007 | 1.538430769 | 0.6667 | 400001700 | 400001700 | CONS | N |
RAM3000015 | 4.2922 | 0.7368 | 400004000 | 400004000 | CONS | N |
RAM3000023 | 4.2922 | 0.7368 | 400004000 | 400004001 | CONS | N |
And Here is the Transaction Table
ORIGINAL_VALUE | MASKED_VALUE |
RAM3000007 | ABC345890 |
RAM3000015 | ABC67848 |
RAM3000023 | ABC77545
|
And Here is what is required
Account_ID | APPL_LOAN_PCT_GROSS_INC | APPL_LVR | APPL_NO_SPLIT | APPLICATION_ID | BASEL_RETAIL_CORP_CODE | BROKER_FLG |
ABC345890 | 1.538430769 | 0.6667 | 400001700 | 400001700 | CONS | N |
ABC67848 | 4.2922 | 0.7368 | 400004000 | 400004000 | CONS | N |
ABC77545 | 4.2922 | 0.7368 | 400004000 | 400004001 | CONS | N |
Below code takes huge time
proc sql;
update master as u
set account_id=(select MASKED_VALUE from new as n
where u.ACCOUNT_ID=n.ORIGINAL_VALUE)
where u.ACCOUNT_ID in (select ORIGINAL_VALUE from new);
quit;
But this is taking huge time. Any help on alternatives is really appreciated
Try Hash Table, if your memory of PC is big enough .
data master;
infile cards expandtabs truncover;
input (Account_ID APPL_LOAN_PCT_GROSS_INC APPL_LVR APPL_NO_SPLIT APPLICATION_ID BASEL_RETAIL_CORP_CODE BROKER_FLG) (: $40.);
cards;
RAM3000007 1.538430769 0.6667 400001700 400001700 CONS N
RAM3000015 4.2922 0.7368 400004000 400004000 CONS N
RAM3000023 4.2922 0.7368 400004000 400004001 CONS N
;
data Transaction ;
infile cards expandtabs truncover;
input (ORIGINAL_VALUE MASKED_VALUE) (: $40.);
cards;
RAM3000007 ABC345890
RAM3000015 ABC67848
RAM3000023 ABC77545
;
data want;
if _n_=1 then do;
if 0 then set Transaction;
declare hash h(dataset:'Transaction',hashexp:20);
h.definekey('ORIGINAL_VALUE');
h.definedata('MASKED_VALUE');
h.definedone();
end;
set master;
call missing(MASKED_VALUE);
rc=h.find(key:Account_ID);
drop rc ORIGINAL_VALUE Account_ID;
rename MASKED_VALUE=new_Account_ID;
run;
First question. Is either of your tables located on a data base? Or are they both SAS data sets?
How large are they?
Also, are both data sets sorted ?
Then try this
data master;
input Account_ID :$10. APPL_LOAN_PCT_GROSS_INC APPL_LVR APPL_NO_SPLIT APPLICATION_ID BASEL_RETAIL_CORP_CODE $ BROKER_FLG $;
datalines;
RAM3000007 1.538430769 0.6667 400001700 400001700 CONS N
RAM3000015 4.2922 0.7368 400004000 400004000 CONS N
RAM3000023 4.2922 0.7368 400004000 400004001 CONS N
;
data transaction;
input ORIGINAL_VALUE :$10. MASKED_VALUE :$10.;
datalines;
RAM3000007 ABC345890
RAM3000015 ABC67848
RAM3000023 ABC77545
;
data master;
modify master transaction(rename = ORIGINAL_VALUE = Account_ID);
by Account_ID;
Account_ID = MASKED_VALUE;
run;
It's a simple MERGE then:
data
master_new
unmasked
;
merge
master (in=m)
transaction (in=t)
;
by account_id;
if m;
if t
then account_id = masked_value;
else output unmasked;
output master_new;
drop masked_value;
run;
I have added an additional output dataset, since this looks like some kind of anonymization where no unmasked values will be wanted,
Try Hash Table, if your memory of PC is big enough .
data master;
infile cards expandtabs truncover;
input (Account_ID APPL_LOAN_PCT_GROSS_INC APPL_LVR APPL_NO_SPLIT APPLICATION_ID BASEL_RETAIL_CORP_CODE BROKER_FLG) (: $40.);
cards;
RAM3000007 1.538430769 0.6667 400001700 400001700 CONS N
RAM3000015 4.2922 0.7368 400004000 400004000 CONS N
RAM3000023 4.2922 0.7368 400004000 400004001 CONS N
;
data Transaction ;
infile cards expandtabs truncover;
input (ORIGINAL_VALUE MASKED_VALUE) (: $40.);
cards;
RAM3000007 ABC345890
RAM3000015 ABC67848
RAM3000023 ABC77545
;
data want;
if _n_=1 then do;
if 0 then set Transaction;
declare hash h(dataset:'Transaction',hashexp:20);
h.definekey('ORIGINAL_VALUE');
h.definedata('MASKED_VALUE');
h.definedone();
end;
set master;
call missing(MASKED_VALUE);
rc=h.find(key:Account_ID);
drop rc ORIGINAL_VALUE Account_ID;
rename MASKED_VALUE=new_Account_ID;
run;
@Ksharp This Hash solution is working fine. Need one more help on the same, In my master, I have few more columns which I need to mask with Masked_value from transaction, then in that case how can i generalize this solution. See example as beow
Master
Account_ID | APPL_LOAN_PCT_GROSS_INC | APPL_LVR | APPL_NO_SPLIT | APPLICATION_ID | BASEL_RETAIL_CORP_CODE | BROKER_FLG | Customer_id |
RAM3000007 | 1.538430769 | 0.6667 | 400001700 | 400001700 | CONS | N | 545966 |
RAM3000015 | 4.2922 | 0.7368 | 400004000 | 400004000 | CONS | N | 345182 |
RAM3000023 | 4.2922 | 0.7368 | 400004000 | 400004001 | CONS | N | 773280 |
Transaction
ORIGINAL_VALUE | MASKED_VALUE |
RAM3000007 | ABC345890 |
RAM3000015 | ABC67848 |
RAM3000023 | ABC77545 |
545966 | 884597 |
345182 | 489012 |
773280 | 771811 |
Output Required :
Account_ID | APPL_LOAN_PCT_GROSS_INC | APPL_LVR | APPL_NO_SPLIT | APPLICATION_ID | BASEL_RETAIL_CORP_CODE | BROKER_FLG | Customer_id |
ABC345890 | 1.538430769 | 0.6667 | 400001700 | 400001700 | CONS | N | 884597 |
ABC67848 | 4.2922 | 0.7368 | 400004000 | 400004000 | CONS | N | 489012 |
ABC77545 | 4.2922 | 0.7368 | 400004000 | 400004001 | CONS | N |
771811
|
Thank you so much in advance
You can use multiple lookups from the hash:
rc1 = h.find(key:Account_ID);
if rc1 = 0 then account_id = masked_value;
rc2 = h.find(key:customer_id);
if rc2 = 0 then customer_id = masked_value;
if rc1 or rc2 /* if either find failed */
then output unmasked;
else output want;
drop rc1 rc2;
run;
Note that I removed the RENAME from @Ksharp 's code.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.