Solved: Re: Alternative to proc sql update

Swapnil_21 · Posted 02-16-2022 05:48 AM

Hi,

I need to update only one column in master based on transaction table. However, during proc sql update, its taking huge time.

Below is Master table

Account_ID	APPL_LOAN_PCT_GROSS_INC	APPL_LVR	APPL_NO_SPLIT	APPLICATION_ID	BASEL_RETAIL_CORP_CODE	BROKER_FLG
RAM3000007	1.538430769	0.6667	400001700	400001700	CONS	N
RAM3000015	4.2922	0.7368	400004000	400004000	CONS	N
RAM3000023	4.2922	0.7368	400004000	400004001	CONS	N

And Here is the Transaction Table

ORIGINAL_VALUE	MASKED_VALUE
RAM3000007	ABC345890
RAM3000015	ABC67848
RAM3000023	ABC77545

And Here is what is required

Account_ID	APPL_LOAN_PCT_GROSS_INC	APPL_LVR	APPL_NO_SPLIT	APPLICATION_ID	BASEL_RETAIL_CORP_CODE	BROKER_FLG
ABC345890	1.538430769	0.6667	400001700	400001700	CONS	N
ABC67848	4.2922	0.7368	400004000	400004000	CONS	N
ABC77545	4.2922	0.7368	400004000	400004001	CONS	N

Below code takes huge time

proc sql;
update master as u
set account_id=(select MASKED_VALUE from new as n
where u.ACCOUNT_ID=n.ORIGINAL_VALUE)
where u.ACCOUNT_ID in (select ORIGINAL_VALUE from new);
quit;

But this is taking huge time. Any help on alternatives is really appreciated

Ksharp · Posted 02-16-2022 07:15 AM

Try Hash Table, if your memory of PC is big enough .

data master;
infile cards expandtabs truncover;
input (Account_ID	APPL_LOAN_PCT_GROSS_INC	APPL_LVR	APPL_NO_SPLIT	APPLICATION_ID	BASEL_RETAIL_CORP_CODE	BROKER_FLG) (: $40.);
cards;
RAM3000007	1.538430769	0.6667	400001700	400001700	CONS	N
RAM3000015	4.2922	0.7368	400004000	400004000	CONS	N
RAM3000023	4.2922	0.7368	400004000	400004001	CONS	N
;

 
data Transaction ; 
infile cards expandtabs truncover;
input (ORIGINAL_VALUE	MASKED_VALUE) (: $40.);
cards;
RAM3000007	ABC345890
RAM3000015	ABC67848
RAM3000023	ABC77545
;

data want;
 if _n_=1 then do;
  if 0 then set Transaction;
  declare hash h(dataset:'Transaction',hashexp:20);
  h.definekey('ORIGINAL_VALUE');
  h.definedata('MASKED_VALUE');
  h.definedone();
 end;
set master;
call missing(MASKED_VALUE);
rc=h.find(key:Account_ID);
drop rc ORIGINAL_VALUE Account_ID;
rename MASKED_VALUE=new_Account_ID;
run;

View solution in original post

PeterClemmensen · Posted 02-16-2022 05:54 AM

First question. Is either of your tables located on a data base? Or are they both SAS data sets?

The DATA to DATA Step Macro
Blog: SASnrd

Swapnil_21 · Posted 02-16-2022 05:56 AM

Both are sas datasets

PeterClemmensen · Posted 02-16-2022 05:57 AM

How large are they?

The DATA to DATA Step Macro
Blog: SASnrd

PeterClemmensen · Posted 02-16-2022 06:08 AM

Also, are both data sets sorted ?

The DATA to DATA Step Macro
Blog: SASnrd

Swapnil_21 · Posted 02-16-2022 06:30 AM

Both are sorted.

Master table size 10003476
Transaction table size 13567892

PeterClemmensen · Posted 02-16-2022 07:13 AM

Then try this

data master;
input Account_ID :$10. APPL_LOAN_PCT_GROSS_INC APPL_LVR APPL_NO_SPLIT APPLICATION_ID BASEL_RETAIL_CORP_CODE $ BROKER_FLG $;
datalines;
RAM3000007 1.538430769 0.6667 400001700 400001700 CONS N
RAM3000015 4.2922      0.7368 400004000 400004000 CONS N
RAM3000023 4.2922      0.7368 400004000 400004001 CONS N
;

data transaction;
input ORIGINAL_VALUE :$10. MASKED_VALUE :$10.;
datalines;
RAM3000007 ABC345890
RAM3000015 ABC67848
RAM3000023 ABC77545
;

data master;
   modify master transaction(rename = ORIGINAL_VALUE = Account_ID);
   by Account_ID;
   Account_ID = MASKED_VALUE;
run;

The DATA to DATA Step Macro
Blog: SASnrd

Kurt_Bremser · Posted 02-16-2022 07:45 AM

It's a simple MERGE then:

data
  master_new
  unmasked
;
merge
  master (in=m)
  transaction (in=t)
;
by account_id;
if m;
if t
then account_id = masked_value;
else output unmasked;
output master_new;
drop masked_value;
run;

I have added an additional output dataset, since this looks like some kind of anonymization where no unmasked values will be wanted,

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Ksharp · Posted 02-16-2022 07:15 AM

Try Hash Table, if your memory of PC is big enough .

data master;
infile cards expandtabs truncover;
input (Account_ID	APPL_LOAN_PCT_GROSS_INC	APPL_LVR	APPL_NO_SPLIT	APPLICATION_ID	BASEL_RETAIL_CORP_CODE	BROKER_FLG) (: $40.);
cards;
RAM3000007	1.538430769	0.6667	400001700	400001700	CONS	N
RAM3000015	4.2922	0.7368	400004000	400004000	CONS	N
RAM3000023	4.2922	0.7368	400004000	400004001	CONS	N
;

 
data Transaction ; 
infile cards expandtabs truncover;
input (ORIGINAL_VALUE	MASKED_VALUE) (: $40.);
cards;
RAM3000007	ABC345890
RAM3000015	ABC67848
RAM3000023	ABC77545
;

data want;
 if _n_=1 then do;
  if 0 then set Transaction;
  declare hash h(dataset:'Transaction',hashexp:20);
  h.definekey('ORIGINAL_VALUE');
  h.definedata('MASKED_VALUE');
  h.definedone();
 end;
set master;
call missing(MASKED_VALUE);
rc=h.find(key:Account_ID);
drop rc ORIGINAL_VALUE Account_ID;
rename MASKED_VALUE=new_Account_ID;
run;

Swapnil_21 · Posted 02-16-2022 08:47 PM

@Ksharp This Hash solution is working fine. Need one more help on the same, In my master, I have few more columns which I need to mask with Masked_value from transaction, then in that case how can i generalize this solution. See example as beow

Master

Account_ID	APPL_LOAN_PCT_GROSS_INC	APPL_LVR	APPL_NO_SPLIT	APPLICATION_ID	BASEL_RETAIL_CORP_CODE	BROKER_FLG	Customer_id
RAM3000007	1.538430769	0.6667	400001700	400001700	CONS	N	545966
RAM3000015	4.2922	0.7368	400004000	400004000	CONS	N	345182
RAM3000023	4.2922	0.7368	400004000	400004001	CONS	N	773280

Transaction

ORIGINAL_VALUE	MASKED_VALUE
RAM3000007	ABC345890
RAM3000015	ABC67848
RAM3000023	ABC77545
545966	884597
345182	489012
773280	771811

Output Required :

Account_ID	APPL_LOAN_PCT_GROSS_INC	APPL_LVR	APPL_NO_SPLIT	APPLICATION_ID	BASEL_RETAIL_CORP_CODE	BROKER_FLG	Customer_id
ABC345890	1.538430769	0.6667	400001700	400001700	CONS	N	884597
ABC67848	4.2922	0.7368	400004000	400004000	CONS	N	489012
ABC77545	4.2922	0.7368	400004000	400004001	CONS	N	771811

Thank you so much in advance

Kurt_Bremser · Posted 02-17-2022 03:54 AM

You can use multiple lookups from the hash:

rc1 = h.find(key:Account_ID);
if rc1 = 0 then account_id = masked_value;
rc2 = h.find(key:customer_id);
if rc2 = 0 then customer_id = masked_value;
if rc1 or rc2 /* if either find failed */
then output unmasked;
else output want;
drop rc1 rc2;
run;

Note that I removed the RENAME from @Ksharp 's code.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Ksharp · Posted 02-17-2022 06:53 AM

As Kurt said.

rc1 = h.find(key:Account_ID);
if rc1 = 0 then account_id = masked_value;
rc2 = h.find(key:customer_id);
if rc2 = 0 then customer_id = masked_value;
run;

Register Today!

SAS Training: Just a Click Away