Hi All,
I really need help on writing this code as I am not sure on how to do it.
From my current dataset 1, I would like to filter these lists of reportable tax country (info in dataset 2). The tax country in current dataset 1 is available in 3 different columns (i.e. Tax1 to Tax3). I have multiple conditions such as customer declared as 1 tax country ranging up to 3 tax countries. I would need to identify all the 3 reportable tax country (i.e. Tax1 to Tax3) and populate the required to tax countries into new columns and also the maintaining existing info for reference.
Required output will be in data set 3. Data set 3 comprise of:
1. Existing info in data set 1 from Cus_Name to TIN3
2. Column Tax1_New to TIN3_New - For these columns, information will only appear when Tax1 to Tax3 in data set 1 match Reportable_Cntry (data set 2). This includes the TIN (Tax Number).
3. TINIssuedby column - Will derive based on tax country that has TIN (Tax Number). If TIN is not available, TINIssuedby should be blank.
Anyone able to help on this scenario? Thank you for your help.
Sample:
Data set 1: Current dataset
CUS_NAME | TAX1 | TAX2 | TAX3 | TIN1 | TIN2 | TIN3 |
CUSTOMER 1 | SG | GB | HK | S7777777F | A123456Z | C688688(0) |
CUSTOMER 2 | SG | HK | S7777667F | C688666(0) | ||
CUSTOMER 3 | MY | SG | JP | S7777887F | ||
CUSTOMER 4 | AL | I05101999X | ||||
CUSTOMER 5 | AR | AU | AT | 88-888/9999 | ||
CUSTOMER 6 | MY | JP | ID | |||
CUSTOMER 7 | TH | US | GM | 777777777 |
Data set 2: List of Reportable Tax Country
DATA REPORTABLE_CNTRY;
LENGTH REPORTABLE_CNTRY$2.;
INPUT REPORTABLE_CNTRY;
CARDS;
AL
AD
AG
AR
AU
AT
AZ
BB
BE
BR
CA
CL
CN
SG
HK
AU
RUN;
Data set 3: Output
CUS_NAME | TAX1 | TAX2 | TAX3 | TIN1 | TIN2 | TIN3 | TAX1_New | TAX2_New | TAX3_New | TIN1_New | TIN2_New | TIN3_New | TIN1_Issuedby | TIN2_Issuedby | TIN3_Issuedby |
CUSTOMER 1 | SG | GB | HK | S7777777F | A123456Z | C688688(0) | SG | HK | S7777777F | C688688(0) | SG | HK | |||
CUSTOMER 2 | SG | HK | S7777667F | C688666(0) | SG | HK | S7777667F | C688666(0) | SG | HK | |||||
CUSTOMER 3 | MY | SG | JP | S7777887F | SG | S7777887F | SG | ||||||||
CUSTOMER 4 | AL | I05101999X | AL | I05909999X | AL | ||||||||||
CUSTOMER 5 | AR | AU | AT | 88-888/9999 | AR | AU | AT | 88-888/9999 | AT | ||||||
CUSTOMER 6 | MY | JP | ID | ||||||||||||
CUSTOMER 7 | TH | US | GM | 777777777 |
try this simple proc transpose approach:
data data1;
infile cards dlm="|" dsd;
input CUS_NAME : $ 20. (TAX1 TAX2 TAX3) (: $ 2.) (TIN1 TIN2 TIN3) (: $ 10.);
cards;
CUSTOMER 1|SG|GB|HK|S7777777F|A123456Z|C688688(0)
CUSTOMER 2|SG|HK||S7777667F|C688666(0)|
CUSTOMER 3|MY|SG|JP||S7777887F|
CUSTOMER 4|AL|||I05101999X||
CUSTOMER 5|AR|AU|AT|||88-888/9999
CUSTOMER 6|MY|JP|ID|||
CUSTOMER 7|TH|US|GM||777777777|
;
run;
proc print;
run;
DATA REPORTABLE_CNTRY;
LENGTH REPORTABLE_CNTRY $ 2.;
INPUT REPORTABLE_CNTRY;
CARDS;
AL
AD
AG
AR
AU
AT
AZ
BB
BE
BR
CA
CL
CN
SG
HK
AU
;
RUN;
proc print;
run;
proc transpose data=data1 out=data1t1(rename=(_NAME_=N1 col1=TAX));
by CUS_NAME;
var TAX:;
run;
proc print;
run;
proc transpose data=data1 out=data1t2(rename=(_NAME_=N2 col1=TIN));
by CUS_NAME;
var TIN:;
run;
proc print;
run;
data data1t3;
declare hash H(dataset:"REPORTABLE_CNTRY(rename=(REPORTABLE_CNTRY=TAX))");
h.defineKey('TAX');
h.defineDone();
do until(EOF);
set data1t1 end=EOF;
set data1t2;
if H.check()=0 then
do;
TAX_New=TAX;
TIN_New=TIN;
if TIN_New NE " " then TAX_IssuedBy=TAX;
end;
output;
call missing(of _all_);
end;
stop;
run;
proc print;
run;
proc transpose data=data1t3 out=data3_1(drop=_:) suffix=_New;
by CUS_NAME;
var TAX_New;
id N1;
run;
proc transpose data=data1t3 out=data3_2(drop=_:) suffix=_New;
by CUS_NAME;
var TIN_New;
id N2;
run;
proc transpose data=data1t3 out=data3_3(drop=_:) suffix=_IssuedBy;
by CUS_NAME;
var TAX_IssuedBy;
id N1;
run;
data data3;
merge data1 data3_1 data3_2 data3_3;
run;
proc print;
run;
Bart
Or try this approach with arrays, but notice that new variables names are a bit different than yours:
data data3;
length REPORTABLE_CNTRY $ 2;
declare hash H(dataset:"REPORTABLE_CNTRY");
h.defineKey('REPORTABLE_CNTRY');
h.defineDone();
do until(EOF);
set data1 end=EOF;
array TAX(i) TAX:;
array TIN(i) TIN:;
array TAX_New[3] $ 2; /* if you have more than 3 "tax" variables then adjust */
array TIN_New[3] $ 10;
array TAX_IssuedBy[3] $ 2;
do over TAX;
if H.check(key:TAX)=0 then
do;
TAX_New[i]=TAX;
TIN_New[i]=TIN;
if TIN_New[i] NE " " then TAX_IssuedBy[i]=TAX;
end;
end;
output;
call missing(of _all_);
end;
stop;
drop i REPORTABLE_CNTRY;
run;
proc print;
run;
Bart
WoW,
@yabwon my solution is very similar to your 😁
/* Building on the data sets created by Bart in his first response */
DATA WANT(DROP=rc c j);
if (0) then SET REPORTABLE_CNTRY WORK.DATA1;
ARRAY taxs {3} $2 TAX:;
ARRAY tins {3} $10 TIN:;
ARRAY taxs_n {3} $2 TAX1_New TAX2_New TAX3_New ;
ARRAY tins_n {3} $10 TIN1_New TIN2_New TIN3_New;
ARRAY tins_I {3} $2 TIN1_IssuedBy TIN2_IssuedBy TIN3_IssuedBy;
/* Load the Reportable country into a Hash Object */
if (_n_=1) then
do;
dcl hash h(dataset:'REPORTABLE_CNTRY');
h.defineKey('REPORTABLE_CNTRY');
h.defineDone();
end;
SET WORK.DATA1;
j=0;
do c=1 to dim(taxs);
call missing (tins_I[c],taxs_n[c],tins_n[c]);
if(h.find(Key:taxs[c]) = 0) then
do;
j+1;
taxs_n[j] = taxs[c];
tins_n[j] = tins[c];
if (STRIP(tins_n[j]) NE '') then tins_I[j] = taxs[c];
end;
end;
output;
run;
Wait... how was it...? "Great minds think alike" ? 😁😁😁
Bart
Indeed 😉
@yabwon @AhmedAl_Attar All code works! Thanks both. 😊
Out of curiosity, can we use similar method to find not reportable tax country? Example: Country that is not listed in dataset 2. Thanks
Yes, basically by doing:
if H.check(key:TAX) NE 0 then
The check() method returns 0 if value is in hash and non-zero otherwise.
Bart
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.