Re: Filtering data from another table

abx · Posted 04-23-2023 05:26 AM

Hi All,

I really need help on writing this code as I am not sure on how to do it.

From my current dataset 1, I would like to filter these lists of reportable tax country (info in dataset 2). The tax country in current dataset 1 is available in 3 different columns (i.e. Tax1 to Tax3). I have multiple conditions such as customer declared as 1 tax country ranging up to 3 tax countries. I would need to identify all the 3 reportable tax country (i.e. Tax1 to Tax3) and populate the required to tax countries into new columns and also the maintaining existing info for reference.

Required output will be in data set 3. Data set 3 comprise of:

1. Existing info in data set 1 from Cus_Name to TIN3

2. Column Tax1_New to TIN3_New - For these columns, information will only appear when Tax1 to Tax3 in data set 1 match Reportable_Cntry (data set 2). This includes the TIN (Tax Number).

3. TINIssuedby column - Will derive based on tax country that has TIN (Tax Number). If TIN is not available, TINIssuedby should be blank.

Anyone able to help on this scenario? Thank you for your help.

Sample:

Data set 1: Current dataset

CUS_NAME	TAX1	TAX2	TAX3	TIN1	TIN2	TIN3
CUSTOMER 1	SG	GB	HK	S7777777F	A123456Z	C688688(0)
CUSTOMER 2	SG	HK		S7777667F	C688666(0)
CUSTOMER 3	MY	SG	JP		S7777887F
CUSTOMER 4	AL			I05101999X
CUSTOMER 5	AR	AU	AT			88-888/9999
CUSTOMER 6	MY	JP	ID
CUSTOMER 7	TH	US	GM		777777777

Data set 2: List of Reportable Tax Country

DATA REPORTABLE_CNTRY;
LENGTH REPORTABLE_CNTRY$2.;
INPUT REPORTABLE_CNTRY;
CARDS;
AL
AD
AG
AR
AU
AT
AZ
BB
BE
BR
CA
CL
CN

SG

HK

AU

RUN;

Data set 3: Output

CUS_NAME

TAX1

TAX2

TAX3

TIN1

TIN2

TIN3

TAX1_New

TAX2_New

TAX3_New

TIN1_New

TIN2_New

TIN3_New

TIN1_Issuedby

TIN2_Issuedby

TIN3_Issuedby

CUSTOMER 1

SG

GB

HK

S7777777F

A123456Z

C688688(0)

SG

HK

S7777777F

C688688(0)

SG

HK

CUSTOMER 2

SG

HK

S7777667F

C688666(0)

SG

HK

S7777667F

C688666(0)

SG

HK

CUSTOMER 3

MY

SG

JP

S7777887F

SG

S7777887F

SG

CUSTOMER 4

AL

I05101999X

AL

I05909999X

AL

CUSTOMER 5

AR

AU

AT

88-888/9999

AR

AU

AT

88-888/9999

AT

CUSTOMER 6

MY

JP

ID

CUSTOMER 7

TH

US

GM

777777777

yabwon · Posted 04-23-2023 09:24 AM

try this simple proc transpose approach:

data data1;
infile cards dlm="|" dsd;
input CUS_NAME : $ 20. (TAX1 TAX2 TAX3) (: $ 2.) (TIN1 TIN2 TIN3) (: $ 10.);
cards;
CUSTOMER 1|SG|GB|HK|S7777777F|A123456Z|C688688(0)
CUSTOMER 2|SG|HK||S7777667F|C688666(0)| 
CUSTOMER 3|MY|SG|JP||S7777887F| 
CUSTOMER 4|AL|||I05101999X|| 
CUSTOMER 5|AR|AU|AT|||88-888/9999
CUSTOMER 6|MY|JP|ID||| 
CUSTOMER 7|TH|US|GM||777777777|
;
run;
proc print;
run;


DATA REPORTABLE_CNTRY; 
LENGTH REPORTABLE_CNTRY $ 2.;
INPUT REPORTABLE_CNTRY;
CARDS;
AL
AD
AG
AR
AU
AT
AZ
BB
BE
BR
CA
CL
CN
SG
HK
AU
;
RUN;
proc print;
run; 

proc transpose data=data1 out=data1t1(rename=(_NAME_=N1 col1=TAX));
by CUS_NAME;
var TAX:;
run;
proc print;
run; 

proc transpose data=data1 out=data1t2(rename=(_NAME_=N2 col1=TIN));
by CUS_NAME;
var TIN:;
run;
proc print;
run; 

data data1t3;
  declare hash H(dataset:"REPORTABLE_CNTRY(rename=(REPORTABLE_CNTRY=TAX))");
  h.defineKey('TAX');
  h.defineDone();
  
  do until(EOF);
    

    set data1t1 end=EOF;
    set data1t2;

    if H.check()=0 then
      do;
        TAX_New=TAX;
        TIN_New=TIN;
        if TIN_New NE " " then TAX_IssuedBy=TAX;
      end;
      
    output;
    call missing(of _all_);
  end;

  stop;
run;
proc print;
run; 


proc transpose data=data1t3 out=data3_1(drop=_:) suffix=_New;
  by CUS_NAME;
  var TAX_New;
  id N1;
run;
proc transpose data=data1t3 out=data3_2(drop=_:) suffix=_New;
  by CUS_NAME;
  var TIN_New;
  id N2;
run;
proc transpose data=data1t3 out=data3_3(drop=_:) suffix=_IssuedBy;
  by CUS_NAME;
  var TAX_IssuedBy;
  id N1;
run;


data data3;
  merge data1 data3_1 data3_2 data3_3;
run;
proc print;
run;

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation

yabwon · Posted 04-23-2023 09:44 AM

Or try this approach with arrays, but notice that new variables names are a bit different than yours:

data data3;
  length REPORTABLE_CNTRY $ 2;
  declare hash H(dataset:"REPORTABLE_CNTRY");
  h.defineKey('REPORTABLE_CNTRY');
  h.defineDone();
    
  do until(EOF);
    set data1 end=EOF;
    
    array TAX(i) TAX:;
    array TIN(i) TIN:;

    array TAX_New[3] $ 2; /* if you have more than 3 "tax" variables then adjust */
    array TIN_New[3] $ 10;
    array TAX_IssuedBy[3] $ 2;

    do over TAX;
    if H.check(key:TAX)=0 then
      do;
        TAX_New[i]=TAX;
        TIN_New[i]=TIN;
        if TIN_New[i] NE " " then TAX_IssuedBy[i]=TAX;
      end;
    end;

    output;
    call missing(of _all_);
  end;

  stop;
  drop i REPORTABLE_CNTRY;
run;
proc print;
run;

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation

AhmedAl_Attar · Posted 04-23-2023 10:26 AM

WoW,

@yabwon my solution is very similar to your 😁

/* Building on the data sets created by Bart in his first response */
DATA WANT(DROP=rc c j);
	if (0) then SET REPORTABLE_CNTRY WORK.DATA1;
	ARRAY taxs {3} $2 TAX:;
	ARRAY tins {3} $10 TIN:;
	ARRAY taxs_n {3} $2 TAX1_New TAX2_New TAX3_New ;
	ARRAY tins_n {3} $10 TIN1_New TIN2_New TIN3_New;
	ARRAY tins_I {3} $2 TIN1_IssuedBy TIN2_IssuedBy TIN3_IssuedBy;

	/* Load the Reportable country into a Hash Object */
	if (_n_=1) then
	do;
		dcl hash h(dataset:'REPORTABLE_CNTRY');
		h.defineKey('REPORTABLE_CNTRY');
		h.defineDone();
	end;

	SET WORK.DATA1;
	j=0;
	do c=1 to dim(taxs);
		call missing (tins_I[c],taxs_n[c],tins_n[c]);
		if(h.find(Key:taxs[c]) = 0) then
		do;
			j+1;
			taxs_n[j] = taxs[c];
			tins_n[j] = tins[c];
			if (STRIP(tins_n[j]) NE '') then tins_I[j] = taxs[c];
		end;
	end;
	output;
run;

yabwon · Posted 04-23-2023 11:26 AM

Wait... how was it...? "Great minds think alike" ? 😁😁😁

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation

AhmedAl_Attar · Posted 04-23-2023 11:33 AM

Indeed 😉

abx · Posted 04-24-2023 05:34 AM

@yabwon @AhmedAl_Attar All code works! Thanks both. 😊

Out of curiosity, can we use similar method to find not reportable tax country? Example: Country that is not listed in dataset 2. Thanks

yabwon · Posted 04-24-2023 05:47 AM

Yes, basically by doing:

 if H.check(key:TAX) NE 0 then

The check() method returns 0 if value is in hash and non-zero otherwise.

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away