DATA Step, Macro, Functions and more

Matching column in two dataset

Reply
Contributor
Posts: 30

Matching column in two dataset

Hi, I have following datasets and variables and need to match variables from one dataset to another dataset.

 

Requirement :- Need to match var1 to var5 from test1 dataset with var1 to var5 in test2 dataset. If any value from var1 to var5 is matching with any variable from var1 to var5 in test2 then have this id in output. I need this to be done in sql as this is just sample of existing code which is written in sql. 

 

I need this code to be modified after on t1.id = t2.id. Is there any simple way to get the result instaed of take one variable and try to match with var1 to var5 from second dataset. 

 

Missing and 0 should not match. 

 

proc sql;
select t1.id, "YES" as match, t1.sales_id from test1 t1 join test2 t2
on t1.id = t2.id ;
run;

 

 

 

data test1;
input id var1 $ var2 $ var3 $ var4 $ var5 $ sales_id;
datalines ;
1 A B C D E 101
2 F G H 0 0 102  
3 I J 0 0 0 103 
4 N M Q 0 0 104 
;
run;

data test2;
input id var1 $ var2 $ var3 $ var4 $ var5 $;
datalines ;
1 A 0 0 0 0
2 0 0 H G 0   
3 L M 0 0 0  
4 P A N 0 0  
;
run;

data want;
input id match $ sales_id;
datalines ;
1 YES 101
2 YES 102
4 YES 104
;
run;

proc sql;
select t1.id, "YES" as match, t1.sales_id from test1 t1 join test2 t2
on t1.id = t2.id ;
run;

 

Thanks

Super Contributor
Posts: 336

Re: Matching column in two dataset

I would "transpose" the 2 datasets and merge afterwards:

 

data test1;
input id var1 $ var2 $ var3 $ var4 $ var5 $ sales_id;
datalines ;
1 A B C D E 101
2 F G H 0 0 102  
3 I J 0 0 0 103 
4 N M Q 0 0 104 
;
run;

data test2;
input id var1 $ var2 $ var3 $ var4 $ var5 $;
datalines ;
1 A 0 0 0 0
2 0 0 H G 0   
3 L M 0 0 0  
4 P A N 0 0  
;
run;

Data test1;
  Set test1;
  Array V Var1-Var5;
  Do over V;
    Var=V;
	Output;
  End;
  Drop Var1-Var5;
Run;

Data test2;
  Set test2;
  Array V Var1-Var5;
  Do over V;
    Var=V;
	Output;
  End;
  Drop Var1-Var5;
Run;

Proc SQL;
  Create Table Want As Select Distinct t1.id, t1.sales_id
    From test1 t1
	Left Join test2 t2
	On t1.id=t2.id
	Where t1.Var ne '0' AND t2.Var ne '0' AND t1.Var eq t2.Var;
Quit;
Respected Advisor
Posts: 4,646

Re: Matching column in two dataset

That's the problem with wide data. If var1 can be matched with var2, why aren't they the same variable? SQL queries are much simpler with normalised data structures.

PG
Ask a Question
Discussion stats
  • 2 replies
  • 198 views
  • 2 likes
  • 3 in conversation