DATA Step, Macro, Functions and more

Left outer join in hash table with duplicates

Accepted Solution Solved
Reply
Super Contributor
Posts: 271
Accepted Solution

Left outer join in hash table with duplicates

I want to peform left outer join in hash with following code (so as to reproduce the result of left join in Proc Sql).  

 

data a;
input key d1 d2 d3;
datalines;
1 1 1 1
1 2 2 2
1 3 3 3
2 1 1 1
2 2 2 2
4 1 1 1
4 1 1 1
5 5 5 5
;

data b;
input key d4 d5 d6 other;
datalines;
1 1 1 1 0
2 2 2 2 0
3 3 3 3 0
4 4 4 4 0
;
data test(drop=rc);
if 0 then set A;
if _n_=1 then do;
dcl hash h(dataset:'A',multidata:'yes');
h.definekey('key');
h.definedata('d1','d2','d3');
h.definedone();
end;
set B (keep=key d4 d5 d6);
rc=h.find();
do while(rc=0);
output;
rc=h.find_next();
end;
run;


Accepted Solutions
Solution
‎01-14-2016 02:55 PM
Super Contributor
Posts: 490

Re: Left outer join in hash table with duplicates

[ Edited ]
Posted in reply to SAS_inquisitive

Based on that A is your lookup table 


data new_test1(drop=rc r);
if 0 then set a;
declare hash hh_pat(dataset:"a", multidata: "y");
rc=hh_pat.defineKey("key");
rc=hh_pat.defineData("d1", "d2", "d3");
rc=hh_pat.defineDone();
 
do until(eof);
 	set b end=eof;
 	call missing(d1, d2, d3); 
 	rc=hh_pat.find();
 	output;
	if (rc = 0) then do;
		hh_pat.has_next(result: r);
			do while(r ne 0);
				hh_pat.find_next();
 				output;
				hh_pat.has_next(result: r);
			end;
	end;
end;
stop;
run;

That should be identical to

proc sql ;
select b.key, a.d1, a.d2, a.d3 , b.d4, b.d5 , b.d6
from b left join a
on b.key=a.key;
quit;

 

View solution in original post


All Replies
Solution
‎01-14-2016 02:55 PM
Super Contributor
Posts: 490

Re: Left outer join in hash table with duplicates

[ Edited ]
Posted in reply to SAS_inquisitive

Based on that A is your lookup table 


data new_test1(drop=rc r);
if 0 then set a;
declare hash hh_pat(dataset:"a", multidata: "y");
rc=hh_pat.defineKey("key");
rc=hh_pat.defineData("d1", "d2", "d3");
rc=hh_pat.defineDone();
 
do until(eof);
 	set b end=eof;
 	call missing(d1, d2, d3); 
 	rc=hh_pat.find();
 	output;
	if (rc = 0) then do;
		hh_pat.has_next(result: r);
			do while(r ne 0);
				hh_pat.find_next();
 				output;
				hh_pat.has_next(result: r);
			end;
	end;
end;
stop;
run;

That should be identical to

proc sql ;
select b.key, a.d1, a.d2, a.d3 , b.d4, b.d5 , b.d6
from b left join a
on b.key=a.key;
quit;

 

Super Contributor
Posts: 271

Re: Left outer join in hash table with duplicates

[ Edited ]
Posted in reply to mohamed_zaki

@mohamed_zaki.  Thanks.  I also want this way too.

 

proc sql ;
select b.key, a.d1, a.d2, a.d3 , b.d4, b.d5 , b.d6
from a left join b
on a.key=b.key;
quit;

 

 

Super Contributor
Posts: 490

Re: Left outer join in hash table with duplicates

[ Edited ]
Posted in reply to SAS_inquisitive

As your lookup table B is not having duplicate, then this is easier

data test(drop=rc);
 if 0 then set b;
 declare hash hh_pat(dataset:"b");
 rc=hh_pat.defineKey("key");
 rc=hh_pat.defineData("d4", "d5", "d6","other");
 rc=hh_pat.defineDone();
 do until(eof);
 set a end=eof;
 call missing(d4, d5, d6, other);
 rc=hh_pat.find();
 output;
 end;
 stop;
run;

or you can modify the previous post code if you will have duplicate.

Super Contributor
Posts: 271

Re: Left outer join in hash table with duplicates

Posted in reply to mohamed_zaki

@ mohamed_zaki, I guess we can't do this while keeping data set  "a" in hash table, right?

Super Contributor
Posts: 490

Re: Left outer join in hash table with duplicates

[ Edited ]
Posted in reply to SAS_inquisitive

Yes, as A contains duplicate records based on your key. So you will end up with uncomplete left joined data set. Only if you do specify the option multidata="Y". And by that the hash object keeps the multiple values in a list that is associated with the key. Which can be traversed and manipulated by using several methods such as HAS_NEXT or FIND_NEXT as in the first post.

 

Super User
Posts: 10,035

Re: Left outer join in hash table with duplicates

Posted in reply to SAS_inquisitive

Your code looks good. Except:

 

data a;
input key d1 d2 d3;
datalines;
1 1 1 1
1 2 2 2
1 3 3 3
2 1 1 1
2 2 2 2
4 1 1 1
4 1 1 1
5 5 5 5
;
data b;
input key d4 d5 d6 other;
datalines;
1 1 1 1 0
2 2 2 2 0
3 3 3 3 0
4 4 4 4 0
;
data test(drop=rc);
if _n_=1 then do;
if 0 then set A;
dcl hash h(dataset:'A',multidata:'yes');
h.definekey('key');
h.definedata('d1','d2','d3');
h.definedone();
end;
call missing(of _all_);
set B (keep=key d4 d5 d6);
rc=h.find();
if rc ne 0 then output;
do while(rc=0);
output;
rc=h.find_next();
end;
run;

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 480 views
  • 2 likes
  • 3 in conversation