Solved: Re: Left outer join in hash table with duplicates

SAS_inquisitive · Posted 01-13-2016 04:58 PM

I want to peform left outer join in hash with following code (so as to reproduce the result of left join in Proc Sql).

data a;
input key d1 d2 d3;
datalines;
1 1 1 1
1 2 2 2
1 3 3 3
2 1 1 1
2 2 2 2
4 1 1 1
4 1 1 1
5 5 5 5
;

data b;
input key d4 d5 d6 other;
datalines;
1 1 1 1 0
2 2 2 2 0
3 3 3 3 0
4 4 4 4 0
;
data test(drop=rc);
if 0 then set A;
if _n_=1 then do;
dcl hash h(dataset:'A',multidata&colon;'yes');
h.definekey('key');
h.definedata('d1','d2','d3');
h.definedone();
end;
set B (keep=key d4 d5 d6);
rc=h.find();
do while(rc=0);
output;
rc=h.find_next();
end;
run;

mohamed_zaki · Posted 01-13-2016 08:21 PM

Based on that A is your lookup table


data new_test1(drop=rc r);
if 0 then set a;
declare hash hh_pat(dataset:"a", multidata: "y");
rc=hh_pat.defineKey("key");
rc=hh_pat.defineData("d1", "d2", "d3");
rc=hh_pat.defineDone();
 
do until(eof);
 	set b end=eof;
 	call missing(d1, d2, d3); 
 	rc=hh_pat.find();
 	output;
	if (rc = 0) then do;
		hh_pat.has_next(result: r);
			do while(r ne 0);
				hh_pat.find_next();
 				output;
				hh_pat.has_next(result: r);
			end;
	end;
end;
stop;
run;

That should be identical to

proc sql ;
select b.key, a.d1, a.d2, a.d3 , b.d4, b.d5 , b.d6
from b left join a
on b.key=a.key;
quit;

View solution in original post

mohamed_zaki · Posted 01-13-2016 08:21 PM

Based on that A is your lookup table


data new_test1(drop=rc r);
if 0 then set a;
declare hash hh_pat(dataset:"a", multidata: "y");
rc=hh_pat.defineKey("key");
rc=hh_pat.defineData("d1", "d2", "d3");
rc=hh_pat.defineDone();
 
do until(eof);
 	set b end=eof;
 	call missing(d1, d2, d3); 
 	rc=hh_pat.find();
 	output;
	if (rc = 0) then do;
		hh_pat.has_next(result: r);
			do while(r ne 0);
				hh_pat.find_next();
 				output;
				hh_pat.has_next(result: r);
			end;
	end;
end;
stop;
run;

That should be identical to

proc sql ;
select b.key, a.d1, a.d2, a.d3 , b.d4, b.d5 , b.d6
from b left join a
on b.key=a.key;
quit;

SAS_inquisitive · Posted 01-14-2016 09:01 AM

@mohamed_zaki. Thanks. I also want this way too.

proc sql ;
select b.key, a.d1, a.d2, a.d3 , b.d4, b.d5 , b.d6
from a left join b
on a.key=b.key;
quit;

mohamed_zaki · Posted 01-14-2016 10:06 AM

As your lookup table B is not having duplicate, then this is easier

data test(drop=rc);
 if 0 then set b;
 declare hash hh_pat(dataset:"b");
 rc=hh_pat.defineKey("key");
 rc=hh_pat.defineData("d4", "d5", "d6","other");
 rc=hh_pat.defineDone();
 do until(eof);
 set a end=eof;
 call missing(d4, d5, d6, other);
 rc=hh_pat.find();
 output;
 end;
 stop;
run;

or you can modify the previous post code if you will have duplicate.

SAS_inquisitive · Posted 01-14-2016 11:53 AM

@ mohamed_zaki, I guess we can't do this while keeping data set "a" in hash table, right?

mohamed_zaki · Posted 01-14-2016 12:18 PM

Yes, as A contains duplicate records based on your key. So you will end up with uncomplete left joined data set. Only if you do specify the option multidata="Y". And by that the hash object keeps the multiple values in a list that is associated with the key. Which can be traversed and manipulated by using several methods such as HAS_NEXT or FIND_NEXT as in the first post.

Ksharp · Posted 01-13-2016 08:34 PM

Your code looks good. Except:

data a;
input key d1 d2 d3;
datalines;
1 1 1 1
1 2 2 2
1 3 3 3
2 1 1 1
2 2 2 2
4 1 1 1
4 1 1 1
5 5 5 5
;
data b;
input key d4 d5 d6 other;
datalines;
1 1 1 1 0
2 2 2 2 0
3 3 3 3 0
4 4 4 4 0
;
data test(drop=rc);
if _n_=1 then do;
if 0 then set A;
dcl hash h(dataset:'A',multidata&colon;'yes');
h.definekey('key');
h.definedata('d1','d2','d3');
h.definedone();
end;
call missing(of _all_);
set B (keep=key d4 d5 d6);
rc=h.find();
if rc ne 0 then output;
do while(rc=0);
output;
rc=h.find_next();
end;
run;

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away