I want to peform left outer join in hash with following code (so as to reproduce the result of left join in Proc Sql).
data a;
input key d1 d2 d3;
datalines;
1 1 1 1
1 2 2 2
1 3 3 3
2 1 1 1
2 2 2 2
4 1 1 1
4 1 1 1
5 5 5 5
;
data b;
input key d4 d5 d6 other;
datalines;
1 1 1 1 0
2 2 2 2 0
3 3 3 3 0
4 4 4 4 0
;
data test(drop=rc);
if 0 then set A;
if _n_=1 then do;
dcl hash h(dataset:'A',multidata:'yes');
h.definekey('key');
h.definedata('d1','d2','d3');
h.definedone();
end;
set B (keep=key d4 d5 d6);
rc=h.find();
do while(rc=0);
output;
rc=h.find_next();
end;
run;
Based on that A is your lookup table
data new_test1(drop=rc r);
if 0 then set a;
declare hash hh_pat(dataset:"a", multidata: "y");
rc=hh_pat.defineKey("key");
rc=hh_pat.defineData("d1", "d2", "d3");
rc=hh_pat.defineDone();
do until(eof);
set b end=eof;
call missing(d1, d2, d3);
rc=hh_pat.find();
output;
if (rc = 0) then do;
hh_pat.has_next(result: r);
do while(r ne 0);
hh_pat.find_next();
output;
hh_pat.has_next(result: r);
end;
end;
end;
stop;
run;
That should be identical to
proc sql ;
select b.key, a.d1, a.d2, a.d3 , b.d4, b.d5 , b.d6
from b left join a
on b.key=a.key;
quit;
Based on that A is your lookup table
data new_test1(drop=rc r);
if 0 then set a;
declare hash hh_pat(dataset:"a", multidata: "y");
rc=hh_pat.defineKey("key");
rc=hh_pat.defineData("d1", "d2", "d3");
rc=hh_pat.defineDone();
do until(eof);
set b end=eof;
call missing(d1, d2, d3);
rc=hh_pat.find();
output;
if (rc = 0) then do;
hh_pat.has_next(result: r);
do while(r ne 0);
hh_pat.find_next();
output;
hh_pat.has_next(result: r);
end;
end;
end;
stop;
run;
That should be identical to
proc sql ;
select b.key, a.d1, a.d2, a.d3 , b.d4, b.d5 , b.d6
from b left join a
on b.key=a.key;
quit;
@mohamed_zaki. Thanks. I also want this way too.
proc sql ;
select b.key, a.d1, a.d2, a.d3 , b.d4, b.d5 , b.d6
from a left join b
on a.key=b.key;
quit;
As your lookup table B is not having duplicate, then this is easier
data test(drop=rc);
if 0 then set b;
declare hash hh_pat(dataset:"b");
rc=hh_pat.defineKey("key");
rc=hh_pat.defineData("d4", "d5", "d6","other");
rc=hh_pat.defineDone();
do until(eof);
set a end=eof;
call missing(d4, d5, d6, other);
rc=hh_pat.find();
output;
end;
stop;
run;
or you can modify the previous post code if you will have duplicate.
@ mohamed_zaki, I guess we can't do this while keeping data set "a" in hash table, right?
Yes, as A contains duplicate records based on your key. So you will end up with uncomplete left joined data set. Only if you do specify the option multidata="Y". And by that the hash object keeps the multiple values in a list that is associated with the key. Which can be traversed and manipulated by using several methods such as HAS_NEXT or FIND_NEXT as in the first post.
Your code looks good. Except:
data a;
input key d1 d2 d3;
datalines;
1 1 1 1
1 2 2 2
1 3 3 3
2 1 1 1
2 2 2 2
4 1 1 1
4 1 1 1
5 5 5 5
;
data b;
input key d4 d5 d6 other;
datalines;
1 1 1 1 0
2 2 2 2 0
3 3 3 3 0
4 4 4 4 0
;
data test(drop=rc);
if _n_=1 then do;
if 0 then set A;
dcl hash h(dataset:'A',multidata:'yes');
h.definekey('key');
h.definedata('d1','d2','d3');
h.definedone();
end;
call missing(of _all_);
set B (keep=key d4 d5 d6);
rc=h.find();
if rc ne 0 then output;
do while(rc=0);
output;
rc=h.find_next();
end;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.