Hello,
I want to make a join between a small table and a big one.
The hash method seems to me the right solution.
i develop this program, but the result whith hash method is different whith sql
In the case where I have to join with data many to many, how can I adapt this solution?
thank you
data a ;
input key adata adata2 ;
cards ;
1 1 2
1 1 3
2 2 5
3 3 4
4 4 6
5 5 5
6 6 6
7 7 5
;
run ;
data b ;
input key bdata ;
cards ;
1 11
1 12
3 31
4 4
6 61
6 62
6 63
7 7
;
run ;
proc sql;
create table aa as select a.key,adata,bdata,adata2
from a left join b on a.key=b.key;
quit;
data c;
if _n_ = 1 then do ;
if 0 then set a ;
dcl hash b (dataset: "a", multidata: "y",ordered:'y') ;
b.definekey ("key") ;
b.definedata (all:'y') ;
b.definedone () ;
end;
do until(eof);
set b end=eof;
if b.find()=0 then output;
else do;
call missing(adata ,bdata, adata2);
output;
end;
end;
stop;
run;
data a ;
input key adata adata2 ;
cards ;
1 1 2
1 1 3
2 2 5
3 3 4
4 4 6
5 5 5
6 6 6
7 7 5
;
run ;
data b ;
input key bdata ;
cards ;
1 11
1 12
3 31
4 4
6 61
6 62
6 63
7 7
;
run ;
proc sql;
create table aa as select a.key,adata,bdata,adata2
from a left join b on a.key=b.key;
quit;
data want_hash;
if 0 then set a b ;
dcl hash b (dataset: "b", multidata: "y",ordered:'y') ;
b.definekey ("key") ;
b.definedata (all:'y') ;
b.definedone () ;
do until(eof);
set a end=eof;
rc=b.find();
if rc ne 0 then do;
call missing(bdata);
output;
end;
else
do while(rc eq 0);
output;
rc=b.find_next();
end;
end;
stop;
drop rc;
run;
data a ;
input key adata adata2 ;
cards ;
1 1 2
1 1 3
2 2 5
3 3 4
4 4 6
5 5 5
6 6 6
7 7 5
;
run ;
data b ;
input key bdata ;
cards ;
1 11
1 12
3 31
4 4
6 61
6 62
6 63
7 7
;
run ;
proc sql;
create table aa as select a.key,adata,bdata,adata2
from a left join b on a.key=b.key;
quit;
data want_hash;
if 0 then set a b ;
dcl hash b (dataset: "b", multidata: "y",ordered:'y') ;
b.definekey ("key") ;
b.definedata (all:'y') ;
b.definedone () ;
do until(eof);
set a end=eof;
rc=b.find();
if rc ne 0 then do;
call missing(bdata);
output;
end;
else
do while(rc eq 0);
output;
rc=b.find_next();
end;
end;
stop;
drop rc;
run;
Thank you
Some Notes:
1. No need for if _n_=1 when going for a full pass of dataset while _n_=1, since there is gonna be only one hash instance at run time
2. do until pass akin to dow is apparently not any different in performance if you used if _n_=1 ,creating a hash instance and a traditional data-step as portrayed in sas docs.
3. If you are using PD hash guru's style of coding, great but you have to be very careful knowing how instances value vary and differ.
Well after all, it's all about diligence
We can add a filtre on the date column in the big table?
I don't see any date values in your samples. I assume you can
Hello,
Why the 8 and 9 did not come in result whith Hash method?
thank you
data a ;
input key adata adata2 ;
cards ;
1 1 2
1 1 3
2 2 5
8 7 5
9 5 4
;
run ;
data b ;
input key bdata ;
cards ;
1 11
1 12
3 31
4 4
6 61
6 62
6 63
7 7
;
run ;
proc sql;
create table yes as select a.*,b.bdata
from a left join b on a.key=b.key;
quit;
data want_hash;
if 0 then set a ;
dcl hash b (dataset: "a", multidata: "y",ordered:'y') ;
b.definekey ("key") ;
b.definedata (all:'y') ;
b.definedone () ;
do until(eof);
set b end=eof;
rc=b.find();
if rc ne 0 then do;
call missing(bdata);
output;
end;
else
do while(rc eq 0);
output;
rc=b.find_next();
end;
end;
stop;
run;
Your left table is a , but in your code
do until(eof);
set b end=eof;
so change the above to
do until(eof);
set a end=eof;
Yes, the table A is the left table.
It's the small table that is loaded in memory through declare the object hash
In the statement set; the big table is B.
to match the sql results, the change i suggested works
The table B have 6 millions obseravation. loaded this in memory ?
Ok, please clarify and elaborate your requirement with your HAVE samples and WANT sample clearly. Let us work on the solution.
6 million is not big in my opinion. Use hashexp:20 just in case
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.