- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello,
I want to make a join between a small table and a big one.
The hash method seems to me the right solution.
i develop this program, but the result whith hash method is different whith sql
In the case where I have to join with data many to many, how can I adapt this solution?
thank you
data a ;
input key adata adata2 ;
cards ;
1 1 2
1 1 3
2 2 5
3 3 4
4 4 6
5 5 5
6 6 6
7 7 5
;
run ;
data b ;
input key bdata ;
cards ;
1 11
1 12
3 31
4 4
6 61
6 62
6 63
7 7
;
run ;
proc sql;
create table aa as select a.key,adata,bdata,adata2
from a left join b on a.key=b.key;
quit;
data c;
if _n_ = 1 then do ;
if 0 then set a ;
dcl hash b (dataset: "a", multidata: "y",ordered:'y') ;
b.definekey ("key") ;
b.definedata (all:'y') ;
b.definedone () ;
end;
do until(eof);
set b end=eof;
if b.find()=0 then output;
else do;
call missing(adata ,bdata, adata2);
output;
end;
end;
stop;
run;
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data a ;
input key adata adata2 ;
cards ;
1 1 2
1 1 3
2 2 5
3 3 4
4 4 6
5 5 5
6 6 6
7 7 5
;
run ;
data b ;
input key bdata ;
cards ;
1 11
1 12
3 31
4 4
6 61
6 62
6 63
7 7
;
run ;
proc sql;
create table aa as select a.key,adata,bdata,adata2
from a left join b on a.key=b.key;
quit;
data want_hash;
if 0 then set a b ;
dcl hash b (dataset: "b", multidata: "y",ordered:'y') ;
b.definekey ("key") ;
b.definedata (all:'y') ;
b.definedone () ;
do until(eof);
set a end=eof;
rc=b.find();
if rc ne 0 then do;
call missing(bdata);
output;
end;
else
do while(rc eq 0);
output;
rc=b.find_next();
end;
end;
stop;
drop rc;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data a ;
input key adata adata2 ;
cards ;
1 1 2
1 1 3
2 2 5
3 3 4
4 4 6
5 5 5
6 6 6
7 7 5
;
run ;
data b ;
input key bdata ;
cards ;
1 11
1 12
3 31
4 4
6 61
6 62
6 63
7 7
;
run ;
proc sql;
create table aa as select a.key,adata,bdata,adata2
from a left join b on a.key=b.key;
quit;
data want_hash;
if 0 then set a b ;
dcl hash b (dataset: "b", multidata: "y",ordered:'y') ;
b.definekey ("key") ;
b.definedata (all:'y') ;
b.definedone () ;
do until(eof);
set a end=eof;
rc=b.find();
if rc ne 0 then do;
call missing(bdata);
output;
end;
else
do while(rc eq 0);
output;
rc=b.find_next();
end;
end;
stop;
drop rc;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Some Notes:
1. No need for if _n_=1 when going for a full pass of dataset while _n_=1, since there is gonna be only one hash instance at run time
2. do until pass akin to dow is apparently not any different in performance if you used if _n_=1 ,creating a hash instance and a traditional data-step as portrayed in sas docs.
3. If you are using PD hash guru's style of coding, great but you have to be very careful knowing how instances value vary and differ.
Well after all, it's all about diligence
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
We can add a filtre on the date column in the big table?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I don't see any date values in your samples. I assume you can
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello,
Why the 8 and 9 did not come in result whith Hash method?
thank you
data a ;
input key adata adata2 ;
cards ;
1 1 2
1 1 3
2 2 5
8 7 5
9 5 4
;
run ;
data b ;
input key bdata ;
cards ;
1 11
1 12
3 31
4 4
6 61
6 62
6 63
7 7
;
run ;
proc sql;
create table yes as select a.*,b.bdata
from a left join b on a.key=b.key;
quit;
data want_hash;
if 0 then set a ;
dcl hash b (dataset: "a", multidata: "y",ordered:'y') ;
b.definekey ("key") ;
b.definedata (all:'y') ;
b.definedone () ;
do until(eof);
set b end=eof;
rc=b.find();
if rc ne 0 then do;
call missing(bdata);
output;
end;
else
do while(rc eq 0);
output;
rc=b.find_next();
end;
end;
stop;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Your left table is a , but in your code
do until(eof);
set b end=eof;
so change the above to
do until(eof);
set a end=eof;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Yes, the table A is the left table.
It's the small table that is loaded in memory through declare the object hash
In the statement set; the big table is B.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
to match the sql results, the change i suggested works
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The table B have 6 millions obseravation. loaded this in memory ?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Ok, please clarify and elaborate your requirement with your HAVE samples and WANT sample clearly. Let us work on the solution.
6 million is not big in my opinion. Use hashexp:20 just in case