Hello!
I'm kind new on hash usage and I was wondering if there is a way to perform a left join with hash without having to load the big table into hash instead the small one.
I have been reading multiple papers and articles about this but I haven't been able to came up with an idea of how this could be performed.
This is the code I have by now:
data a ;
input key adata ;
cards ;
1 1
2 2
3 3
4 4
5 5
6 6
7 7
;
run ;
data b ;
input key bdata ;
cards ;
1 11
1 12
3 31
4 4
6 61
6 62
6 63
7 7
;
run ;
data c ;
if _n_ = 1 then do ;
if 0 then set b ;
dcl hash b (dataset: "b", multidata: "y") ;
b.definekey ("key") ;
b.definedata ("bdata") ;
b.definedone () ;
end ;
set a ;
iorc = b.find() ;
if iorc ne 0 then call missing (bdata) ;
output ;
do while (b.find_next() = 0) ;
output ;
end ;
run ;
Both of the sample datasets are small here, but I was thinking in an scenario where big table is really big, so I can't load it into memory.
Any suggestion? Thanks guys!
dataset swap using your examples:
data c1 ;
if _n_ = 1 then do ;
if 0 then set a ;
dcl hash b (dataset: "a", multidata: "y",ordered:'y') ;
b.definekey ("key") ;
b.definedata ("key","adata") ;
b.definedone () ;
dcl hiter iter('b');
end ;
set b end=last;
by key;
if b.find()= 0 then do;
output;
if last.key then b.remove();
end;
if last then do;
rc = iter.first();
do while (rc = 0);
call missing(bdata);
output;
rc = iter.next();
end;
end;
drop rc;
run;
which one in your example, would you consider big or small to simulate?
Ok so my assumption was right, I hope you have seen my code. Thank you!
dataset swap using your examples:
data c1 ;
if _n_ = 1 then do ;
if 0 then set a ;
dcl hash b (dataset: "a", multidata: "y",ordered:'y') ;
b.definekey ("key") ;
b.definedata ("key","adata") ;
b.definedone () ;
dcl hiter iter('b');
end ;
set b end=last;
by key;
if b.find()= 0 then do;
output;
if last.key then b.remove();
end;
if last then do;
rc = iter.first();
do while (rc = 0);
call missing(bdata);
output;
rc = iter.next();
end;
end;
drop rc;
run;
Thanks!! that is a very nice approach to the solution, I guess I have to have the b table sorted to use this solution. But I think I can tweak a little the code using the find_next function to avoid the need of ordering it.
Anyway, thank you for your awsome reply!
And @iscgonzalez Thank you for your very impressive attention to detail. I wish I could think out of the box like you. Anyway here is a version that doesn't require sort:
data a ;
input key adata ;
cards ;
1 1
2 2
3 3
4 4
5 5
6 6
7 7
;
run ;
/*modified b dataset to make it unordered*/
data b ;
input key bdata ;
cards ;
1 11
6 61
1 12
3 31
6 62
4 4
6 63
7 7
;
run ;
data c2 ;
if _n_ = 1 then do ;
if 0 then set a ;
dcl hash b (dataset: "a") ;
b.definekey ("key") ;
b.definedata ("key","adata") ;
b.definedone () ;
dcl hiter iter('b');
end ;
do _n_=1 by 1 until(last);
set b end=last;
array t(100) ; /*this is arbitrary subscript number*/
if b.find()= 0 then do;
if key not in t then t(_n_)=key;
output;
end;
if last then do;
do _n_=1 to dim(t);
if b.check(key:t(_n_))=0 then b.remove(key:t(_n_));
end;
rc = iter.first();
do while (rc = 0);
call missing(bdata);
output;
rc = iter.next();
end;
end;
end;
drop rc t:;
run;
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.