Ladies and Gentlemen:
How can I join two table just like the follwoing proc sql syntax did below using data step with hash table?
proc sql _method buffersize=1G;
create table want as
select a.*, b.target_id
from big_have as a, small_have as b
where b.beg_date <= a.date < b.end_date and a.id = b.id;
quit;
Thank you!
data want ;
if 0 then set big_have small_have ;
if _N_=1 then do;
declare hash h(dataset:'small_have', multidata:'y');
h.defineKey('id');
h.defineData('beg_date','end_date','target_id');
h.defineDone();
end;
set big_have;
if (0=h.find()) then do until (h.find_next());
if beg_date <= date <= end_date then output;
end;
drop beg_date end_date ;
run;
for more information pls go through this
Thank you, Tarun. But I hope to get at the syntax comparison....
I do not understand what your question is.
Your SQL looks like an inner join.
Are you asking how to do that with a data step?
Or are you asking how to force SQL to put one of the tables into memory. If this then did you look at the SASFILE statement?
Hi, Tom:
Sorry for the confusion. I hope to use data step hash table to join the table instead of using proc sql because SAS report running out of disk space or sort error if I use proc sql. The big_have table on my PC is about 500 GB.
Thank you!
data want ;
if 0 then set big_have small_have ;
if _N_=1 then do;
declare hash h(dataset:'small_have', multidata:'y');
h.defineKey('id');
h.defineData('beg_date','end_date','target_id');
h.defineDone();
end;
set big_have;
if (0=h.find()) then do until (h.find_next());
if beg_date <= date <= end_date then output;
end;
drop beg_date end_date ;
run;
Many thanks! will implement right now. The syntax is a bit hard to understand.. Do you mind giving me some hint so that I can learn how to do it in the future. Thanks!
Read up on the hash object in the manual.
if 0 -> trick to get the variables defined by referencing the input tables in a SET statement that can never be executed.
if _N_=1 -> This will make sure the definition of the hash is done only once. The definition is pretty straight forward and right out of the manual. Notice that when referencing dataset and variables in the hash commands they need to be quotes. You are basically passing the names to the hash object.
Only wrinkle is that need to add the multidata tag so that it will load multiple records for the same value of the key.
So this whole block will define the hash object and load the small dataset into it.
The rest of the data step will loop over the values of the big datset because of the SET statement.
The FIND() function will check if the ID value just read from BIG is in SMALL. 0 means YES in this case.
The DO loop will execute and the UNTIL clause will execute the FIND_NEXT() function at the end of the loop and treat the returned value as a boolean result. Any 0 means YES there was another found, Any non zero will be taken as TRUE and cause the UNTIL() clause to end the loop.
The IF ... THEN OUTPUT is what decides whether this record from small matches the date from BIG. The OUTPUT causes a row to be written. Because there is an OUTPUT no other rows will be written. (basic data step logic here).
The DROP is because you did not include those variables in your SQL, but they are needed in the data step so that they can be used in the IF statement.
It would be fine to try MAGIC=103 option in PROC SQL statement.
Option Description
MAGIC=101 Influences the SQL optimizer to select the Nested Loop join algorithm.
MAGIC=102 Influences the SQL optimizer to select the Sort-Merge join algorithm.
MAGIC=103 Influences the SQL optimizer to select the Hash join algorithm
http://www.mwsug.org/proceedings/2012/S1/MWSUG-2012-S109.pdf
I came across that paper before, but I thought the magic is just used in the title. Unbeknownst to me, it is for real. I'll try it out. Thank you!
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.