BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
caveman529
Calcite | Level 5

Ladies and Gentlemen:

How can I join two table just like the follwoing proc sql syntax did below using data step with hash table?

proc sql _method buffersize=1G;

     create table want as

           select a.*, b.target_id

                from big_have as a, small_have as b

                     where b.beg_date <= a.date < b.end_date and a.id = b.id;

quit;

Thank you!

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

data want ;

  if 0 then set big_have small_have ;

  if _N_=1 then do;

    declare hash h(dataset:'small_have', multidata:'y');

  h.defineKey('id');

  h.defineData('beg_date','end_date','target_id');

  h.defineDone();

  end;

  set big_have;

  if (0=h.find()) then do until (h.find_next());

    if beg_date <= date <= end_date then output;

  end;

  drop beg_date end_date ;

run;

View solution in original post

9 REPLIES 9
TarunKumar
Pyrite | Level 9

for more information pls go through this

http://support.sas.com/techsup/technote/ts553.html

caveman529
Calcite | Level 5

Thank you, Tarun.  But I hope to get at the syntax comparison....  Smiley Happy

Tom
Super User Tom
Super User

I do not understand what your question is.

Your SQL looks like an inner join.

Are you asking how to do that with a data step?

Or are you asking how to force SQL to put one of the tables into memory.  If this then did you look at the SASFILE statement?

caveman529
Calcite | Level 5

Hi, Tom:

Sorry for the confusion.  I hope to use data step hash table to join the table instead of using proc sql because SAS report running out of disk space or sort error if I use proc sql.  The big_have table on my PC is about 500 GB. 

Thank you!

Tom
Super User Tom
Super User

data want ;

  if 0 then set big_have small_have ;

  if _N_=1 then do;

    declare hash h(dataset:'small_have', multidata:'y');

  h.defineKey('id');

  h.defineData('beg_date','end_date','target_id');

  h.defineDone();

  end;

  set big_have;

  if (0=h.find()) then do until (h.find_next());

    if beg_date <= date <= end_date then output;

  end;

  drop beg_date end_date ;

run;

caveman529
Calcite | Level 5

Many thanks!  will implement right now.  The syntax is a bit hard to understand..  Do you mind giving me some hint so that I can learn how to do it in the future.  Thanks!

Tom
Super User Tom
Super User

Read up on the hash object in the manual.

if 0 -> trick to get the variables defined by referencing the input tables in a SET statement that can never be executed.

if _N_=1 -> This will make sure the definition of the hash is done only once.  The definition is pretty straight forward and right out of the manual. Notice that when referencing dataset and variables in the hash commands they need to be quotes. You are basically passing the names to the hash object.

   Only wrinkle is that need to add the multidata tag so that it will load multiple records for the same value of the key.

  So this whole block will define the hash object and load the small dataset into it.

The rest of the data step will loop over the values of the big datset because of the SET statement.

The FIND() function will check if the ID value just read from BIG is in SMALL. 0 means YES in this case.

The DO loop will execute and the UNTIL clause will execute the FIND_NEXT() function at the end of the loop and treat the returned value as a boolean result. Any 0 means YES there was another found, Any non zero will be taken as TRUE and cause the UNTIL() clause to end the loop.

The IF ... THEN OUTPUT is what decides whether this record from small matches the date from BIG.  The OUTPUT causes a row to be written. Because there is an OUTPUT no other rows will be written. (basic data step logic here).

The DROP is because you did not include those variables in your SQL, but they are needed in the data step so that they can be used in the IF statement.

BhavinVala
Fluorite | Level 6

It would be fine to try MAGIC=103 option in PROC SQL statement.

Option Description

MAGIC=101 Influences the SQL optimizer to select the Nested Loop join algorithm.

MAGIC=102 Influences the SQL optimizer to select the Sort-Merge join algorithm.

MAGIC=103 Influences the SQL optimizer to select the Hash join algorithm

http://www.mwsug.org/proceedings/2012/S1/MWSUG-2012-S109.pdf

caveman529
Calcite | Level 5

I came across that paper before, but I thought the magic is just used in the title.  Unbeknownst to me, it is for real.  I'll try it out.  Thank you!

SAS Innovate 2025: Register Today!

 

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 3273 views
  • 3 likes
  • 4 in conversation