hash equivalent of sql

Accepted Solution Solved
Reply
Regular Contributor
Posts: 161
Accepted Solution

hash equivalent of sql

Ladies and Gentlemen:

How can I join two table just like the follwoing proc sql syntax did below using data step with hash table?

proc sql _method buffersize=1G;

     create table want as

           select a.*, b.target_id

                from big_have as a, small_have as b

                     where b.beg_date <= a.date < b.end_date and a.id = b.id;

quit;

Thank you!

Attachment
Attachment

Accepted Solutions
Solution
‎03-09-2014 01:11 PM
Super User
Super User
Posts: 7,078

Re: hash equivalent of sql

Posted in reply to caveman529

data want ;

  if 0 then set big_have small_have ;

  if _N_=1 then do;

    declare hash h(dataset:'small_have', multidata:'y');

  h.defineKey('id');

  h.defineData('beg_date','end_date','target_id');

  h.defineDone();

  end;

  set big_have;

  if (0=h.find()) then do until (h.find_next());

    if beg_date <= date <= end_date then output;

  end;

  drop beg_date end_date ;

run;

View solution in original post


All Replies
Frequent Contributor
Posts: 78

Re: hash equivalent of sql

Posted in reply to caveman529

for more information pls go through this

http://support.sas.com/techsup/technote/ts553.html

Regular Contributor
Posts: 161

Re: hash equivalent of sql

Posted in reply to TarunKumar

Thank you, Tarun.  But I hope to get at the syntax comparison....  Smiley Happy

Super User
Super User
Posts: 7,078

Re: hash equivalent of sql

Posted in reply to caveman529

I do not understand what your question is.

Your SQL looks like an inner join.

Are you asking how to do that with a data step?

Or are you asking how to force SQL to put one of the tables into memory.  If this then did you look at the SASFILE statement?

Regular Contributor
Posts: 161

Re: hash equivalent of sql

Hi, Tom:

Sorry for the confusion.  I hope to use data step hash table to join the table instead of using proc sql because SAS report running out of disk space or sort error if I use proc sql.  The big_have table on my PC is about 500 GB. 

Thank you!

Solution
‎03-09-2014 01:11 PM
Super User
Super User
Posts: 7,078

Re: hash equivalent of sql

Posted in reply to caveman529

data want ;

  if 0 then set big_have small_have ;

  if _N_=1 then do;

    declare hash h(dataset:'small_have', multidata:'y');

  h.defineKey('id');

  h.defineData('beg_date','end_date','target_id');

  h.defineDone();

  end;

  set big_have;

  if (0=h.find()) then do until (h.find_next());

    if beg_date <= date <= end_date then output;

  end;

  drop beg_date end_date ;

run;

Regular Contributor
Posts: 161

Re: hash equivalent of sql

Many thanks!  will implement right now.  The syntax is a bit hard to understand..  Do you mind giving me some hint so that I can learn how to do it in the future.  Thanks!

Super User
Super User
Posts: 7,078

Re: hash equivalent of sql

Posted in reply to caveman529

Read up on the hash object in the manual.

if 0 -> trick to get the variables defined by referencing the input tables in a SET statement that can never be executed.

if _N_=1 -> This will make sure the definition of the hash is done only once.  The definition is pretty straight forward and right out of the manual. Notice that when referencing dataset and variables in the hash commands they need to be quotes. You are basically passing the names to the hash object.

   Only wrinkle is that need to add the multidata tag so that it will load multiple records for the same value of the key.

  So this whole block will define the hash object and load the small dataset into it.

The rest of the data step will loop over the values of the big datset because of the SET statement.

The FIND() function will check if the ID value just read from BIG is in SMALL. 0 means YES in this case.

The DO loop will execute and the UNTIL clause will execute the FIND_NEXT() function at the end of the loop and treat the returned value as a boolean result. Any 0 means YES there was another found, Any non zero will be taken as TRUE and cause the UNTIL() clause to end the loop.

The IF ... THEN OUTPUT is what decides whether this record from small matches the date from BIG.  The OUTPUT causes a row to be written. Because there is an OUTPUT no other rows will be written. (basic data step logic here).

The DROP is because you did not include those variables in your SQL, but they are needed in the data step so that they can be used in the IF statement.

New Contributor
Posts: 3

Re: hash equivalent of sql

Posted in reply to caveman529

It would be fine to try MAGIC=103 option in PROC SQL statement.

Option Description

MAGIC=101 Influences the SQL optimizer to select the Nested Loop join algorithm.

MAGIC=102 Influences the SQL optimizer to select the Sort-Merge join algorithm.

MAGIC=103 Influences the SQL optimizer to select the Hash join algorithm

http://www.mwsug.org/proceedings/2012/S1/MWSUG-2012-S109.pdf

Regular Contributor
Posts: 161

Re: hash equivalent of sql

Posted in reply to BhavinVala

I came across that paper before, but I thought the magic is just used in the title.  Unbeknownst to me, it is for real.  I'll try it out.  Thank you!

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 669 views
  • 3 likes
  • 4 in conversation