Solved: Creating Hash tables

Hima · Posted 02-24-2012 10:24 AM

I am trying to create a hash table for the below criteria. Please help.

Table_A
ID Name Fee_Amt
1 a 0
2 b 200
3 c 0
4 d 400
5 e 500

Table_B
ID City
1 Test1
2 Test2
3 Test3
4 Test4
5 Test5

Result
ID Name City      Fee_Amt      Fee
1      a      Test1    0                No
2      b      Test2   200             Yes
3      c      Test3    0                No
4      d      Test4   400             Yes
5      e      Test5   500             Yes

Usually, in PROC SQL the below code works good but it takes hours to run if the observations are in millions. I happened to know that Hash tables are effective in this case. If some can help me creating a hash table for left join, it would be a great help. Thank you.

Proc SQL Code:
PROC SQL;
CREATE TABLE RESULT AS
SELECT A.*, B.CITY, CASE WHEN (A.ID = B.ID AND A.FEE_AMT > 0) THEN 'YES' ELSE 'NO' END AS FEE
FROM TABLE_A LEFT JOIN TABLE_B
ON A.ID = B.ID;
QUIT;

Haikuo · Posted 02-24-2012 10:52 AM

Hi Hima,

I am new to hash as well, so just to start discussion. Assume that your table_b is smaller that you want to put it into hash(), also you want a merge(inner join), not a left join showing in your SQL code.

data Table_A;

input ID Name $ Fee_Amt;

cards;

1 a 0

2 b 200

3 c 0

4 d 400

5 e 500

6 t 100

;

data Table_B;

input ID City $;

cards;

1 Test1

2 Test2

3 Test3

4 Test4

5 Test5

;

data want;

if 0 then set table_b;

if _n_=1 then do;

declare hash hh(dataset: 'table_b');

hh.definekey ('id');

hh.definedata('city');

hh.definedone();

end;

set table_a;

fee=ifc(fee_amt=0,'NO','YES');

_n_=hh.find();

if _n_=0 then output;

run;

proc print;run;

Regards,

Haikuo

View solution in original post

Haikuo · Posted 02-24-2012 10:52 AM

Hi Hima,

I am new to hash as well, so just to start discussion. Assume that your table_b is smaller that you want to put it into hash(), also you want a merge(inner join), not a left join showing in your SQL code.

data Table_A;

input ID Name $ Fee_Amt;

cards;

1 a 0

2 b 200

3 c 0

4 d 400

5 e 500

6 t 100

;

data Table_B;

input ID City $;

cards;

1 Test1

2 Test2

3 Test3

4 Test4

5 Test5

;

data want;

if 0 then set table_b;

if _n_=1 then do;

declare hash hh(dataset: 'table_b');

hh.definekey ('id');

hh.definedata('city');

hh.definedone();

end;

set table_a;

fee=ifc(fee_amt=0,'NO','YES');

_n_=hh.find();

if _n_=0 then output;

run;

proc print;run;

Regards,

Haikuo

Hima · Posted 02-24-2012 11:03 AM

Thank you. This is exactly what I need. Great Expertise!!!!

MikeZdeb · Posted 02-24-2012 01:50 PM

hi ... you can also do some reading (my experience with these is that I think that this order works best) ...

Getting Started with the DATA Step Hash Object

http://support.sas.com/rnd/base/datastep/dot/hash-getting-started.pdf

Getting Started with the DATA Step Hash Iterator

http://support.sas.com/rnd/base/datastep/dot/iterator-getting-started.pdf

How Do I Love Hash Tables? Let Me Count The Ways!

http://www2.sas.com/proceedings/forum2008/029-2008.pdf

Better Hashing in SAS 9.2

http://support.sas.com/resources/papers/sgf2008/hashing92.pdf

Hash Crash and Beyond

http://www2.sas.com/proceedings/forum2008/037-2008.pdf

and this is nice to have around ...

Hash Object Tip Sheet

http://support.sas.com/rnd/base/datastep/dot/hash-tip-sheet.pdf

sassharp · Posted 03-13-2012 03:00 PM

can anybody give example and code for how to incorporate an option 'NODUPKEY' in hash sort as in proc sort with nodupkey?

thanks.

Ksharp · Posted 03-14-2012 12:23 AM

By default, Hash Table is NODUPKEY, it keeps the first one of duplicated key.

Creating Hash tables

Creating Hash tables

Creating Hash tables

Creating Hash tables

Re: Creating Hash tables

Creating Hash tables

Creating Hash tables

Catch up on SAS Innovate 2026

Creating Hash tables

Creating Hash tables

Creating Hash tables

Creating Hash tables

Re: Creating Hash tables

Creating Hash tables

Creating Hash tables

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away