I am trying to create a hash table for the below criteria. Please help.
Table_A
ID Name Fee_Amt
1 a 0
2 b 200
3 c 0
4 d 400
5 e 500
Table_B
ID City
1 Test1
2 Test2
3 Test3
4 Test4
5 Test5
Result
ID Name City Fee_Amt Fee
1 a Test1 0 No
2 b Test2 200 Yes
3 c Test3 0 No
4 d Test4 400 Yes
5 e Test5 500 Yes
Usually, in PROC SQL the below code works good but it takes hours to run if the observations are in millions. I happened to know that Hash tables are effective in this case. If some can help me creating a hash table for left join, it would be a great help. Thank you.
Proc SQL Code:
PROC SQL;
CREATE TABLE RESULT AS
SELECT A.*, B.CITY, CASE WHEN (A.ID = B.ID AND A.FEE_AMT > 0) THEN 'YES' ELSE 'NO' END AS FEE
FROM TABLE_A LEFT JOIN TABLE_B
ON A.ID = B.ID;
QUIT;
Hi Hima,
I am new to hash as well, so just to start discussion. Assume that your table_b is smaller that you want to put it into hash(), also you want a merge(inner join), not a left join showing in your SQL code.
data Table_A;
input ID Name $ Fee_Amt;
cards;
1 a 0
2 b 200
3 c 0
4 d 400
5 e 500
6 t 100
;
data Table_B;
input ID City $;
cards;
1 Test1
2 Test2
3 Test3
4 Test4
5 Test5
;
data want;
if 0 then set table_b;
if _n_=1 then do;
declare hash hh(dataset: 'table_b');
hh.definekey ('id');
hh.definedata('city');
hh.definedone();
end;
set table_a;
fee=ifc(fee_amt=0,'NO','YES');
_n_=hh.find();
if _n_=0 then output;
run;
proc print;run;
Regards,
Haikuo
Hi Hima,
I am new to hash as well, so just to start discussion. Assume that your table_b is smaller that you want to put it into hash(), also you want a merge(inner join), not a left join showing in your SQL code.
data Table_A;
input ID Name $ Fee_Amt;
cards;
1 a 0
2 b 200
3 c 0
4 d 400
5 e 500
6 t 100
;
data Table_B;
input ID City $;
cards;
1 Test1
2 Test2
3 Test3
4 Test4
5 Test5
;
data want;
if 0 then set table_b;
if _n_=1 then do;
declare hash hh(dataset: 'table_b');
hh.definekey ('id');
hh.definedata('city');
hh.definedone();
end;
set table_a;
fee=ifc(fee_amt=0,'NO','YES');
_n_=hh.find();
if _n_=0 then output;
run;
proc print;run;
Regards,
Haikuo
Thank you. This is exactly what I need. Great Expertise!!!!
hi ... you can also do some reading (my experience with these is that I think that this order works best) ...
Getting Started with the DATA Step Hash Object
http://support.sas.com/rnd/base/datastep/dot/hash-getting-started.pdf
Getting Started with the DATA Step Hash Iterator
http://support.sas.com/rnd/base/datastep/dot/iterator-getting-started.pdf
How Do I Love Hash Tables? Let Me Count The Ways!
http://www2.sas.com/proceedings/forum2008/029-2008.pdf
Better Hashing in SAS 9.2
http://support.sas.com/resources/papers/sgf2008/hashing92.pdf
Hash Crash and Beyond
http://www2.sas.com/proceedings/forum2008/037-2008.pdf
and this is nice to have around ...
Hash Object Tip Sheet
http://support.sas.com/rnd/base/datastep/dot/hash-tip-sheet.pdf
can anybody give example and code for how to incorporate an option 'NODUPKEY' in hash sort as in proc sort with nodupkey?
thanks.
By default, Hash Table is NODUPKEY, it keeps the first one of duplicated key.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.