08-10-2017 09:57 AM - edited 08-10-2017 01:27 PM
I am new to hash tables, I know this is an efficient way to merge tables. I have found articles how to merge two or more tables in one way only left join, right join or inner in one data step, already. However, I wonder if it is possible to merge three or more tables. Two of them like using left join, third inner join by hash tables in one data step?
08-10-2017 02:24 PM
Yes, it is possible. But if the datasets have variables in common other than the join keys, you'll have to program logic accordingly. SQL is easier in this regard in that you can just specify A.myvar or B.myvar, etc.
08-10-2017 04:54 PM
Please note hash tables need to be loaded into memory so the more tables you load the less available memory you will have for each table.
You need to balance any performance gains you might get against the increased coding complexity and any memory limitations you might have.
If you are doing any data lookups in your joins (looking up just one or two columns) you might want to consider using SAS format lookups instead. They are a lot less complicated to code and are just as fast as hash tables.
08-10-2017 05:14 PM
I have written this code already. It seems to be fast enough,
klasa3 is left join table and klasa1 is inner join table.
data hash_test; if 0 then set klasa1 klasa2 klasa3; if _N_= 1 then do; declare hash merge(dataset:'klasa1'); merge.definekey('name'); merge.definedata(all:'yes'); merge.definedone(); declare hash merge2(dataset:'klasa3'); merge2.definekey('name'); merge2.definedata('test'); merge2.definedone(); end; do until(eof); set klasa2 end=eof; rc=merge2.find(); if rc =0 then do; if merge.find()=0 then output; end; else do; if merge.find()=0 then do; call missing (test); output; end; end; end; stop; run;