Solved: Re: Hash join Learning

umashankersaini · Posted 12-05-2018 02:05 AM

Hi Team,

I am looking for same output from HASH join as i have from proc sql. Kindly help.

DATA A;
input Name $1. Salary Age;
Cards;
A 200 20
B 300 30
C 400 40
D 500 50
E 600 60
F 700 70
;
Run;

DATA B;
input Name $1. Salary Grade$2.;
Cards;
A 200 AA
G 800 GG
H 900 HH
I 1000 II
F 700 FF
;
Run;

Proc print;
run;

proc sql; create table AB_SQL as select A.*,B.Grade from A left join B on A.Name=B.Name; quit;

PeterClemmensen · Posted 12-05-2018 02:13 AM

Do like this

data AB_Hash(drop=rc);
   if 0 then set A B;
   if _N_ = 1 then do;
      declare hash h(dataset:'B');
      h.defineKey('Name');
      h.defineData('Grade');
      h.defineDone();
   end;

   set A;

   rc=h.find();
   if rc ne 0 then Grade='';
run;

The DATA to DATA Step Macro
Blog: SASnrd

View solution in original post

PeterClemmensen · Posted 12-05-2018 02:13 AM

Do like this

data AB_Hash(drop=rc);
   if 0 then set A B;
   if _N_ = 1 then do;
      declare hash h(dataset:'B');
      h.defineKey('Name');
      h.defineData('Grade');
      h.defineDone();
   end;

   set A;

   rc=h.find();
   if rc ne 0 then Grade='';
run;

The DATA to DATA Step Macro
Blog: SASnrd

umashankersaini · Posted 12-05-2018 06:22 AM

Hi,

Thanks for your reply.

quick question.. Suppose i have 100 variables in B data set and want 40 variable from it in output.

Do i need to define them in below mentioned style:

defineData('Grade',GR1,TR1,AR20);

PeterClemmensen · Posted 12-05-2018 06:25 AM

There are several ways to do this. Do the 40 variables in question have some common naming scheme?

And remember, the variables here should be in quotation

defineData('Grade','GR1','TR1','AR20');

The DATA to DATA Step Macro
Blog: SASnrd

umashankersaini · Posted 12-05-2018 07:52 AM

yes.... All the variable has different name...No pattern and output includes both type like character (20 var) and Numeric (20 var).

PeterClemmensen · Posted 12-05-2018 08:11 AM

If all the variables have different names, no common prefixes or like that exists, then it is probably the best to just type them in manually. Unless you can live with the fact that all the variables are defined in the data portion of your hash object. In that case you could do

h.defineData(all:"Y");

The DATA to DATA Step Macro
Blog: SASnrd

umashankersaini · Posted 12-05-2018 08:42 AM

Great ... able to do that...

other important question ... what if i need to join more than two tables..

and any relevant doc to study Hash ...

PeterClemmensen · Posted 12-05-2018 08:51 AM

"what if i need to join more than two tables"

If you post a third data set and, I can provide you a usable code answer. The short answer is to define another hash object and use the find() method again.

"any relevant doc to study Hash"

There is plenty of material to study in the SAS Hash Object Documentation.

However, I found the two books SAS Hash Object Programming Made Easy by Michelle Burlew and Data Management Solutions Using SAS Hash Table Operations: A Business Intelligence Case Study By Don Hendersen and Paul Dorfman to be the best learning material on the topic out there.

The DATA to DATA Step Macro
Blog: SASnrd

umashankersaini · Posted 12-12-2018 08:23 AM

Hi,

I am so sorry for my delayed response.

During this time i tried with one more table for inner and left join but i was not able to do that.

Kindly help.

DATA C;
input Name $1. dept;
Cards;
A 111
B 222
C 333
Z 000
;
RUN;

PeterClemmensen · Posted 12-12-2018 08:26 AM

@umashankersaini, please open a new thread. That will help other users benefit from your question and the answers.

The DATA to DATA Step Macro
Blog: SASnrd

umashankersaini · Posted 12-05-2018 09:01 AM

what about inner join... I have tried hash for inner as well but in log... Table A has been read by 2 times..Can you please help me for inner join as well (optimized)

PeterClemmensen · Posted 12-05-2018 09:10 AM

You can simulate an inner join using the return code rc from the find() method like this (Not I changed the left join to an inner join in your initial code for demonstration purposes).

proc sql; create table AB_SQL as select A.*,B.Grade from A inner join B on A.Name=B.Name; quit;

data AB_Hash(drop=rc);
   if 0 then set A B;
   if _N_ = 1 then do;
      declare hash h(dataset:'B');
      h.defineKey('Name');
      h.defineData('Grade');
      h.defineDone();
   end;

   set A;

   rc=h.find();
   if rc eq 0;
run;

proc compare data=AB_SQL compare=AB_Hash;
run;

The DATA to DATA Step Macro
Blog: SASnrd

learsaas · Posted 12-05-2018 02:20 AM

andreas_lds · Posted 12-05-2018 02:41 AM

@learsaas wrote:

Please post code as text, not as picture. Also note that hash.find returns 0 if the key was found.

PeterClemmensen · Posted 12-05-2018 02:23 AM

You can verify that the data sets are identical with a simple PROC COMPARE like this

proc compare data=AB_SQL compare=AB_Hash;
run;

The DATA to DATA Step Macro
Blog: SASnrd

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away