Hello,
I need help with joining two tables with hash. Both tables has one unique key and one other duplicate variable. Below are the matching variables
x1 - y1
x15 - y17
/* 15 variables */
Table a
x1
.
.
x15
/* 17 variables */
Table b
y1
.
.
y17
I tried with following code but I am getting all the variables from table b and only one from table a instead I want to display all the var from table a and all the var from table b
data left_join(drop=rc); declare Hash a (dataset: "o.a"); rc=a.definekey('x1'); rc=a.definedata(); rc=a.definedone(); do until(eof); set o.b end=eof; rc=a.find(); output; end; stop; run;
Ah yes, the OP said it's many-to-one but my program assumed unique X1 values in A. Here's the code for duplicated X1 values:
data want;
if _n_=1 then do;
if 0 then set a;
declare hash right (dataset:'a',multidata:'Y');
right.definekey('x1');
right.definedata(all:'Y');
right.definedone();
right.add(); /* Add a dummy record with all missing vals*/
end;
set b; /* left */
rc=right.find(key:y1);
if rc^=0 then rc=right.find(key:.);
do while (rc=0);
output;
rc=right.find_next();
end;
run;
Edited at 6/1/2017 14:58 UTC-5:00
I guess you meant that x15 should match with y17, as there are only fifteen variables in dataset a?
That said, what is the relationship between the tables with regard to the key variables? Is it one-to-many, or many-to-many?
No x15 doesnt have to match y17. x1 should mach y1. It is one to many relationship.
Check this paper Using the SAS® Hash Object with Duplicate Key Entries
There is section "2. One-to-Many, Many-to-Many Left Joins" might have the answer you are looking for.
Hope this helps,
Ahmed
@kpdoe wrote:
No x15 doesnt have to match y17. x1 should mach y1. It is one to many relationship.
Then my preferred method is
proc sort data=a;
by x1;
run;
proc sort data=b;
by y1;
run;
data want;
merge
a (in=a)
b (in=b rename=(y1=x1))
;
by x1;
run;
You can add a subsetting if that selects if you want an inner, left or right join.
Just one caveat: this can do left, right or inner, but not many-to-many.
The problem with
data left_join(drop=rc); declare Hash a (dataset: "o.a"); rc=a.definekey('x1'); rc=a.definedata(); rc=a.definedone(); do until(eof); set o.b end=eof; rc=a.find(); output; end; stop; run;
data want;
if _n_=1 then do;
if 0 then set a;
declare hash right (dataset:'a');
right.definekey('x1');
right.definedata(all:'Y');
right.definedone();
right.add(); /* Add a dummy record with all missing vals*/
end;
set b; /* left */
rc=right.find(key:.);
rc=right.find(key:y1);
run;
Notes:
Ah yes, the OP said it's many-to-one but my program assumed unique X1 values in A. Here's the code for duplicated X1 values:
data want;
if _n_=1 then do;
if 0 then set a;
declare hash right (dataset:'a',multidata:'Y');
right.definekey('x1');
right.definedata(all:'Y');
right.definedone();
right.add(); /* Add a dummy record with all missing vals*/
end;
set b; /* left */
rc=right.find(key:y1);
if rc^=0 then rc=right.find(key:.);
do while (rc=0);
output;
rc=right.find_next();
end;
run;
Edited at 6/1/2017 14:58 UTC-5:00
Thank you very much. It worked!
Mark,
You're right. There's no need to make assumptions about data and code differently for many-to-one and many-to-many. MULTIDATA:"Y" covers all bases. If there're many, the loop gets all of them; and if there's one, the very first FIND_NEXT call fails, which is what the doctor ordered.
Best
Paul
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.