BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
kpdoe
Calcite | Level 5

Hello,

 

I need help with joining two tables with hash. Both tables has one unique key and one other duplicate variable. Below are the matching variables

x1 - y1

x15 - y17

 

/* 15 variables */

Table a 

x1

.

.

x15

 

/* 17 variables */

Table b

y1

.

.

y17

 

I tried with following code but I am getting all the variables from table b and only one from table a instead I want to display all the var from table a and all the var from table b

 

 

data left_join(drop=rc);

	declare Hash a (dataset: "o.a");
	rc=a.definekey('x1');
	rc=a.definedata();
	rc=a.definedone();

	do until(eof);
	 set o.b end=eof;
	 rc=a.find();
	 output;
	end;
	stop;
run;

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

Ah yes,  the OP said it's many-to-one but my program assumed unique X1 values in A.  Here's the code for duplicated X1 values:

 

data want;
  if _n_=1 then do;
    if 0 then set a;
    declare hash right (dataset:'a',multidata:'Y');
       right.definekey('x1');
       right.definedata(all:'Y');
       right.definedone();
       right.add();   /* Add a dummy record with all missing vals*/
  end;

  set b;  /* left */ 

  rc=right.find(key:y1); 
if rc^=0 then rc=right.find(key:.); do while (rc=0); output; rc=right.find_next(); end; run;

 

Edited at 6/1/2017 14:58 UTC-5:00

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

10 REPLIES 10
Kurt_Bremser
Super User

I guess you meant that x15 should match with y17, as there are only fifteen variables in dataset a?

 

That said, what is the relationship between the tables with regard to the key variables? Is it one-to-many, or many-to-many?

kpdoe
Calcite | Level 5

No x15 doesnt have to match y17. x1 should mach y1. It is one to many relationship.

AhmedAl_Attar
Rhodochrosite | Level 12

Check this paper Using the SAS® Hash Object with Duplicate Key Entries

There is section "2. One-to-Many, Many-to-Many Left Joins" might have the answer you are looking for.

 

Hope this helps,

Ahmed

Kurt_Bremser
Super User

@kpdoe wrote:

No x15 doesnt have to match y17. x1 should mach y1. It is one to many relationship.


Then my preferred method is

proc sort data=a;
by x1;
run;

proc sort data=b;
by y1;
run;

data want;
merge
  a (in=a)
  b (in=b rename=(y1=x1))
;
by x1;
run;

You can add a subsetting if that selects if you want an inner, left or right join.

mkeintz
PROC Star

Just one caveat: this can do left, right or inner, but not many-to-many.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
mkeintz
PROC Star

 

The problem with


 

 

data left_join(drop=rc);

	declare Hash a (dataset: "o.a");
	rc=a.definekey('x1');
	rc=a.definedata();
	rc=a.definedone();

	do until(eof);
	 set o.b end=eof;
	 rc=a.find();
	 output;
	end;
	stop;
run;

 

 

 


 

  1. Your a.definedata probably should be a.definedata(all:'Y').  That's probably why you are not getting all the A vars
  2. You're apparently simulating B left join A on b.y1=a.x1.  Your a.find() default to X1 as the lookup key, but you want Y1 to be the lookup key:
       try a.find(key:y1).
  3. In your loop, when a B is NOT found in A, all the vars in A should be set to missing, but you will not get that.  Instead you will get all the A vars in the most recent successful FIND method.

 

data want;
  if _n_=1 then do;
    if 0 then set a;
    declare hash right (dataset:'a');
       right.definekey('x1');
       right.definedata(all:'Y');
       right.definedone();
       right.add();   /* Add a dummy record with all missing vals*/
  end;

  set b;  /* left */ 

  rc=right.find(key:.);
rc=right.find(key:y1); run;

 

Notes:

  1. I added a row of missing value to the hash object based on dataset A.
  2. I use that row (rc=right.find(key:.) to set all A vars to missing prior to searching for A.X1=B.Y1.  If that search is not successful it won't inadvertantly inherit A vars from a prior successfull search.
  3. Editted addition: This assumes that X1 is unique in dataset A.
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
mkeintz
PROC Star

Ah yes,  the OP said it's many-to-one but my program assumed unique X1 values in A.  Here's the code for duplicated X1 values:

 

data want;
  if _n_=1 then do;
    if 0 then set a;
    declare hash right (dataset:'a',multidata:'Y');
       right.definekey('x1');
       right.definedata(all:'Y');
       right.definedone();
       right.add();   /* Add a dummy record with all missing vals*/
  end;

  set b;  /* left */ 

  rc=right.find(key:y1); 
if rc^=0 then rc=right.find(key:.); do while (rc=0); output; rc=right.find_next(); end; run;

 

Edited at 6/1/2017 14:58 UTC-5:00

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
kpdoe
Calcite | Level 5

Thank you very much. It worked!

BernyOsuna
Obsidian | Level 7
I'm a little confused with the
multidata;'Y'
and the
right.find(key:y1)
hashman
Ammonite | Level 13

Mark,

 

You're right. There's no need to make assumptions about data and code differently for many-to-one and many-to-many. MULTIDATA:"Y" covers all bases. If there're many, the loop gets all of them; and if there's one, the very first FIND_NEXT call fails, which is what the doctor ordered.

 

Best

Paul 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 7298 views
  • 0 likes
  • 6 in conversation