BookmarkSubscribeRSS Feed
JoshGale
Calcite | Level 5

Hello,

 

I am attempting to join 2 datasets through at least one variable, in a hierarchy. For example, join on first name, or join on surname, in that order. However this looks like it works, but not perfectly. It creates duplicates in my first dataset by joining to different records in the second dataset. Is there a way to perform a join and making it stop once a match is found. The below example is not exactly the code. There are actually around 44 joins I need it to run through. I know that it isn't doing both joins separately, because if it were I would get over 2000 additional duplicates (I'm currently getting 52), but ideally I would like no duplicates, once a match is found on the first join, I would like it to stop. I hope I've explained this okay. Thanks in advance for any help you can provide.

 

Current Code Example:

proc sql;

create table joined as

select a.*, b.*

from Houeshold as a full join Individual as b

on a.ID1=b.ID1

or a.ID2=b.ID2;

run;

 

Result

Dataset1ID    Dataset2ID

1              a

1              b

2              c

2              d

3              e

3 REPLIES 3
Tom
Super User Tom
Super User

Why not just add a step that picks the "best" match?

JoshGale
Calcite | Level 5
Hi, we have thought about it, but due to the complex outputs and sheer number of joins it would be challenging to do. It is our backup option, but we are also interested to see if anyone knows why this is happening and if there is a way to do it within the join.
Tom
Super User Tom
Super User

Give each match a score and take the observation with the max score for your primary key variables.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 3 replies
  • 470 views
  • 0 likes
  • 2 in conversation