BookmarkSubscribeRSS Feed
sgleeson2
Calcite | Level 5

I have 2 datasets, a and b which I am merging using a left join. Table a has duplicate join keys, but b does not. When I complete the join I am ending up with more records in a than before. I know the duplicate keys are a problem but they are not something I can change. However, I don't understand WHY they are causing the issue.

 

My understanding is that in cases of duplicate keys in the left table (a), the corresponding key in the right table (b) will simply match to both occurrences in a, without creating an extra records? Where am I going wrong here? Thanks.

1 REPLY 1
ballardw
Super User

The JOINs in general when you use any ON predicate combine ALL of the join records from the first set with EACH of the matches in the other set. So you get this multiplicative behavior. 1 Key in set a and 2 keys in set b = 2 matches.

 

It may be possible to filter the results but there is no guarantee.

 

It is always a good idea to provide at least the code you are attempting and better to include example data along with the desired result.

 


@sgleeson2 wrote:

I have 2 datasets, a and b which I am merging using a left join. Table a has duplicate join keys, but b does not. When I complete the join I am ending up with more records in a than before. I know the duplicate keys are a problem but they are not something I can change. However, I don't understand WHY they are causing the issue.

 

My understanding is that in cases of duplicate keys in the left table (a), the corresponding key in the right table (b) will simply match to both occurrences in a, without creating an extra records? 


Joins are record by record constructs so you are misunderstanding the basic behavior of a join.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 1 reply
  • 331 views
  • 0 likes
  • 2 in conversation