BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Haemoglobin17
Obsidian | Level 7

Hi everyone!

I have a dataset (A) of approximately 2120000, I wanted to left join a couple of variables from a dataset B with a simple proc sql, when I realized that the number of rows in the resulting dataset (C) was 2140000.

I just wanted to add the two columns to the first rows keeping all the records from the first column. Is there a way?

I also tried a inner join between the first dataset 2120000 (A) rows and the table that I produced (2140000 rows) (C) and I obtained a dataset with 4 million of records! It is so strange! With a inner join the number of rows should drop, isn't it?

Is there any way I can solve this issue?

Thanks

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

You have duplicates on your key joining variables in one or both data sets. You likely need to modify your join criteria somehow.

 

Table 1

ID Year
ABC 2001
ABC 2001
DEF 2002

 

Table 2

ID Year
ABC 2001
DEF 2002
DEF 2002

 

Join (left or inner)

ID YEAR
ABC 2001 (table 1)
ABC  2001 (table 1)
DEF 2002 (table 2)
DEF 2002 (table 2)

 


@Haemoglobin17 wrote:

Hi everyone!

I have a dataset (A) of approximately 2120000, I wanted to left join a couple of variables from a dataset B with a simple proc sql, when I realized that the number of rows in the resulting dataset (C) was 2140000.

I just wanted to add the two columns to the first rows keeping all the records from the first column. Is there a way?

I also tried a inner join between the first dataset 2120000 (A) rows and the table that I produced (2140000 rows) (C) and I obtained a dataset with 4 million of records! It is so strange! With a inner join the number of rows should drop, isn't it?

Is there any way I can solve this issue?

Thanks


 

View solution in original post

6 REPLIES 6
PaigeMiller
Diamond | Level 26

A possible explanation is that dataset B has multiple records for some values of the joined variables. 

 

If that is the case, then using select distinct instead of select might fix the problem. Or it might not fix the problem if those multiple records in data set B have different values.

--
Paige Miller
Haemoglobin17
Obsidian | Level 7
Hi Paige,
Thank you for you message! I put a distinct option after the select and the number are nearer to what I wanted: when left joining A to B, C is 2130000 and when inner joining C with A, I obtain a number of rows that is just a bit more than A.
I gave for granted the proc sql joins and considered basic skills, but maybe I didn't understand very much!!
Reeza
Super User

You have duplicates on your key joining variables in one or both data sets. You likely need to modify your join criteria somehow.

 

Table 1

ID Year
ABC 2001
ABC 2001
DEF 2002

 

Table 2

ID Year
ABC 2001
DEF 2002
DEF 2002

 

Join (left or inner)

ID YEAR
ABC 2001 (table 1)
ABC  2001 (table 1)
DEF 2002 (table 2)
DEF 2002 (table 2)

 


@Haemoglobin17 wrote:

Hi everyone!

I have a dataset (A) of approximately 2120000, I wanted to left join a couple of variables from a dataset B with a simple proc sql, when I realized that the number of rows in the resulting dataset (C) was 2140000.

I just wanted to add the two columns to the first rows keeping all the records from the first column. Is there a way?

I also tried a inner join between the first dataset 2120000 (A) rows and the table that I produced (2140000 rows) (C) and I obtained a dataset with 4 million of records! It is so strange! With a inner join the number of rows should drop, isn't it?

Is there any way I can solve this issue?

Thanks


 

Haemoglobin17
Obsidian | Level 7
Hi Reeza!
Thank you for your message!
I added a second key in the joining and a distinct in the select.
Now I loose only 42 cases. It's a big improvement, but I still a bit troubled for these numbers that doesn't match 😮
PaigeMiller
Diamond | Level 26

The reason the numbers don't match is that your data doesn't allow them to match. Data set B has replicates that interfere with what you are trying to do (which may not be possible). If you examine data set B, you will see these replicates. Or you could run PROC FREQ on the joining variables.

--
Paige Miller
Reeza
Super User
Find the ones that don't match and look at each record to identify the difference manually then update your code to account for it.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 534 views
  • 4 likes
  • 3 in conversation