Hi everyone!
I have a dataset (A) of approximately 2120000, I wanted to left join a couple of variables from a dataset B with a simple proc sql, when I realized that the number of rows in the resulting dataset (C) was 2140000.
I just wanted to add the two columns to the first rows keeping all the records from the first column. Is there a way?
I also tried a inner join between the first dataset 2120000 (A) rows and the table that I produced (2140000 rows) (C) and I obtained a dataset with 4 million of records! It is so strange! With a inner join the number of rows should drop, isn't it?
Is there any way I can solve this issue?
Thanks
You have duplicates on your key joining variables in one or both data sets. You likely need to modify your join criteria somehow.
Table 1
ID | Year |
ABC | 2001 |
ABC | 2001 |
DEF | 2002 |
Table 2
ID | Year |
ABC | 2001 |
DEF | 2002 |
DEF | 2002 |
Join (left or inner)
ID | YEAR |
ABC | 2001 (table 1) |
ABC | 2001 (table 1) |
DEF | 2002 (table 2) |
DEF | 2002 (table 2) |
@Haemoglobin17 wrote:
Hi everyone!
I have a dataset (A) of approximately 2120000, I wanted to left join a couple of variables from a dataset B with a simple proc sql, when I realized that the number of rows in the resulting dataset (C) was 2140000.
I just wanted to add the two columns to the first rows keeping all the records from the first column. Is there a way?
I also tried a inner join between the first dataset 2120000 (A) rows and the table that I produced (2140000 rows) (C) and I obtained a dataset with 4 million of records! It is so strange! With a inner join the number of rows should drop, isn't it?
Is there any way I can solve this issue?
Thanks
A possible explanation is that dataset B has multiple records for some values of the joined variables.
If that is the case, then using select distinct instead of select might fix the problem. Or it might not fix the problem if those multiple records in data set B have different values.
You have duplicates on your key joining variables in one or both data sets. You likely need to modify your join criteria somehow.
Table 1
ID | Year |
ABC | 2001 |
ABC | 2001 |
DEF | 2002 |
Table 2
ID | Year |
ABC | 2001 |
DEF | 2002 |
DEF | 2002 |
Join (left or inner)
ID | YEAR |
ABC | 2001 (table 1) |
ABC | 2001 (table 1) |
DEF | 2002 (table 2) |
DEF | 2002 (table 2) |
@Haemoglobin17 wrote:
Hi everyone!
I have a dataset (A) of approximately 2120000, I wanted to left join a couple of variables from a dataset B with a simple proc sql, when I realized that the number of rows in the resulting dataset (C) was 2140000.
I just wanted to add the two columns to the first rows keeping all the records from the first column. Is there a way?
I also tried a inner join between the first dataset 2120000 (A) rows and the table that I produced (2140000 rows) (C) and I obtained a dataset with 4 million of records! It is so strange! With a inner join the number of rows should drop, isn't it?
Is there any way I can solve this issue?
Thanks
The reason the numbers don't match is that your data doesn't allow them to match. Data set B has replicates that interfere with what you are trying to do (which may not be possible). If you examine data set B, you will see these replicates. Or you could run PROC FREQ on the joining variables.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.