Hi everyone!
I have a dataset (A) of approximately 2120000, I wanted to left join a couple of variables from a dataset B with a simple proc sql, when I realized that the number of rows in the resulting dataset (C) was 2140000.
I just wanted to add the two columns to the first rows keeping all the records from the first column. Is there a way?
I also tried a inner join between the first dataset 2120000 (A) rows and the table that I produced (2140000 rows) (C) and I obtained a dataset with 4 million of records! It is so strange! With a inner join the number of rows should drop, isn't it?
Is there any way I can solve this issue?
Thanks
You have duplicates on your key joining variables in one or both data sets. You likely need to modify your join criteria somehow.
Table 1
ID | Year |
ABC | 2001 |
ABC | 2001 |
DEF | 2002 |
Table 2
ID | Year |
ABC | 2001 |
DEF | 2002 |
DEF | 2002 |
Join (left or inner)
ID | YEAR |
ABC | 2001 (table 1) |
ABC | 2001 (table 1) |
DEF | 2002 (table 2) |
DEF | 2002 (table 2) |
@Haemoglobin17 wrote:
Hi everyone!
I have a dataset (A) of approximately 2120000, I wanted to left join a couple of variables from a dataset B with a simple proc sql, when I realized that the number of rows in the resulting dataset (C) was 2140000.
I just wanted to add the two columns to the first rows keeping all the records from the first column. Is there a way?
I also tried a inner join between the first dataset 2120000 (A) rows and the table that I produced (2140000 rows) (C) and I obtained a dataset with 4 million of records! It is so strange! With a inner join the number of rows should drop, isn't it?
Is there any way I can solve this issue?
Thanks
A possible explanation is that dataset B has multiple records for some values of the joined variables.
If that is the case, then using select distinct instead of select might fix the problem. Or it might not fix the problem if those multiple records in data set B have different values.
You have duplicates on your key joining variables in one or both data sets. You likely need to modify your join criteria somehow.
Table 1
ID | Year |
ABC | 2001 |
ABC | 2001 |
DEF | 2002 |
Table 2
ID | Year |
ABC | 2001 |
DEF | 2002 |
DEF | 2002 |
Join (left or inner)
ID | YEAR |
ABC | 2001 (table 1) |
ABC | 2001 (table 1) |
DEF | 2002 (table 2) |
DEF | 2002 (table 2) |
@Haemoglobin17 wrote:
Hi everyone!
I have a dataset (A) of approximately 2120000, I wanted to left join a couple of variables from a dataset B with a simple proc sql, when I realized that the number of rows in the resulting dataset (C) was 2140000.
I just wanted to add the two columns to the first rows keeping all the records from the first column. Is there a way?
I also tried a inner join between the first dataset 2120000 (A) rows and the table that I produced (2140000 rows) (C) and I obtained a dataset with 4 million of records! It is so strange! With a inner join the number of rows should drop, isn't it?
Is there any way I can solve this issue?
Thanks
The reason the numbers don't match is that your data doesn't allow them to match. Data set B has replicates that interfere with what you are trying to do (which may not be possible). If you examine data set B, you will see these replicates. Or you could run PROC FREQ on the joining variables.
Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.
Register today!Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Select SAS Training centers are offering in-person courses. View upcoming courses for: