I am using the following code to left join
proc sql;
create table want as
select *
from final3 left join gov_proxy
on final3.gvkey=gov_proxy.gvkey and
final3.fyear=gov_proxy.fyear;
quit;
I have 63,654 observations in my final3 dataset and
I have 186679 observations in my gov-proxy dataset
I want to have 63,654 dataset but I am getting 63888 dataset after deleting the duprecs after merging.
I have no duprecs in any one of the dataset before merging.
Can anyone help me with this?
This isn't a duplicate record problem, it is a duplicate key problem. Your join results indicate that you either have duplicate key values in either final3 or gov_proxy or both. You can test this easily:
proc sort data = final3 out = dupkey nodupkey;
by gvkey fyear;
run;
If dupkey has less records than final3 then you know it contains duplicate key values. Then the question is should the data have duplicate key values and if not how do you wish to fix this?
You may not have any duplicate records, but you have at least 234 duplicate keys (gvkey-fyear) in gov_proxy.
This isn't a duplicate record problem, it is a duplicate key problem. Your join results indicate that you either have duplicate key values in either final3 or gov_proxy or both. You can test this easily:
proc sort data = final3 out = dupkey nodupkey;
by gvkey fyear;
run;
If dupkey has less records than final3 then you know it contains duplicate key values. Then the question is should the data have duplicate key values and if not how do you wish to fix this?
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.