Hi all. I am trying to duplicate SAS logic and get the same results that some else did previously merging two datasets. The first dataset (HPHC_EOL_FY20), has 85k lines and the second (EOL_COHORT), has 135 lines. In the original results that I am trying to duplicate the final result has 55k lines, when I run the below code I get 130k. I am no sure what I am doing wrong. Any advise would be appreciated.
PROC SQL;
create table HPHC_FY20_EOL_combined_v2 as
select med.*,eol.*
from Dataset1 as med
left join
Dataset2 as eol
on med.member_id=eol.INSUREDSIDNUMBER
;
Quit;
Without looking at the data, it is difficult to predict. However i assume the previous merge was done if both datasets had med.member_id=eol.INSUREDSIDNUMBER
, considering this i would try inner join instead if left join. please try and let me know if it works
PROC SQL;
create table HPHC_FY20_EOL_combined_v2 as
select med.*,eol.*
from Dataset1 as med
inner join
Dataset2 as eol
on med.member_id=eol.INSUREDSIDNUMBER
;
Quit;
@wheddingsjr wrote:
Thanks for the response Jag. I tried the inner join and got the same results. I think I have to delete dupes, but not sure how to do that.
proc sort
data=dataset2
out=dedup
nodupkey
;
by INSUREDSIDNUMBER;
run;
Please describe in detail what happened. "Did not work" is not helpful at all, and worthy of the proverbial blonde secretary.
Mind that a left join will ALWAYS have at least the number of observations of the "left" dataset. To build the intersection set, you have to use an inner join.
And how many observations resulted from the join of the large table with your deduped dataset?
Do you need the deduped dataset to create a subset of the large table, or should all observations from the large table be kept?
Did you change from a left join to an inner join? Please post the complete log of your current join step.
So that simply means your large table consists mostly of member_id's that have a match in the small table.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.