DATA Step, Macro, Functions and more

Keep Full Sample of One File After Merge

Reply
New Contributor
Posts: 2

Keep Full Sample of One File After Merge

Hi Everyone

 

I am trying to keep the full sample of one file after merge. However, I tried many times but still got higher number. Could anyone give me some suggestions? Here is my code:

 

DATA ANALYTIC (replace=yes);

MERGE ADM_forHYBRID (in=x)  MFILE (in=y);
BY STATEFIP PROVIDER ;
if x=1 then output;
RUN;

 

The log shows that the ADM file has 20804087 observations and the MFILE has 177125 observations. However, after merge, the file has 20804107 observations. I want the final file to be 20804087 observations but I cannot get that number. Could anyone help me solve the issue? 

Thanks a lot!

Super User
Posts: 5,096

Re: Keep Full Sample of One File After Merge

This is a sign that MFILE occasionally contains more than one observation for a combination of STATEFIP / PROVIDER.

 

There are ways to deal with this, but first you have to decide ... what should happen when there is a one-to-many match?

New Contributor
Posts: 2

Re: Keep Full Sample of One File After Merge

Thank you Astounding.

 

I delete duplicate observations in the MFILE by STATEFIP and PROVIDER. But still get the same number of observations after merge the two datasets. Do you have any suggestion?

Super User
Posts: 5,096

Re: Keep Full Sample of One File After Merge

The issue will remain the same.  Perhaps there are observations in MFILE with the same STATEFIP + PROVIDER, but having a different value for some other variable.  You could definitely eliminate duplicates with:

 

proc sort data=mfile out=deduped nodupkey;

   by statefip provider;

run;

 

While that gives you the proper number of observations, it will not tell you which observations were selected and which were deleted.  You might just want to examine where the duplicates come from:

 

data small_subset;

set mfile;

by statefip provider;

if first.provider=0 or last.provider=0;

run;

Ask a Question
Discussion stats
  • 3 replies
  • 142 views
  • 1 like
  • 2 in conversation