BookmarkSubscribeRSS Feed
DanC
Calcite | Level 5

Hi Everyone

 

I am trying to keep the full sample of one file after merge. However, I tried many times but still got higher number. Could anyone give me some suggestions? Here is my code:

 

DATA ANALYTIC (replace=yes);

MERGE ADM_forHYBRID (in=x)  MFILE (in=y);
BY STATEFIP PROVIDER ;
if x=1 then output;
RUN;

 

The log shows that the ADM file has 20804087 observations and the MFILE has 177125 observations. However, after merge, the file has 20804107 observations. I want the final file to be 20804087 observations but I cannot get that number. Could anyone help me solve the issue? 

Thanks a lot!

3 REPLIES 3
Astounding
PROC Star

This is a sign that MFILE occasionally contains more than one observation for a combination of STATEFIP / PROVIDER.

 

There are ways to deal with this, but first you have to decide ... what should happen when there is a one-to-many match?

DanC
Calcite | Level 5

Thank you Astounding.

 

I delete duplicate observations in the MFILE by STATEFIP and PROVIDER. But still get the same number of observations after merge the two datasets. Do you have any suggestion?

Astounding
PROC Star

The issue will remain the same.  Perhaps there are observations in MFILE with the same STATEFIP + PROVIDER, but having a different value for some other variable.  You could definitely eliminate duplicates with:

 

proc sort data=mfile out=deduped nodupkey;

   by statefip provider;

run;

 

While that gives you the proper number of observations, it will not tell you which observations were selected and which were deleted.  You might just want to examine where the duplicates come from:

 

data small_subset;

set mfile;

by statefip provider;

if first.provider=0 or last.provider=0;

run;

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 827 views
  • 1 like
  • 2 in conversation