Solved: Re: Finding duplicates

sks521 · Posted 12-20-2019 05:21 AM

Hi folks,

I have a data set on emergency healthcare use by children, so one id can have multiple admission dates to emergency wards. My problem is that the same id appears with the same admission date more than once.

So I want to find duplicates based on two variables, 'id' and 'admissiondate', and create a data set without duplicates. But before I do that, I need to know how many duplicate entries the data set has, for verification purposes.

Thanks

S

ed_sas_member · Posted 12-20-2019 05:25 AM

Hi @sks521

You can use PROC SORT with the NODUPKEY option to do that.

The DUPOUT= option writes the duplicate records, based on the key 'id admissiondate' specified in the BY statement, to the dataset 'duplicates'.

The OUT= option writes the dataset 'want', which has no duplicate records.

proc sort data=have out = want dupout = duplicates nodupkey;
	by id admissiondate;
run;
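
For the verification step, one way to check the counts is to compare the row totals of the input and the two outputs. A minimal sketch, assuming the dataset names have, want and duplicates from the PROC SORT call above (the column aliases n_have, n_want and n_duplicates are made up for illustration):

```sas
/* Count the rows in the input and in both PROC SORT outputs.      */
/* Assumes have, want and duplicates all exist in the WORK library. */
proc sql;
  select count(*) as n_have       from have;
  select count(*) as n_want       from want;
  select count(*) as n_duplicates from duplicates;
quit;
```

If the sort worked as intended, n_have should equal n_want + n_duplicates, and n_duplicates is the number of extra entries that were dropped.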

rohitdante16 · Posted 12-20-2019 05:55 AM

The best method now is:

proc sort data=ds1 out =ds2 dupout = dup nodupkey;
by id admissiondate;
run;

s_lassen · Posted 12-20-2019 08:45 AM

Sort your data by ID and AdmissionDate, and then use a data step:

data want;
  do NDups=0 by 1 until(last.AdmissionDate);
    set have;
    by ID AdmissionDate;
    end;
run;

The NDups variable should then contain the number of duplicates for each BY group.
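
Because NDups starts at 0, it holds the number of extra records in each BY group, so a group with no duplicates gets 0. A rough sketch of totalling it, assuming 'want' is the output of the data step above (the aliases extra_rows and keys_with_duplicates are hypothetical):

```sas
/* Summarise NDups over all BY groups.                               */
/* NDups > 0 evaluates to 1 or 0, so summing it counts the id and    */
/* admissiondate combinations that occur more than once.             */
proc sql;
  select sum(NDups)     as extra_rows,
         sum(NDups > 0) as keys_with_duplicates
  from want;
quit;
```

Here extra_rows should match the number of records that a NODUPKEY sort on the same key would remove.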

ballardw · Posted 12-20-2019 10:42 AM

@sks521 wrote:

Hi folks,

I have a data set on emergency healthcare use by children, so one id can have multiple admission dates to emergency wards. My problem is that the same id appears with the same admission date more than once.

So I want to find duplicates based on two variables, 'id' and 'admissiondate', and create a data set without duplicates. But before I do that, I need to know how many duplicate entries the data set has, for verification purposes.

Thanks

S

This will create a data set with one row for each combination of id and admissiondate that occurs more than once, along with a count of how many times it occurs.

proc freq data=have noprint;
   tables id*admissiondate /out=work.counts (drop=percent where=(count>1)) ;
run;
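
Note that count here is the total number of rows for each key, not the number of extra ones. A small sketch of turning it into overall totals, assuming work.counts was created by the PROC FREQ step above (the aliases duplicated_keys and extra_rows are made up for illustration):

```sas
/* work.counts has one row per duplicated id and admissiondate pair. */
/* count - 1 is the number of extra rows for that pair.              */
proc sql;
  select count(*)       as duplicated_keys,
         sum(count - 1) as extra_rows
  from work.counts;
quit;
```

By default PROC FREQ leaves out rows where id or admissiondate is missing, so these totals may differ from a PROC SORT count if the key variables can be missing; adding the MISSING option to the TABLES statement should include them.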
