Help using Base SAS procedures

Removing Duplicates

Reply
Regular Contributor
Posts: 199

Removing Duplicates

I am analyzing survey data. I want to remove respondents that completed the survey more than once. A five-digit code was generated that enabled repeat responders to be identified (Variable name = privatecode).

 

I see various ways of using PROC SORT to remove duplicates and leave one observation (i.e. NODUPRECS, NODUPKEY). I want to remove ALL observations with the same private code. 

 

How can I accomplish this?

 

Thanks!

Trusted Advisor
Posts: 1,405

Re: Removing Duplicates

Do you mean that you want to filter any privatecode if occures more than once ?

 

If so do:

proc sort data=have; by privatecode; run;

 

data want;

 set have;

   by privatecode;

       if not (first.privatecode and last.privatecode) then delete;

run;

 

Regular Contributor
Posts: 199

Re: Removing Duplicates

If by "filter" you mean remove, then yes!

Regular Contributor
Posts: 199

Re: Removing Duplicates

@Shmuel 

 

There is one problem. Almost 400 observations have privatecode missing. These observations are being deleted because the privatecode is the same, missing.

 

What's the best work-around here?

 

Thanks!

Super User
Posts: 10,550

Re: Removing Duplicates

  if not (first.privatecode and last.privatecode) and not missing(privatecode)  then delete;

 

 

Trusted Advisor
Posts: 1,405

Re: Removing Duplicates

@ballardw gave you the right answer how do include all observations with miising privatecode.

 

@ballardw - is it possible to be done with the SQL too ?

Super User
Posts: 10,550

Re: Removing Duplicates


Shmuel wrote:

 

@ballardw - is it possible to be done with the SQL too ?


I'm not going to recreate a dataset to test since the OP didn't provide example data (hint)

 

I would start by changing

having count(*)=1

to

having count(*)=1 or missing(a.privatecode)

this is untested.

Trusted Advisor
Posts: 1,405

Re: Removing Duplicates

In order to include all observations with missing privatecode, change my code to:

 

 

data want;
 set have;
   by privatecode;
if not (first.privatecode and last.privatecode)
and not missing(privatecode) then delete; run;

 

Super User
Posts: 10,550

Re: Removing Duplicates

Identify the singles and keep. Proc sql will do this:

 

proc sql;
   create table want as
   select b.*
   from (
         select distinct privatecode
         from YOURDATESET
         group by privatecode
         having count(*)=1
        ) as a 
     left join YOURDATASET as b
     on a.private=b.privatecode;
quit;
Ask a Question
Discussion stats
  • 8 replies
  • 189 views
  • 1 like
  • 3 in conversation