01-10-2017 03:46 PM
I am analyzing survey data. I want to remove respondents that completed the survey more than once. A five-digit code was generated that enabled repeat responders to be identified (Variable name = privatecode).
I see various ways of using PROC SORT to remove duplicates and leave one observation (i.e. NODUPRECS, NODUPKEY). I want to remove ALL observations with the same private code.
How can I accomplish this?
01-10-2017 03:56 PM
Do you mean that you want to filter any privatecode if occures more than once ?
If so do:
proc sort data=have; by privatecode; run;
if not (first.privatecode and last.privatecode) then delete;
01-10-2017 06:57 PM
There is one problem. Almost 400 observations have privatecode missing. These observations are being deleted because the privatecode is the same, missing.
What's the best work-around here?
01-11-2017 10:32 AM
@ballardw - is it possible to be done with the SQL too ?
I'm not going to recreate a dataset to test since the OP didn't provide example data (hint)
I would start by changing
having count(*)=1 or missing(a.privatecode)
this is untested.
01-11-2017 01:07 PM
In order to include all observations with missing privatecode, change my code to:
data want; set have; by privatecode;
if not (first.privatecode and last.privatecode)
and not missing(privatecode) then delete; run;
01-10-2017 04:00 PM
Identify the singles and keep. Proc sql will do this:
proc sql; create table want as select b.* from ( select distinct privatecode from YOURDATESET group by privatecode having count(*)=1 ) as a left join YOURDATASET as b on a.private=b.privatecode; quit;