I have a data set with many duplicates for each observation, some duplicates have the same test results while other duplicates have different test results.
In the scenario for person2, I want to keep the duplicate with a positive test result over the unknown and for person1, I want to keep just one of the results since they are the same. Person3, I'd like to keep the unknown duplicate over the missing. I've already ordered the test results for positive =1 unk=2, missing =3.
How can I code it to drop certain duplicates based on the test result status?
As long as you sort the data into the order that you want to keep the results, you can use a PROC SORT NODUPKEY to get the results
Here's an example
data have ; infile cards ; input person $ result $ ; if result="Positive" then sortOrder="1" ; else if result="Unknown" then sortOrder="2" ; else if result="Negative" then sortOrder="3" ; output have ; cards ; Person1 Negative Person1 Positive Person1 Unknown Person2 Negative Person2 Unknown Person3 Negative run ; /* First sort the records into the correct order */ proc sort data=have out=sort1 ; by person sortOrder; run ; /* Now remove the duplicates */ proc sort nodupkey data=sort1 out=want ; by person ; run ;
My 2 cents
data have; input person $ result $; datalines; Person1 Positive Person1 Positive Person2 Unknown Person2 Positive Person3 Unknown Person3 Missing ; proc sql; create table want as select distinct * from have group by person having whichc(result, 'Positive', 'Unknown', 'Missing') = min(whichc(result, 'Positive', 'Unknown', 'Missing')) ; quit;
person result Person1 Positive Person2 Positive Person3 Unknown
I think it may be a good idea to check for unexpected values/errors.
Given data like this
data have; input person $ result $; datalines; Person1 Positive Person1 Positive Person2 Unknown Person2 Positive Person3 Unknown Person3 Missing Person4 Gylle ;run;
(note that I put in a not predicted value "Gylle" in the last row)
One way to go about it could be this:
data want; array values(3) $8 _temporary_ ('Positive','Unknown','Missing'); do until(last.person); set have; by person; _idx=min(_idx,whichc(result,of values(*))); if _idx=0 then error 'Unexpected result value: ' result; end; if _idx>0 then result=values(_idx); else delete; drop _idx; run;
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.