06-06-2017 03:49 PM
I have 121 subjects in total and they have questionnaire data in 3 time points (baseline, 6 month and 12 month). I found subject ID = 2031 has two baseline data, and I want to delete the one has age, race, education level missing value. Could you please help me with the code? Thank you!
06-06-2017 04:02 PM
if first.subjectid and last subjectid then output; else
if age=. or race=. or highestgrade=. then delete;
06-06-2017 04:35 PM
data have ; input id visit $ val1 val2 val3; datalines; 1 Baseline . . . 1 Baseline 2 3 4 1 faseline . . . 2 Baseline . . . ; proc sql; delete from have a where exists (select * from have b where a.id =b.id and a.visit='Baseline' group by id having count(id)> 1) and a.val1 = . and a.val2 =. and a.val3 = .;
06-06-2017 04:38 PM
Schmuel's approach would work, but might require some changes (depending on what the data looks like on the 6 month and 12 month visits). Assuming you have sorted your data set by SubjectID Visit DESCENDING Age:
by subjectID Visit descending Age;
If you are concerned about more than one duplicate, we might have to revisit how to subset properly.
06-07-2017 05:02 PM
Assuming, for each subject id, there is always exactly one record with non-missing age,race,highestgrade, then a single WHERE statement will work, no matter how many record is one or more of those variables missing: