Solved: Delete duplicates by group

Joachim133 · Posted 11-04-2020 07:35 AM

I have searched everywhere but can't seem to find the answer. I think my problem is simple, but I can't get it right. Here it is:

My data set contains patients with multiple diagnoses, some of which are duplicated. I need to delete the duplicate diagnoses within each patient. A diagnosis that also appears for another patient must not be deleted.

Example of dataset (not real data):

Patient_ID  Diagnose  Action
1           1         Keep observation
1           1         Delete observation
1           2         Keep observation
1           3         Keep observation
2           1         Keep observation
2           1         Delete observation
2           3         Keep observation
2           3         Delete observation
3           1         Keep observation
3           2         Keep observation
3           5         Keep observation
3           6         Keep observation
4           1         Keep observation
4           1         Delete observation
4           3         Keep observation
4           3         Delete observation

Thanks in advance!

PeterClemmensen · Posted 11-04-2020 07:37 AM

Is your actual data sorted by Patient_ID and Diagnose?

If so:

data have;
input Patient_ID Diagnose;
datalines;
1 1 
1 1 
1 2 
1 3 
2 1 
2 1 
2 3 
2 3 
3 1 
3 2 
3 5 
3 6 
4 1 
4 1 
4 3 
4 3 
;

data want;
   set have;
   by Patient_ID Diagnose;
   if first.Diagnose;
run;

Result:

Patient_ID Diagnose 
1          1 
1          2 
1          3 
2          1 
2          3 
3          1 
3          2 
3          5 
3          6 
4          1 
4          3
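If you would rather reproduce the Keep/Delete column from the question than drop rows outright, the same FIRST.Diagnose logic can set a flag instead. A minimal sketch, assuming the HAVE data set created above and sorted by Patient_ID and Diagnose; the data set name FLAGGED and the variable Action are hypothetical:

```sas
/* Flag rather than delete: FIRST.Diagnose is 1 on the first row of */
/* each Patient_ID/Diagnose combination and 0 on the repeats.       */
/* FLAGGED and Action are illustrative names, not from the thread.  */
data flagged;
   set have;
   by Patient_ID Diagnose;
   length Action $6;
   if first.Diagnose then Action = 'Keep';
   else Action = 'Delete';
run;
```

Because Patient_ID comes first in the BY statement, FIRST.Diagnose is also set to 1 whenever Patient_ID changes. That is why diagnosis 1 is kept once for every patient rather than only for patient 1.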

The DATA to DATA Step Macro
Blog: SASnrd

novinosrin · Posted 11-04-2020 07:37 AM

A simple sort with NODUPKEY?

proc sort data=have out=want nodupkey;
 by patient_id diagnose;
run;

If your data set is already sorted as shown in the sample, then:

data want;
 set have;
 by patient_id diagnose;
 if first.diagnose;
run;
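To check which rows the PROC SORT approach removed, the DUPOUT= option writes the discarded duplicates to a separate data set. A minimal sketch, assuming the HAVE data set from the accepted answer above; the data set name DUPS is hypothetical:

```sas
/* NODUPKEY keeps the first row of each Patient_ID/Diagnose key; */
/* DUPOUT= writes every removed row to a separate data set.      */
/* DUPS is an illustrative name, not from the thread.            */
proc sort data=have out=want nodupkey dupout=dups;
   by patient_id diagnose;
run;

/* Inspect the rows that were dropped */
proc print data=dups noobs;
run;
```

With the sample data, DUPS should contain the five rows marked "Delete observation" in the question.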

Joachim133 · Posted 11-04-2020 07:55 AM

Thank you both! My data is sorted as in the example data set, and your solution works perfectly.

Regards
