Joachim133
Fluorite | Level 6

I have searched everywhere, but I can't find the answer. I think my problem is simple, but I can't get it right. My problem:

In a dataset, I have patients with multiple diagnoses, some of which are duplicated. I need to delete the duplicate diagnoses within each patient; the same diagnosis recorded for a different patient should not be deleted.

Example of dataset (not real data):

Patient_ID  Diagnose
1           1         Keep observation
1           1         Delete observation
1           2         Keep observation
1           3         Keep observation
2           1         Keep observation
2           1         Delete observation
2           3         Keep observation
2           3         Delete observation
3           1         Keep observation
3           2         Keep observation
3           5         Keep observation
3           6         Keep observation
4           1         Keep observation
4           1         Delete observation
4           3         Keep observation
4           3         Delete observation

Thanks in advance!

Accepted Solution
PeterClemmensen
Tourmaline | Level 20

Is your actual data sorted by Patient_ID and Diagnose?

If so:

data have;
input Patient_ID Diagnose;
datalines;
1 1 
1 1 
1 2 
1 3 
2 1 
2 1 
2 3 
2 3 
3 1 
3 2 
3 5 
3 6 
4 1 
4 1 
4 3 
4 3 
;

data want;
   set have;
   by Patient_ID Diagnose;
   if first.Diagnose;
run;

 

Result:

 

Patient_ID Diagnose 
1          1 
1          2 
1          3 
2          1 
2          3 
3          1 
3          2 
3          5 
3          6 
4          1 
4          3 
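
To see why `if first.Diagnose;` keeps exactly one row per patient/diagnosis pair, it can help to print the automatic FIRST. flags that the BY statement creates. A minimal sketch, assuming the `have` dataset above is already sorted by Patient_ID and Diagnose:

```sas
/* Write the BY-group flags to the log; no output dataset is created. */
data _null_;
   set have;
   by Patient_ID Diagnose;
   /* first.Diagnose is 1 on the first row of each Patient_ID/Diagnose
      combination and 0 on every repeat within that combination. */
   put Patient_ID= Diagnose= first.Patient_ID= first.Diagnose=;
run;
```

Because Diagnose comes after Patient_ID in the BY statement, first.Diagnose is also set to 1 whenever Patient_ID changes, so the same diagnosis under a different patient is kept.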


novinosrin
Tourmaline | Level 20

How about a simple sort with NODUPKEY?

 

proc sort data=have out=want nodupkey;
 by patient_id diagnose;
run;
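
If you want to check which rows were dropped, PROC SORT can write them to a separate dataset with the DUPOUT= option. A sketch only; the dataset name `dups` is just a placeholder and is not part of the original question:

```sas
/* want: one row per patient_id/diagnose combination.
   dups: the duplicate rows that NODUPKEY removed (placeholder name). */
proc sort data=have out=want nodupkey dupout=dups;
   by patient_id diagnose;
run;
```

Keep in mind that NODUPKEY compares only the BY variables. If the real data has more columns and the duplicate rows differ in them, which row is kept depends on the input order.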

 

If your dataset is already sorted as shown in the sample, a DATA step with BY-group processing is enough:

data want;
 set have;
 by  patient_id diagnose;
 if first.diagnose;
run;
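
The DATA step above stops with an error if the data turns out not to be sorted by the BY variables. If sorting first is not convenient, PROC SQL with DISTINCT is another option that needs no pre-sort. A sketch, assuming patient_id and diagnose are the only columns you need to keep:

```sas
/* DISTINCT applies to the whole selected row, so select only the
   columns that define a duplicate. */
proc sql;
   create table want as
   select distinct patient_id, diagnose
   from have;
quit;
```

If the dataset has other variables that must be carried along, the PROC SORT or DATA step approach is probably the better fit.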

 

Joachim133
Fluorite | Level 6

Thank you both! My data is sorted as in the example dataset, so your solution works perfectly.

Regards

 
