Joachim133
Fluorite | Level 6

I have searched everywhere but can't seem to find the answer. I think my problem is simple, but I can't get it right. My problem:

In a dataset, I have patients with multiple diagnoses, some of which are duplicated. I need to delete the duplicate diagnoses within each patient; the same diagnosis recorded for a different patient should not be deleted.

Example of dataset (not real data):

Patient_ID Diagnose
1          1        Keep observation
1          1        Delete observation
1          2        Keep observation
1          3        Keep observation
2          1        Keep observation
2          1        Delete observation
2          3        Keep observation
2          3        Delete observation
3          1        Keep observation
3          2        Keep observation
3          5        Keep observation
3          6        Keep observation
4          1        Keep observation
4          1        Delete observation
4          3        Keep observation
4          3        Delete observation

Thanks in advance!

1 ACCEPTED SOLUTION

PeterClemmensen
Tourmaline | Level 20

Is your actual data sorted by Patient_ID and Diagnose?

If so:

data have;
input Patient_ID Diagnose;
datalines;
1 1 
1 1 
1 2 
1 3 
2 1 
2 1 
2 3 
2 3 
3 1 
3 2 
3 5 
3 6 
4 1 
4 1 
4 3 
4 3 
;

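data want;
   set have;
   by Patient_ID Diagnose;   /* BY-group processing requires the data to be sorted by these variables */
   if first.Diagnose;        /* keep only the first row of each Patient_ID/Diagnose combination */
run;

Result:
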
Patient_ID Diagnose 
1          1 
1          2 
1          3 
2          1 
2          3 
3          1 
3          2 
3          5 
3          6 
4          1 
4          3 


3 REPLIES
novinosrin
Tourmaline | Level 20

A simple sort with NODUPKEY?

proc sort data=have out=want nodupkey;
 by patient_id diagnose;
run;

If your dataset is already sorted as shown in the sample, then:

data want;
 set have;
 by  patient_id diagnose;
 if first.diagnose;
run;
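
As a side note on the PROC SORT approach above, it can be useful to see which observations were dropped before trusting the result. The following is only a minimal sketch, assuming the `have` dataset from this thread; the dataset name `removed` is hypothetical and chosen purely for illustration. It uses the DUPOUT= option of PROC SORT, which writes the discarded duplicates to a separate dataset.

```sas
/* Sketch: same de-duplication as above, but the dropped rows are kept for review. */
/* "removed" is a hypothetical dataset name, not something from the original post. */
proc sort data=have out=want nodupkey dupout=removed;
   by patient_id diagnose;
run;

/* Optional check: list the observations that were discarded as duplicates. */
proc print data=removed;
run;
```

With the sample data in this thread, `want` should hold the 11 kept rows and `removed` the 5 duplicates. Unlike the DATA step with `first.diagnose`, this version does not need the input to be sorted beforehand.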

 

Joachim133
Fluorite | Level 6

Thank you both! My data is sorted as in the example dataset, so your solutions work perfectly.

Regards

 
