🔒 This topic is solved and locked.
Posted 11-04-2020 07:35 AM
(907 views)
I have searched everywhere, but I can't seem to find the answer. I think my problem is simple, but I can't get it right. My problem:
In a dataset, I have patients with multiple diagnoses, some of which are duplicated. I need to delete the duplicate diagnoses within each patient; the same diagnosis on a different patient should not be deleted.
Example of dataset (not real data):
Patient_ID | Diagnose | Action |
1 | 1 | Keep observation |
1 | 1 | Delete observation |
1 | 2 | Keep observation |
1 | 3 | Keep observation |
2 | 1 | Keep observation |
2 | 1 | Delete observation |
2 | 3 | Keep observation |
2 | 3 | Delete observation |
3 | 1 | Keep observation |
3 | 2 | Keep observation |
3 | 5 | Keep observation |
3 | 6 | Keep observation |
4 | 1 | Keep observation |
4 | 1 | Delete observation |
4 | 3 | Keep observation |
4 | 3 | Delete observation |
Thanks in advance!
1 ACCEPTED SOLUTION
Is your actual data sorted by Patient_ID and Diagnose?
If so:
data have;
input Patient_ID Diagnose;
datalines;
1 1
1 1
1 2
1 3
2 1
2 1
2 3
2 3
3 1
3 2
3 5
3 6
4 1
4 1
4 3
4 3
;
data want;
set have;
by Patient_ID Diagnose;
if first.Diagnose;
run;
Result:
Patient_ID Diagnose
1 1
1 2
1 3
2 1
2 3
3 1
3 2
3 5
3 6
4 1
4 3
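If you also want to see which observations were deleted, the same BY-group logic can route them to a second dataset. This is only a sketch: the dataset name dropped is a hypothetical choice, and it assumes have is already sorted by Patient_ID and Diagnose (otherwise the BY statement stops with a "BY variables are not properly sorted" error).

```sas
/* Sketch: split HAVE into kept and deleted observations.           */
/* Assumes HAVE is sorted by Patient_ID and Diagnose.               */
/* The dataset name DROPPED is a hypothetical choice.               */
data want dropped;
    set have;
    by Patient_ID Diagnose;
    /* first.Diagnose is 1 on the first row of each                  */
    /* Patient_ID/Diagnose combination and 0 on every later repeat.  */
    if first.Diagnose then output want;
    else output dropped;
run;
```

With the example data above, want should contain the 11 rows marked "Keep observation" and dropped the 5 rows marked "Delete observation".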
3 REPLIES
How about a simple sort with NODUPKEY?
proc sort data=have out=want nodupkey;
by patient_id diagnose;
run;
If your dataset is already ordered as shown in the sample, then:
data want;
set have;
by patient_id diagnose;
if first.diagnose;
run;
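For the PROC SORT route, the DUPOUT= option can capture the observations that NODUPKEY removes, which makes it easy to check what was deleted. A minimal sketch, assuming the same have dataset; the output name dropped is hypothetical.

```sas
/* Sketch: deduplicate by key and keep a copy of the removed rows.  */
/* NODUPKEY keeps the first observation of each BY group;           */
/* DUPOUT= writes the discarded observations to a separate dataset. */
/* The dataset name DROPPED is a hypothetical choice.               */
proc sort data=have out=want nodupkey dupout=dropped;
    by patient_id diagnose;
run;
```

Unlike the DATA step version, this does not require have to be sorted beforehand, because PROC SORT does the ordering itself.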
Thank you both! My data is sorted as in the example dataset, so your solution works perfectly.
Regards