Hello.
I have a dataset of patients having multiple hospital visits. Patients are included in my study, when they have a specific diagnose. I need to keep all visits and diagnoses prior to the inclusion date, and deleting all visits after the inclusion date.
Example of dataset (not real data). Inclusion when a patient have diagnose code '200':
Patient_ID | visit_ID | Date | Diagnose | |
1 | 1 | 01.01.20 | 111 | Keep observation |
1 | 2 | 02.01.20 | 154 | keep observation |
1 | 3 | 03.01.20 | 200 | keep observation |
1 | 4 | 04.01.20 | 085 | delete observation |
2 | 5 | 01.06.20 | 119 | keep observation |
2 | 6 | 02.06.20 | 543 | keep observation |
2 | 7 | 03.06.20 | 200 | keep observation |
3 | 8 | 01.09.19 | 746 | keep observation |
3 | 9 | 02.09.19 | 549 | keep observation |
3 | 10 | 03.09.19 | 200 | keep observation |
3 | 11 | 04.09.19 | 432 | delete observation |
3 | 12 | 05.09.19 | 543 | delete observation |
How can this be done?
I use SAS 9.4.
Thanks!
data have;
infile datalines dlm="09"x;
input Patient_ID visit_ID Date:mmddyy10. Diagnose;
format date mmddyys10.;
datalines;
1 1 01.01.20 111 Keep observation
1 2 02.01.20 154 keep observation
1 3 03.01.20 200 keep observation
1 4 04.01.20 085 delete observation
2 5 01.06.20 119 keep observation
2 6 02.06.20 543 keep observation
2 7 03.06.20 200 keep observation
3 8 01.09.19 746 keep observation
3 9 02.09.19 549 keep observation
3 10 03.09.19 200 keep observation
3 11 04.09.19 432 delete observation
3 12 05.09.19 543 delete observation
;
run;
data want;
do until(last.patient_id or diagnose=200);
set have;
by patient_id;
if first.patient_id then _N_=0;
if _N_=0 then output;
end;
run;
Retain a flag variable:
data have;
input Patient_ID visit_ID Date :ddmmyy8. Diagnose :$3.;
format date yymmdd10.;
datalines;
1 1 01.01.20 111
1 2 02.01.20 154
1 3 03.01.20 200
1 4 04.01.20 085
2 5 01.06.20 119
2 6 02.06.20 543
2 7 03.06.20 200
3 8 01.09.19 746
3 9 02.09.19 549
3 10 03.09.19 200
3 11 04.09.19 432
3 12 05.09.19 543
;
data want;
set have;
by patient_id;
retain flag;
if first.patient_id then flag = 1;
if flag;
if diagnose = "200" then flag = 0;
drop flag;
run;
Thank you!
However, I forgot to mention, that the including diagnose '200' may come several times per patient_ID. The cutoff should be the first time, the patient is admitted with the diagnose '200'. Does the code also work like that?
@Joachim133 wrote:
Thank you!
However, I forgot to mention, that the including diagnose '200' may come several times per patient_ID. The cutoff should be the first time, the patient is admitted with the diagnose '200'. Does the code also work like that?
Yes. Once the flag has been set to zero, it will stay that way until the next patient_id comes up.
data have;
infile datalines dlm="09"x;
input Patient_ID visit_ID Date:mmddyy10. Diagnose;
format date mmddyys10.;
datalines;
1 1 01.01.20 111 Keep observation
1 2 02.01.20 154 keep observation
1 3 03.01.20 200 keep observation
1 4 04.01.20 085 delete observation
2 5 01.06.20 119 keep observation
2 6 02.06.20 543 keep observation
2 7 03.06.20 200 keep observation
3 8 01.09.19 746 keep observation
3 9 02.09.19 549 keep observation
3 10 03.09.19 200 keep observation
3 11 04.09.19 432 delete observation
3 12 05.09.19 543 delete observation
;
run;
data want;
do until(last.patient_id or diagnose=200);
set have;
by patient_id;
if first.patient_id then _N_=0;
if _N_=0 then output;
end;
run;
data have;
input Patient_ID visit_ID Date :ddmmyy8. Diagnose :$3.;
format date yymmdd10.;
datalines;
1 1 01.01.20 111
1 2 02.01.20 154
1 3 03.01.20 200
1 4 04.01.20 085
2 5 01.06.20 119
2 6 02.06.20 543
2 7 03.06.20 200
3 8 01.09.19 746
3 9 02.09.19 549
3 10 03.09.19 200
3 11 04.09.19 432
3 12 05.09.19 543
;
data want;
do _n_=0 by 0 until(last.patient_id);
set have;
by patient_id;
if not _n_ then output;
if Diagnose='200' then _n_=constant('e');
end;
run;
Hi @Joachim133 @Kurt_Bremser 's solution flag cut is elegance personified, i wanted to have some fun
data have;
input Patient_ID visit_ID Date :ddmmyy8. Diagnose :$3.;
format date yymmdd10.;
datalines;
1 1 01.01.20 111
1 2 02.01.20 154
1 3 03.01.20 200
1 4 04.01.20 085
2 5 01.06.20 119
2 6 02.06.20 543
2 7 03.06.20 200
3 8 01.09.19 746
3 9 02.09.19 549
3 10 03.09.19 200
3 11 04.09.19 432
3 12 05.09.19 543
;
Proc sql;
create table want as
select *
from have
group by patient_id
having date<=min(ifn(diagnose='200',date,.))
order by patient_id,visit_id;
quit;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.