Hi I have a dataset that looks something like this:
data test;
input EP_ID $ SOC :mmddyy10. ROC :mmddyy10. ASSMT_DT :mmddyy10. ASSMT_RSN $;
format SOC ROC ASSMT_DT date9.;
datalines;
1 08/10/2016 . 08/10/2016 01
1 08/10/2016 . 08/18/2016 09
1 08/10/2016 . 10/08/2016 04
1 08/10/2016 . 11/25/2016 06
1 08/10/2016 . 12/06/2016 08
2 01/05/2017 . 01/05/2017 01
2 01/05/2017 . 01/15/2017 06
2 01/05/2017 01/19/2017 01/19/2017 03
2 01/05/2017 01/19/2017 01/30/2017 09
3 02/15/2017 . 02/15/2017 01
3 02/15/2017 . 02/25/2017 07
4 03/14/2017 . 03/14/2017 01
4 03/14/2017 . 03/18/2017 04
4 03/14/2017 . 03/26/2017 06
4 03/14/2017 04/02/2017 04/02/2017 03
4 03/14/2017 04/02/2017 04/12/2017 05
4 03/14/2017 04/02/2017 04/22/2017 06
4 03/14/2017 04/02/2017 04/28/2017 08
;
In my data set, I want to create a start and stop indicator using values of assmt_rsn. If assmt_rsn takes the values '06', '07', '08' or 09 then I want to stop at that line and count that date as the end of stay within that episode. Every stop of episode , should have a start denoted by assmt_rsn taking the value 01 or 03. In some cases when a 01 or 03 is missing before the stop, a 04 or 05 is considered as a start of stay. So, I finally want to achieve something like this:
ep_id ep_stay_id SOC ROC ep_stay_strt_dt ep_stay_start_rsn ep_stay_end_dt ep_stay_end_rsn
1 1 08/10/2016 . 08/10/2016 01 08/18/2016 09
1 2 08/10/2016 . 10/08/2016 04 11/25/2016 06
1 3 08/10/2016 . . . 12/06/2016 08
2 1 01/05/2017 . 01/05/2017 01 01/15/2017 06
2 2 01/05/2017 01/19/2017 01/19/2017 03 01/30/2017 09
3 1 02/15/2017 . 02/15/2017 01 02/25/2017 07
4 1 03/14/2017 . 03/14/2017 01 03/26/2017 06
4 2 03/14/2017 04/02/2017 04/02/2017 03 04/22/2017 06
4 3 03/14/2017 . . . 04/28/2017 08
How can I achieve this?
You can use a sequence variable to help yourself:
data mid1;
set test;
by EP_ID;
if first.EP_ID then seq=1;
output;
if first.EP_ID=0 and ASSMT_RSN in('06','07','08','09') then seq+1;
run;
data want;
set mid1;
by EP_ID seq;
retain ep_stay_strt_dt ep_stay_start_rsn;
if first.seq then do;
ep_stay_strt_dt=ASSMT_DT;
ep_stay_start_rsn=ASSMT_RSN;
end;
if last.seq then do;
if first.seq then call missing(ep_stay_strt_dt,ep_stay_start_rsn);
ep_stay_end_dt=ASSMT_DT;
ep_stay_end_rsn=ASSMT_RSN;
output;
end;
format ep_stay_end_dt ep_stay_strt_dt mmddyys10.;
run;
You can use a sequence variable to help yourself:
data mid1;
set test;
by EP_ID;
if first.EP_ID then seq=1;
output;
if first.EP_ID=0 and ASSMT_RSN in('06','07','08','09') then seq+1;
run;
data want;
set mid1;
by EP_ID seq;
retain ep_stay_strt_dt ep_stay_start_rsn;
if first.seq then do;
ep_stay_strt_dt=ASSMT_DT;
ep_stay_start_rsn=ASSMT_RSN;
end;
if last.seq then do;
if first.seq then call missing(ep_stay_strt_dt,ep_stay_start_rsn);
ep_stay_end_dt=ASSMT_DT;
ep_stay_end_rsn=ASSMT_RSN;
output;
end;
format ep_stay_end_dt ep_stay_strt_dt mmddyys10.;
run;
Thank you, your code worked very well!
I took me a while to grasp what you want:
data test;
input EP_ID $ SOC :mmddyy10. ROC :mmddyy10. ASSMT_DT :mmddyy10. ASSMT_RSN $2.;
format SOC ROC ASSMT_DT date9.;
datalines;
1 08/10/2016 . 08/10/2016 01
1 08/10/2016 . 08/18/2016 09
1 08/10/2016 . 10/08/2016 04
1 08/10/2016 . 11/25/2016 06
1 08/10/2016 . 12/06/2016 08
2 01/05/2017 . 01/05/2017 01
2 01/05/2017 . 01/15/2017 06
2 01/05/2017 01/19/2017 01/19/2017 03
2 01/05/2017 01/19/2017 01/30/2017 09
3 02/15/2017 . 02/15/2017 01
3 02/15/2017 . 02/25/2017 07
4 03/14/2017 . 03/14/2017 01
4 03/14/2017 . 03/18/2017 04
4 03/14/2017 . 03/26/2017 06
4 03/14/2017 04/02/2017 04/02/2017 03
4 03/14/2017 04/02/2017 04/12/2017 05
4 03/14/2017 04/02/2017 04/22/2017 06
4 03/14/2017 04/02/2017 04/28/2017 08
run;
data episode (drop=i assmt_dt assmt_rsn);
do i=1 by 1 until (assmt_rsn in ('06','07','08','09') or last.ep_id=1);
set test;
by ep_id;
if first.ep_id then ep_stay_id=1;
else if i=1 then ep_stay_id+1;
if i=1 and not (assmt_rsn in ('06','07','08','09')) then do;
ep_stay_start_dt=assmt_dt;
ep_stay_start_rsn=assmt_rsn;
end;
end;
ep_stay_end_dt=assmt_dt ;
ep_stay_end_rsn=assmt_rsn;
format ep_stay_start_dt mmddyy10. ep_stay_start_rsn $2.
ep_stay_end_dt mmddyy10. ep_stay_end_rsn $2. ;
run;
In your last observation in the desired output, you have ROC as missing, but I think your logic requires it to be 4/2/2017.
This can also be done without embedding the SET statement in a DO loop:
data episode (drop=assmt_:);
set test;
by ep_id;
if first.ep_id then ep_stay_id=.;
format ep_stay_start_dt mmddyy10. ep_stay_start_rsn $2.
ep_stay_end_dt mmddyy10. ep_stay_end_rsn $2. ;
retain ep_stay_start: ;
if first.ep_id=1 or lag(assmt_rsn) in ('06','07','08','09') then do;
ep_stay_id+1;
ep_stay_start_dt=assmt_dt;
ep_stay_start_rsn=assmt_rsn;
if assmt_rsn in ('06','07','08','09') then call missing(of ep_stay_start:);
end;
if (assmt_rsn in ('06','07','08','09') or last.ep_id);
ep_stay_end_dt=assmt_dt ;
ep_stay_end_rsn=assmt_rsn;
run;
My apologies if my question was unclear.
Thank you, both your code suggestions worked very well. And you are correct, I made a mistake. In my desired output the ROC on the last observation should be 4/2/2017.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Lock in the best rate now before the price increases on April 1.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.