Hi I have a dataset that looks something like this:
data test;
input EP_ID $ SOC :mmddyy10. ROC :mmddyy10. ASSMT_DT :mmddyy10. ASSMT_RSN $;
format SOC ROC ASSMT_DT date9.;
datalines;
1 08/10/2016 . 08/10/2016 01
1 08/10/2016 . 08/18/2016 09
1 08/10/2016 . 10/08/2016 04
1 08/10/2016 . 11/25/2016 06
1 08/10/2016 . 12/06/2016 08
2 01/05/2017 . 01/05/2017 01
2 01/05/2017 . 01/15/2017 06
2 01/05/2017 01/19/2017 01/19/2017 03
2 01/05/2017 01/19/2017 01/30/2017 09
3 02/15/2017 . 02/15/2017 01
3 02/15/2017 . 02/25/2017 07
4 03/14/2017 . 03/14/2017 01
4 03/14/2017 . 03/18/2017 04
4 03/14/2017 . 03/26/2017 06
4 03/14/2017 04/02/2017 04/02/2017 03
4 03/14/2017 04/02/2017 04/12/2017 05
4 03/14/2017 04/02/2017 04/22/2017 06
4 03/14/2017 04/02/2017 04/28/2017 08
;
In my data set, I want to create a start and stop indicator using values of assmt_rsn. If assmt_rsn takes the values '06', '07', '08' or 09 then I want to stop at that line and count that date as the end of stay within that episode. Every stop of episode , should have a start denoted by assmt_rsn taking the value 01 or 03. In some cases when a 01 or 03 is missing before the stop, a 04 or 05 is considered as a start of stay. So, I finally want to achieve something like this:
ep_id ep_stay_id SOC ROC ep_stay_strt_dt ep_stay_start_rsn ep_stay_end_dt ep_stay_end_rsn
1 1 08/10/2016 . 08/10/2016 01 08/18/2016 09
1 2 08/10/2016 . 10/08/2016 04 11/25/2016 06
1 3 08/10/2016 . . . 12/06/2016 08
2 1 01/05/2017 . 01/05/2017 01 01/15/2017 06
2 2 01/05/2017 01/19/2017 01/19/2017 03 01/30/2017 09
3 1 02/15/2017 . 02/15/2017 01 02/25/2017 07
4 1 03/14/2017 . 03/14/2017 01 03/26/2017 06
4 2 03/14/2017 04/02/2017 04/02/2017 03 04/22/2017 06
4 3 03/14/2017 . . . 04/28/2017 08
How can I achieve this?
You can use a sequence variable to help yourself:
data mid1;
set test;
by EP_ID;
if first.EP_ID then seq=1;
output;
if first.EP_ID=0 and ASSMT_RSN in('06','07','08','09') then seq+1;
run;
data want;
set mid1;
by EP_ID seq;
retain ep_stay_strt_dt ep_stay_start_rsn;
if first.seq then do;
ep_stay_strt_dt=ASSMT_DT;
ep_stay_start_rsn=ASSMT_RSN;
end;
if last.seq then do;
if first.seq then call missing(ep_stay_strt_dt,ep_stay_start_rsn);
ep_stay_end_dt=ASSMT_DT;
ep_stay_end_rsn=ASSMT_RSN;
output;
end;
format ep_stay_end_dt ep_stay_strt_dt mmddyys10.;
run;
You can use a sequence variable to help yourself:
data mid1;
set test;
by EP_ID;
if first.EP_ID then seq=1;
output;
if first.EP_ID=0 and ASSMT_RSN in('06','07','08','09') then seq+1;
run;
data want;
set mid1;
by EP_ID seq;
retain ep_stay_strt_dt ep_stay_start_rsn;
if first.seq then do;
ep_stay_strt_dt=ASSMT_DT;
ep_stay_start_rsn=ASSMT_RSN;
end;
if last.seq then do;
if first.seq then call missing(ep_stay_strt_dt,ep_stay_start_rsn);
ep_stay_end_dt=ASSMT_DT;
ep_stay_end_rsn=ASSMT_RSN;
output;
end;
format ep_stay_end_dt ep_stay_strt_dt mmddyys10.;
run;
Thank you, your code worked very well!
I took me a while to grasp what you want:
data test;
input EP_ID $ SOC :mmddyy10. ROC :mmddyy10. ASSMT_DT :mmddyy10. ASSMT_RSN $2.;
format SOC ROC ASSMT_DT date9.;
datalines;
1 08/10/2016 . 08/10/2016 01
1 08/10/2016 . 08/18/2016 09
1 08/10/2016 . 10/08/2016 04
1 08/10/2016 . 11/25/2016 06
1 08/10/2016 . 12/06/2016 08
2 01/05/2017 . 01/05/2017 01
2 01/05/2017 . 01/15/2017 06
2 01/05/2017 01/19/2017 01/19/2017 03
2 01/05/2017 01/19/2017 01/30/2017 09
3 02/15/2017 . 02/15/2017 01
3 02/15/2017 . 02/25/2017 07
4 03/14/2017 . 03/14/2017 01
4 03/14/2017 . 03/18/2017 04
4 03/14/2017 . 03/26/2017 06
4 03/14/2017 04/02/2017 04/02/2017 03
4 03/14/2017 04/02/2017 04/12/2017 05
4 03/14/2017 04/02/2017 04/22/2017 06
4 03/14/2017 04/02/2017 04/28/2017 08
run;
data episode (drop=i assmt_dt assmt_rsn);
do i=1 by 1 until (assmt_rsn in ('06','07','08','09') or last.ep_id=1);
set test;
by ep_id;
if first.ep_id then ep_stay_id=1;
else if i=1 then ep_stay_id+1;
if i=1 and not (assmt_rsn in ('06','07','08','09')) then do;
ep_stay_start_dt=assmt_dt;
ep_stay_start_rsn=assmt_rsn;
end;
end;
ep_stay_end_dt=assmt_dt ;
ep_stay_end_rsn=assmt_rsn;
format ep_stay_start_dt mmddyy10. ep_stay_start_rsn $2.
ep_stay_end_dt mmddyy10. ep_stay_end_rsn $2. ;
run;
In your last observation in the desired output, you have ROC as missing, but I think your logic requires it to be 4/2/2017.
This can also be done without embedding the SET statement in a DO loop:
data episode (drop=assmt_:);
set test;
by ep_id;
if first.ep_id then ep_stay_id=.;
format ep_stay_start_dt mmddyy10. ep_stay_start_rsn $2.
ep_stay_end_dt mmddyy10. ep_stay_end_rsn $2. ;
retain ep_stay_start: ;
if first.ep_id=1 or lag(assmt_rsn) in ('06','07','08','09') then do;
ep_stay_id+1;
ep_stay_start_dt=assmt_dt;
ep_stay_start_rsn=assmt_rsn;
if assmt_rsn in ('06','07','08','09') then call missing(of ep_stay_start:);
end;
if (assmt_rsn in ('06','07','08','09') or last.ep_id);
ep_stay_end_dt=assmt_dt ;
ep_stay_end_rsn=assmt_rsn;
run;
My apologies if my question was unclear.
Thank you, both your code suggestions worked very well. And you are correct, I made a mistake. In my desired output the ROC on the last observation should be 4/2/2017.
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.