BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
iroy
Calcite | Level 5

Hi I have a dataset that looks something like this:

data test;
input EP_ID $ SOC :mmddyy10. ROC :mmddyy10. ASSMT_DT :mmddyy10. ASSMT_RSN $;
format SOC ROC ASSMT_DT date9.;
datalines;
1 08/10/2016 . 08/10/2016 01
1 08/10/2016 . 08/18/2016 09
1 08/10/2016 . 10/08/2016 04
1 08/10/2016 . 11/25/2016 06
1 08/10/2016 . 12/06/2016 08
2 01/05/2017 . 01/05/2017 01
2 01/05/2017 . 01/15/2017 06
2 01/05/2017 01/19/2017 01/19/2017 03
2 01/05/2017 01/19/2017 01/30/2017 09
3 02/15/2017 . 02/15/2017 01
3 02/15/2017 . 02/25/2017 07
4 03/14/2017 . 03/14/2017 01
4 03/14/2017 . 03/18/2017 04
4 03/14/2017 . 03/26/2017 06
4 03/14/2017 04/02/2017 04/02/2017 03
4 03/14/2017 04/02/2017 04/12/2017 05
4 03/14/2017 04/02/2017 04/22/2017 06
4 03/14/2017 04/02/2017 04/28/2017 08
;

 

In my data set, I want to create a start and stop indicator using values of assmt_rsn. If assmt_rsn takes the values '06', '07', '08' or 09 then I want to stop at that line and count that date as the end of stay within that episode. Every stop of episode , should have a start denoted by assmt_rsn taking the value 01 or 03. In some cases when a 01 or 03 is missing before the stop, a 04 or 05 is considered as a start of stay. So, I finally want to achieve something like this:

 

ep_id  ep_stay_id        SOC             ROC           ep_stay_strt_dt   ep_stay_start_rsn   ep_stay_end_dt   ep_stay_end_rsn

1                  1             08/10/2016         .                 08/10/2016                     01                     08/18/2016                       09

1                  2            08/10/2016         .                 10/08/2016                     04                     11/25/2016                        06 

1                  3            08/10/2016         .                           .                                  .                       12/06/2016                       08

2                 1             01/05/2017         .                  01/05/2017                     01                      01/15/2017                        06

2                 2            01/05/2017    01/19/2017    01/19/2017                     03                     01/30/2017                        09

3                 1             02/15/2017          .                 02/15/2017                      01                     02/25/2017                        07

4                 1             03/14/2017          .                 03/14/2017                      01                     03/26/2017                       06

4                 2            03/14/2017   04/02/2017   04/02/2017                    03                    04/22/2017                       06

4                 3             03/14/2017          .                          .                                    .                      04/28/2017                       08

 

How can I achieve this?

1 ACCEPTED SOLUTION

Accepted Solutions
whymath
Lapis Lazuli | Level 10

You can use a sequence variable to help yourself:

data mid1;
  set test;
  by EP_ID;

  if first.EP_ID then seq=1;
  output;
  if first.EP_ID=0 and ASSMT_RSN in('06','07','08','09') then seq+1;
run;

data want;
  set mid1;
  by EP_ID seq;

  retain ep_stay_strt_dt ep_stay_start_rsn;
  if first.seq then do;
    ep_stay_strt_dt=ASSMT_DT;
    ep_stay_start_rsn=ASSMT_RSN;
  end;
  if last.seq then do;
    if first.seq then call missing(ep_stay_strt_dt,ep_stay_start_rsn);
    ep_stay_end_dt=ASSMT_DT;
    ep_stay_end_rsn=ASSMT_RSN;
    output;
  end;
  format ep_stay_end_dt ep_stay_strt_dt mmddyys10.;
run;

View solution in original post

5 REPLIES 5
whymath
Lapis Lazuli | Level 10

You can use a sequence variable to help yourself:

data mid1;
  set test;
  by EP_ID;

  if first.EP_ID then seq=1;
  output;
  if first.EP_ID=0 and ASSMT_RSN in('06','07','08','09') then seq+1;
run;

data want;
  set mid1;
  by EP_ID seq;

  retain ep_stay_strt_dt ep_stay_start_rsn;
  if first.seq then do;
    ep_stay_strt_dt=ASSMT_DT;
    ep_stay_start_rsn=ASSMT_RSN;
  end;
  if last.seq then do;
    if first.seq then call missing(ep_stay_strt_dt,ep_stay_start_rsn);
    ep_stay_end_dt=ASSMT_DT;
    ep_stay_end_rsn=ASSMT_RSN;
    output;
  end;
  format ep_stay_end_dt ep_stay_strt_dt mmddyys10.;
run;
iroy
Calcite | Level 5

Thank you, your code worked very well!

mkeintz
PROC Star

I took me a while to grasp what you want:

 

data test;
input EP_ID $ SOC :mmddyy10. ROC :mmddyy10. ASSMT_DT :mmddyy10. ASSMT_RSN $2.;
format SOC ROC ASSMT_DT date9.;
datalines;
1 08/10/2016 .          08/10/2016 01
1 08/10/2016 .          08/18/2016 09
1 08/10/2016 .          10/08/2016 04
1 08/10/2016 .          11/25/2016 06
1 08/10/2016 .          12/06/2016 08
2 01/05/2017 .          01/05/2017 01
2 01/05/2017 .          01/15/2017 06
2 01/05/2017 01/19/2017 01/19/2017 03
2 01/05/2017 01/19/2017 01/30/2017 09
3 02/15/2017 .          02/15/2017 01
3 02/15/2017 .          02/25/2017 07
4 03/14/2017 .          03/14/2017 01
4 03/14/2017 .          03/18/2017 04
4 03/14/2017 .          03/26/2017 06
4 03/14/2017 04/02/2017 04/02/2017 03
4 03/14/2017 04/02/2017 04/12/2017 05
4 03/14/2017 04/02/2017 04/22/2017 06
4 03/14/2017 04/02/2017 04/28/2017 08
run;

data episode (drop=i assmt_dt assmt_rsn);
  do i=1 by 1 until (assmt_rsn in ('06','07','08','09') or last.ep_id=1);
    set test;
    by ep_id;
    if first.ep_id then ep_stay_id=1;
    else if i=1 then ep_stay_id+1;

    if i=1 and not (assmt_rsn in ('06','07','08','09')) then do;
      ep_stay_start_dt=assmt_dt;
      ep_stay_start_rsn=assmt_rsn;
    end;
  end;
  ep_stay_end_dt=assmt_dt ;
  ep_stay_end_rsn=assmt_rsn;
  format ep_stay_start_dt mmddyy10. ep_stay_start_rsn $2.
         ep_stay_end_dt mmddyy10.   ep_stay_end_rsn $2. ;
run;

In your last observation in the desired output, you have ROC as missing, but I think your logic requires it to be 4/2/2017.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
mkeintz
PROC Star

This can also be done without embedding the SET statement in a DO loop:

 

data episode (drop=assmt_:);
  set test;
  by ep_id;
  if first.ep_id then ep_stay_id=.;

  format ep_stay_start_dt mmddyy10. ep_stay_start_rsn $2.
         ep_stay_end_dt mmddyy10.   ep_stay_end_rsn $2. ;
  retain ep_stay_start: ;

  if first.ep_id=1 or lag(assmt_rsn) in ('06','07','08','09') then do;
    ep_stay_id+1;
    ep_stay_start_dt=assmt_dt;
    ep_stay_start_rsn=assmt_rsn;
    if assmt_rsn in ('06','07','08','09') then call missing(of ep_stay_start:);
  end;

  if (assmt_rsn in ('06','07','08','09') or last.ep_id);

  ep_stay_end_dt=assmt_dt ;
  ep_stay_end_rsn=assmt_rsn;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
iroy
Calcite | Level 5

My apologies if my question was unclear. 

Thank you, both your code suggestions worked very well. And you are correct, I made a mistake. In my desired output the ROC on the last observation should be 4/2/2017.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 475 views
  • 0 likes
  • 3 in conversation