BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ModeratelyWise
Obsidian | Level 7

Hi all,

So I have a new problem to solve with the data I've been working on.  I am looking at clients that can have multiple episodes and each episode can have multiple events.  I want to make it so that the dates on concurrent events have no gaps and do not overlap. 

 

Here is what  a simplified version of the data looks like:

Client_IDEpisode_IDEvent_IDBegin_DateEnd_DateEvent_Plcmnt
11120-May-1923-May-19GC   
11222-May-1923-May-19OTHER
11323-May-1923-Oct-19GC   
11429-Jun-1930-Jun-19OTHER
11614-Jul-1915-Jul-19OTHER
11719-Jul-1923-Jul-19OTHER
11525-Jul-198-Aug-19OTHER
1189-Aug-1910-Aug-19OTHER
11911-Aug-1912-Aug-19OTHER
12101-Nov-1916-Apr-20URC  
231119-Feb-161-Mar-16LRC
23125-Mar-1614-Jun-16URC  

 

Here is what I want it to look like after:

Client_IDEpisode_IDEvent_IDBegin_DateEnd_DateEvent_Plcmnt
11120-May-1922-May-19GC   
11222-May-1923-May-19OTHER
11323-May-1929-Jun-19GC   
11429-Jun-1930-Jun-19OTHER
11330-Jun-1914-Jul-19GC   
11614-Jul-1915-Jul-19OTHER
11315-Jul-1919-Jul-19GC
11719-Jul-1923-Jul-19OTHER
11323-Jul-1925-Jul-19GC
11525-Jul-198-Aug-19OTHER
1138-Aug-199-Aug-19GC
1189-Aug-1910-Aug-19OTHER
11310-Aug-1911-Aug-19GC
11911-Aug-1912-Aug-19OTHER
11312-Aug-1923-Oct-19GC
12101-Nov-1916-Apr-20URC  
231119-Feb-165-Mar-16LRC
23125-Mar-1614-Jun-16URC  

 

I've highlighted the new rows and changes to existing rows green.

The rules I am using:

  1) If two events that are in the same episode that overlap the end_date of the older episode becomes the begin_date of the newer episode (see episode 3).

 2) If there is a gap days between events in the same episode the end_date of the older episode becomes the begin_date of the newer episode (see episode 11).

3) If the dates of an event are encompassed by another event in the same episode then the end_date of the encompassing episode becomes the begin_date of the encompassed episode and a new row is created with the Event_ID and  the Event_Plcmnt of the compassing episode that has the begin_date of the encompassed episode and either the end_date of the original encompassing event or, if there is more than one encompassed event, the begin_date of the next encompassed event.

 

Some notes about the data:

  • There are a lot more fields than what I have here...I wanted to keep it as simple as possible.
  • I found that the Event_IDs are not in chronological order.  Sometimes a newer ID will have a lower number.

Here is the data in a usable form:

 

data have;
infile datalines truncover;
input Client_ID $ Episode_ID $ Event_ID $ Begin_Date End_Date Event_Plcmnt;
datalines;
001 001 001 20MAY2019 23MAY2019 GC
001 001 002 22MAY2019 23MAY2019 OTHER
001 001 003 23MAY2019 23OCT2019 GC
001 001 004 29JUN2019 30JUN2019 OTHER
001 001 006 14JUL2019 15JUL2019 OTHER
001 001 007 19JUL2019 23JUL2019 OTHER
001 001 005 25JUL2019 08AUG2019 OTHER
001 001 008 09AUG2019 10AUG2019 OTHER
001 001 009 11AUG2019 12AUG2019 OTHER
001 002 010 01NOV2019 16APR2020 URC
002 003 011 19FEB2016 01MAR2016 LRC
002 003 012 05MAR2016 14JUN2016 URC
;

 

Hopefully this is clear enough.  I tried using the lag function but there can sometimes be 10 or more event nested inside another it got too complex to keep track of.

 

Best,


Marc

1 ACCEPTED SOLUTION

Accepted Solutions
ChrisNZ
Tourmaline | Level 20

Please alter to answer your needs:

data BOUNDARIES(index=(A=(CLIENT_ID EPISODE_ID INTERVAL_START)));
  merge HAVE 
        HAVE(firstobs=2 keep=CLIENT_ID EPISODE_ID BEGIN_DATE rename=(CLIENT_ID=C EPISODE_ID=E BEGIN_DATE=B));
  retain INTERVAL_START INTERVAL_END;            
  format INTERVAL_END INTERVAL_START date9.;
  if INTERVAL_END=. then INTERVAL_START=BEGIN_DATE;
  if INTERVAL_END=END_DATE then END_FLAG+1; 
  INTERVAL_END=max(INTERVAL_END,END_DATE);
  if END_FLAG                              %* max end date matched       ;
   | B > INTERVAL_END                      %* next start date > end date ;
   | (E ne EPISODE_ID and C ne CLIENT_ID)  %* new client or new episode  ;
     then do;
     output; 
     call missing(INTERVAL_END, END_FLAG); 
     INTERVAL_NB+1; 
  end; 
  keep CLIENT_ID EPISODE_ID INTERVAL_START INTERVAL_END INTERVAL_NB;
run;

proc sort data=HAVE(keep=CLIENT_ID EPISODE_ID END_DATE) out=ENDS; 
  by CLIENT_ID EPISODE_ID END_DATE;
run;
data ALL_INTERVALS;
  set HAVE(keep=CLIENT_ID EPISODE_ID BEGIN_DATE rename=(BEGIN_DATE=END_DATE))
      ENDS ;
  by CLIENT_ID EPISODE_ID END_DATE;
  BEGIN_DATE=lag(END_DATE);   
  if ^first.EPISODE_ID and BEGIN_DATE ne END_DATE then output;
  format BEGIN_DATE END_DATE  date9.;
run;

data WANT;
  merge HAVE (drop=END_DATE)
        ALL_INTERVALS ; 
  by CLIENT_ID EPISODE_ID BEGIN_DATE;
  set BOUNDARIES(rename=(INTERVAL_START=BEGIN_DATE)) key=A;
  if _IORC_ then _ERROR_=0;
  INTERVAL_END=coalesce(INTERVAL_END,lag(INTERVAL_END));
  output;
  if END_DATE = INTERVAL_END then INTERVAL_END=.;
  retain INTERVAL:;
run;
Period MAY2020 - APR2021

Obs CLIENT_ID EPISODE_ID EVENT_ID BEGIN_DATE EVENT_PLCMNT END_DATE INTERVAL_END INTERVAL_NB
1 001 001 001 20MAY2019 GC 22MAY2019 23MAY2019 0
2 001 001 002 22MAY2019 OTHER 23MAY2019 23MAY2019 0
3 001 001 003 23MAY2019 GC 29JUN2019 23OCT2019 1
4 001 001 004 29JUN2019 OTHER 30JUN2019 23OCT2019 1
5 001 001   30JUN2019   14JUL2019 23OCT2019 1
6 001 001 006 14JUL2019 OTHER 15JUL2019 23OCT2019 1
7 001 001   15JUL2019   19JUL2019 23OCT2019 1
8 001 001 007 19JUL2019 OTHER 23JUL2019 23OCT2019 1
9 001 001   23JUL2019   25JUL2019 23OCT2019 1
10 001 001 005 25JUL2019 OTHER 08AUG2019 23OCT2019 1
11 001 001   08AUG2019   09AUG2019 23OCT2019 1
12 001 001 008 09AUG2019 OTHER 10AUG2019 23OCT2019 1
13 001 001   10AUG2019   11AUG2019 23OCT2019 1
14 001 001 009 11AUG2019 OTHER 12AUG2019 23OCT2019 1
15 001 001   12AUG2019   23OCT2019 23OCT2019 1
16 001 002 010 01NOV2019 URC 16APR2020 16APR2020 2
17 002 003 011 19FEB2016 LRC 01MAR2016 01MAR2016 3
18 002 003   01MAR2016   05MAR2016 01MAR2016 3
19 002 003 012 05MAR2016 URC 14JUN2016 14JUN2016 4

 

View solution in original post

7 REPLIES 7
ChrisNZ
Tourmaline | Level 20

1. Your code does not run. Always paste it back to SAS and vet it please.

2. Please use the appropriate icon to paste code.

3. I would extract the dates, and then merge back by start date to the original data:

proc sort data=HAVE(keep=CLIENT_ID END_DATE) out=END; 
  by CLIENT_ID END_DATE;
run;

data INTERVALS;
  format BEGIN_DATE END_DATE  date9.;
  set HAVE(keep=CLIENT_ID BEGIN_DATE rename=(BEGIN_DATE=END_DATE))
      END ;
  by CLIENT_ID END_DATE;
  BEGIN_DATE=lag(END_DATE);
  if ^first.CLIENT_ID then output;
run;

 

CLIENT_ID BEGIN_DATE END_DATE
001 20MAY2019 22MAY2019
001 22MAY2019 23MAY2019
001 23MAY2019 23MAY2019
001 23MAY2019 23MAY2019
001 23MAY2019 29JUN2019
001 29JUN2019 30JUN2019
001 30JUN2019 14JUL2019
001 14JUL2019 15JUL2019
001 15JUL2019 19JUL2019
001 19JUL2019 23JUL2019
001 23JUL2019 25JUL2019
001 25JUL2019 08AUG2019
001 08AUG2019 09AUG2019
001 09AUG2019 10AUG2019
001 10AUG2019 11AUG2019
001 11AUG2019 12AUG2019
001 12AUG2019 23OCT2019
001 23OCT2019 01NOV2019
001 01NOV2019 16APR2020
002 19FEB2016 01MAR2016
002 01MAR2016 05MAR2016
002 05MAR2016 14JUN2016

 

ModeratelyWise
Obsidian | Level 7

Sorry @ChrisNZ  . I was rushing and I apologize.  I will try not to let happen again. I'll try to run your code in the morning.

data have;
infile datalines truncover;
input Client_ID $ Episode_ID $ Event_ID $ Begin_Date date9. End_Date date9. Event_Plcmnt $;
datalines;
001	001	001	20MAY2019	23MAY2019	GC
001	001	002	22MAY2019	23MAY2019	OTHER
001	001	003	23MAY2019	23OCT2019	GC
001	001	004	29JUN2019	30JUN2019	OTHER
001	001	006	14JUL2019	15JUL2019	OTHER
001	001	007	19JUL2019	23JUL2019	OTHER
001	001	005	25JUL2019	08AUG2019	OTHER
001	001	008	09AUG2019	10AUG2019	OTHER
001	001	009	11AUG2019	12AUG2019	OTHER
001	002	010	01NOV2019	16APR2020	URC
002	003	011	19FEB2016	01MAR2016	LRC
002	003	012	05MAR2016	14JUN2016	URC
;

@

ModeratelyWise
Obsidian | Level 7

It seems to have generated some extra lines.(in red)..is there any way to fix it?

 

CLIENT_ID BEGIN_DATE END_DATE

00120MAY201922MAY2019
00122MAY201923MAY2019
00123MAY201923MAY2019
00123MAY201923MAY2019
00123MAY201929JUN2019
00129JUN201930JUN2019
00130JUN201914JUL2019
00114JUL201915JUL2019
00115JUL201919JUL2019
00119JUL201923JUL2019
00123JUL201925JUL2019
00125JUL201908AUG2019
00108AUG201909AUG2019
00109AUG201910AUG2019
00110AUG201911AUG2019
00111AUG201912AUG2019
00112AUG201923OCT2019
00123OCT201901NOV2019
00101NOV201916APR2020
00219FEB201601MAR2016
00201MAR201605MAR2016
00205MAR201614JUN2016
ChrisNZ
Tourmaline | Level 20

The first 2 are easy to remove.

 if ^first.CLIENT_ID and START_DATE ne END_DATE then output;

I don't know why the last 2 are unwanted.

ModeratelyWise
Obsidian | Level 7

Sorry I highlighted the wrong one.  This is what I meant to do:

 

 

CLIENT_IDBEGIN_DATEEND_DATE
120-May-1922-May-19
122-May-1923-May-19
123-May-1923-May-19
123-May-1923-May-19
123-May-1929-Jun-19
129-Jun-1930-Jun-19
130-Jun-1914-Jul-19
114-Jul-1915-Jul-19
115-Jul-1919-Jul-19
119-Jul-1923-Jul-19
123-Jul-1925-Jul-19
125-Jul-198-Aug-19
18-Aug-199-Aug-19
19-Aug-1910-Aug-19
110-Aug-1911-Aug-19
111-Aug-1912-Aug-19
112-Aug-1923-Oct-19
123-Oct-191-Nov-19
11-Nov-1916-Apr-20
219-Feb-161-Mar-16
21-Mar-165-Mar-16
25-Mar-1614-Jun-16

 

The row with the begin_date 23OCT19 shouldn't be there because the event with the 1NOV19 begin_date has a different ID_Episode than the event with the begin_date of 12AUG19.  

 

The row with the begin_date 01MAR16 shouldn't be there because the event with the begin_date 19FEB16 doesn't overlap the event with the begin date 05MAR16. There is a gap between in the dates between the events.  When this occurs I need the end_date of the row beginning 19FEB16 to change to the begin_date of the following event (so 05MAR16).  I hope I am being clear...it is hard to explain these thing properly via this forum for me.

ChrisNZ
Tourmaline | Level 20

Please alter to answer your needs:

data BOUNDARIES(index=(A=(CLIENT_ID EPISODE_ID INTERVAL_START)));
  merge HAVE 
        HAVE(firstobs=2 keep=CLIENT_ID EPISODE_ID BEGIN_DATE rename=(CLIENT_ID=C EPISODE_ID=E BEGIN_DATE=B));
  retain INTERVAL_START INTERVAL_END;            
  format INTERVAL_END INTERVAL_START date9.;
  if INTERVAL_END=. then INTERVAL_START=BEGIN_DATE;
  if INTERVAL_END=END_DATE then END_FLAG+1; 
  INTERVAL_END=max(INTERVAL_END,END_DATE);
  if END_FLAG                              %* max end date matched       ;
   | B > INTERVAL_END                      %* next start date > end date ;
   | (E ne EPISODE_ID and C ne CLIENT_ID)  %* new client or new episode  ;
     then do;
     output; 
     call missing(INTERVAL_END, END_FLAG); 
     INTERVAL_NB+1; 
  end; 
  keep CLIENT_ID EPISODE_ID INTERVAL_START INTERVAL_END INTERVAL_NB;
run;

proc sort data=HAVE(keep=CLIENT_ID EPISODE_ID END_DATE) out=ENDS; 
  by CLIENT_ID EPISODE_ID END_DATE;
run;
data ALL_INTERVALS;
  set HAVE(keep=CLIENT_ID EPISODE_ID BEGIN_DATE rename=(BEGIN_DATE=END_DATE))
      ENDS ;
  by CLIENT_ID EPISODE_ID END_DATE;
  BEGIN_DATE=lag(END_DATE);   
  if ^first.EPISODE_ID and BEGIN_DATE ne END_DATE then output;
  format BEGIN_DATE END_DATE  date9.;
run;

data WANT;
  merge HAVE (drop=END_DATE)
        ALL_INTERVALS ; 
  by CLIENT_ID EPISODE_ID BEGIN_DATE;
  set BOUNDARIES(rename=(INTERVAL_START=BEGIN_DATE)) key=A;
  if _IORC_ then _ERROR_=0;
  INTERVAL_END=coalesce(INTERVAL_END,lag(INTERVAL_END));
  output;
  if END_DATE = INTERVAL_END then INTERVAL_END=.;
  retain INTERVAL:;
run;
Period MAY2020 - APR2021

Obs CLIENT_ID EPISODE_ID EVENT_ID BEGIN_DATE EVENT_PLCMNT END_DATE INTERVAL_END INTERVAL_NB
1 001 001 001 20MAY2019 GC 22MAY2019 23MAY2019 0
2 001 001 002 22MAY2019 OTHER 23MAY2019 23MAY2019 0
3 001 001 003 23MAY2019 GC 29JUN2019 23OCT2019 1
4 001 001 004 29JUN2019 OTHER 30JUN2019 23OCT2019 1
5 001 001   30JUN2019   14JUL2019 23OCT2019 1
6 001 001 006 14JUL2019 OTHER 15JUL2019 23OCT2019 1
7 001 001   15JUL2019   19JUL2019 23OCT2019 1
8 001 001 007 19JUL2019 OTHER 23JUL2019 23OCT2019 1
9 001 001   23JUL2019   25JUL2019 23OCT2019 1
10 001 001 005 25JUL2019 OTHER 08AUG2019 23OCT2019 1
11 001 001   08AUG2019   09AUG2019 23OCT2019 1
12 001 001 008 09AUG2019 OTHER 10AUG2019 23OCT2019 1
13 001 001   10AUG2019   11AUG2019 23OCT2019 1
14 001 001 009 11AUG2019 OTHER 12AUG2019 23OCT2019 1
15 001 001   12AUG2019   23OCT2019 23OCT2019 1
16 001 002 010 01NOV2019 URC 16APR2020 16APR2020 2
17 002 003 011 19FEB2016 LRC 01MAR2016 01MAR2016 3
18 002 003   01MAR2016   05MAR2016 01MAR2016 3
19 002 003 012 05MAR2016 URC 14JUN2016 14JUN2016 4

 

ModeratelyWise
Obsidian | Level 7
Would you be willing to explain what some of the steps are doing?

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 940 views
  • 3 likes
  • 2 in conversation