BookmarkSubscribeRSS Feed
lijunchen
Calcite | Level 5

Hello Community Members, 

 

I have an event episode data file for children involved in child welfare. The variables include CHILDID, event TYPE (3 categories: investigation INV, service, SRV, out of home placement PLC), event START date, event STOP date, and the event sequence number (EVENT_NUM). The events are sorted based on start and stop dates for each child. However, there are considerable overlapping of the events event for the same event type. I would like to collapse the overlapping or connecting events of the same type into one event with the earliest start and latest stop dates of all related events. I have figured out how to do it, but the SAS codes seem too complex. I have included the SAS codes together with some sample events (for two children). Please let me know if you find any problem or you think there are more efficient ways to do it. Thanks,

** Identify overlapping events and collapse overlapping and/or connected events of same type;
 
data one;
  input childid  type $  (start  stop) (: mmddyy12.) event_num;
  format start stop  mmddyy10.;
cards;
1 inv  03/01/2015 03/10/2015 1
1 srv  03/10/2015 05/01/2015 2
1 inv  06/01/2015 06/30/2015 3
1 srv  06/20/2015 06/30/2015 4
1 srv  01/01/2016 03/01/2016 5
1 srv  02/01/2016 05/01/2016 6
1 plc  06/01/2016 10/04/2016 7
1 inv  10/01/2016 10/10/2016 8
1 inv  10/11/2016 10/20/2016 9
1 plc  10/21/2016 05/01/2017 10
2 inv  12/01/2015 12/20/2015 1
2 inv  12/21/2015 12/30/2015 2
2 srv  12/25/2015 06/01/2016 3
2 srv  03/01/2016 05/01/2016 4
2 srv  06/02/2016 10/05/2016 5
2 srv  12/01/2016 12/30/2016 6
2 plc  12/30/2016 04/04/2017 7
;
run;

** Sort by childid, start and stop dates;
proc sort data=one;
   by childid start stop type;
run;

** Purpose: collapse same type events that overlap (id1, num 5,6), connect (id 1 num 8,9 / id2 , num 1,2), or;
** envelop & connect each other (id 2,, num 3,4,5) as One single event, using earliest start date and latest end date;
** Like the following, with events of overlapping dates collpased;
/*  
1 inv  03/01/2015 03/10/2015 1
1 srv  03/10/2015 05/01/2015 2
1 inv  06/01/2015 06/30/2015 3
1 srv  06/20/2015 06/30/2015 4
1 srv  01/01/2016 05/01/2016 5+6
1 plc  06/01/2016 10/04/2016 7
1 inv  10/01/2016 10/20/2016 8+9
1 plc  10/21/2016 05/01/2017 10
2 inv  12/01/2015 12/30/2015 1+2
2 srv  12/25/2015 10/05/2016 3+4+5
2 srv  12/01/2016 12/30/2016 6
2 plc  12/30/2016 04/04/2017 7
*/

data two;
  set one;
  by childid start stop type;
  retain istart istop itype counter;

  newstart=start;
  newstop=stop;

  if first.childid then do;
    istart=start;
    istop=stop;
    itype=type;
    counter=0; 
  end;

  counter=counter+1;

  if not first.childid then do;
    if type ne itype then do;
       itype=type;
       istart=start;
       istop=stop;
    end;
 
else if type eq itype then do; if start gt (istop+1) then do; * check=0; istart=start; istop=stop; end; else if start le (istop+1) then do; check=1; newstart=istart; if stop le istop then newstop=istop; /*for nested event*/ if stop gt istop then istop=stop; end; end; end; format istart istop newstart newstop mmddyy10.; run; proc sort data=two; by childid itype newstart descending newstop; run; proc sort data=two out=three nodupkey; by childid itype newstart; * descending newstop; run; proc sort data=three; by childid counter; run;

       

 

2 REPLIES 2
s_lassen
Meteorite | Level 14

Here is a suggestion:

proc sort data=one;
  by childid type start;
run;

data two;
  set have;
  by childid type;
  if not last.type then do;
    _N_+1;
    set have(keep=start rename=(start=nxt_start)) point=_N_;
    if nxt_start>stop+1 then
      _seq_no+1;
    end;
drop nxt_start; run; data want; do until(last._seq_no); set have; by childid type _seq_no; if first._seq_no then do; _start=start; _stop=stop; end; else _stop=max(stop,_stop); end; stop=_stop; start=_start; drop _:; run;
lijunchen
Calcite | Level 5

Thanks for your suggestion, s_lassen,

Your codes look more elegant. I changed the file name and point variable to make it run. However, the results are not as I expected. I tried to tweak it but cannot get the right results. Would be glad if you can check further?  

 

proc sort data=one;
  by childid type start;
run;

data two;
  set one;
  by childid type;
  if not last.type then do;
    pt=_N_+1;
    set one (keep=start rename=(start=nxt_start)) point=pt;
      if nxt_start>stop+1 then _seq_no+1;
  end;
run;
**The last type record has nxt_start value, should we put it to Missing?;   

data want;
    do until(last._seq_no);
    set two ;
    by childid type _seq_no;
    if first._seq_no then do;
      _start=start;
      _stop=stop;
      end;
    else _stop=max(stop,_stop);
    end;
  stop=_stop;
  start=_start;
  drop _:;
run;
* The resulting data is not correct; 

proc sort data= want;
  by childid start stop;
run;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 440 views
  • 0 likes
  • 2 in conversation