BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
lillymaginta
Lapis Lazuli | Level 10

I have the following data 

id   start end
1  1/1/2005  2/2/2005
1   2/2/2005  3/3/2005
1   4/4/2005  5/5/2005
2   1/2/2005  2/2/2005
2   2/2/2006  3/3/2006
2   3/3/2006   4/4/2006

I want to define continuous enrollment allowing a gap of 31 days.

output 

id start end
1 1/1/2005  5/5/2005
2 1/2/2005  2/2/2005
2 2/2/2006 4/4/2006
1 ACCEPTED SOLUTION

Accepted Solutions
s_lassen
Meteorite | Level 14

@lillymaginta

This should do it:

 

data have;
informat start end mmddyy10.;
format start end date9.;
input id start end;
cards;
1  1/1/2005  2/2/2005
1   2/2/2005  3/3/2005
1   4/4/2005  5/5/2005
2   1/2/2005  2/2/2005
2   2/2/2006  3/3/2006
2   3/3/2006   4/4/2006
;run;

data want;
  set have;
  by id;
  if first.id then
    start0=start; /* start a new period */
  else if start-lag_end>31 then do;
    output; /* output previous period */
    start0=start;  /* start a new period */
    end;
lag_end=end; if last.id then output; /* output last period */ retain start0 lag_end; drop start end; rename start0=start lag_end=end; format start0 lag_end date9.; run;

Basically, Start0 is used to store the beginning of the output interval, lag_end is used to store the previous value of End. These two values become the new Start and End.

It seems that your stipulated output is not correct, though, as there are 32 days from March 3rd to April 4th (so there are two ranges for ID=1). 

View solution in original post

5 REPLIES 5
mkeintz
PROC Star

I think you need 2 data step2:

 

  1. From HAVE, form records, where each record has a single continuous date range, tolerating up to 31 day internal gaps, making data set NEED.  Simultaneously track the making number of ranges that any single ID requires, call it _MAX_N

    data have;
      input id start :mmddyy10. end :mmddyy10.;
      format start end date9.;
      put (_all_) (=);
    datalines;
    1  1/1/2005  2/2/2005
    1   2/2/2005  3/3/2005
    1   4/4/2005  5/5/2005
    2   1/2/2005  2/2/2005
    2   2/2/2006  3/3/2006
    2   3/3/2006   4/4/2006
    run;
    
    data need (drop=_:);
      set have nobs=nrecs end=eod1;
      by id;
    
      if _n_<nrecs then set have (firstobs=2 keep=start rename=(start=_next_start)) ;
      else _next_start=constant('big');
    
      retain _overall_start _n _max_n ;
      if first.id then do;
        _overall_start=start;
        _n=0;
      end;
    
      if last.id or _next_start-end>32 then do;
        start=_overall_start;
        _n+1;
        output;
        _overall_start=_next_start;
        _max_n=max(_max_n,_n);
      end;
      if eod1 then call symput('n',cats(_max_n));
    run;
    
  2. From NEED, make WANT, with one record per ID, but with _MAX_N pairs of start/end variables. 

    data want (drop=_:);
      do _i=1 by 1 until (last.id);
        set need;
        by id;
        array sd {&n} start1-start&n;
        array ed {&n} end1-end&n;
        format start: end: date9. ;
        sd{_i}=start;
        ed{_i}=end;
      end;
      drop start end;
    run;
    

In the first program there is a second SET statement with firstobs=2 option, so that there is always a single record lookahead to provide the value for next_start.  At the end of processing _MAX_N is assign to macrovar _N, which gets used in the DATA WANT step.  The second program has SET inside a "do ... unitl (last.id)" loop so that each sequential record can have its values ssigned to the corresponding element of the SD and ED arrays.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
s_lassen
Meteorite | Level 14

@lillymaginta

This should do it:

 

data have;
informat start end mmddyy10.;
format start end date9.;
input id start end;
cards;
1  1/1/2005  2/2/2005
1   2/2/2005  3/3/2005
1   4/4/2005  5/5/2005
2   1/2/2005  2/2/2005
2   2/2/2006  3/3/2006
2   3/3/2006   4/4/2006
;run;

data want;
  set have;
  by id;
  if first.id then
    start0=start; /* start a new period */
  else if start-lag_end>31 then do;
    output; /* output previous period */
    start0=start;  /* start a new period */
    end;
lag_end=end; if last.id then output; /* output last period */ retain start0 lag_end; drop start end; rename start0=start lag_end=end; format start0 lag_end date9.; run;

Basically, Start0 is used to store the beginning of the output interval, lag_end is used to store the previous value of End. These two values become the new Start and End.

It seems that your stipulated output is not correct, though, as there are 32 days from March 3rd to April 4th (so there are two ranges for ID=1). 

lillymaginta
Lapis Lazuli | Level 10

Thank you both for the codes, the second one is much easier to follow. 

Ksharp
Super User

Assuming there are not missing value in START or END variables.

 

data have;
informat start end mmddyy10.;
format start end date9.;
input id start end;
cards;
1  1/1/2005  2/2/2005
1   2/2/2005  3/3/2005
1   4/4/2005  5/5/2005
2   1/2/2005  2/2/2005
2   2/2/2006  3/3/2006
2   3/3/2006   4/4/2006
;
run;
data temp;
 set have;
 by id;
 if first.id or start-lag(end)>32 then group+1;
run;
data want;
 set temp(rename=(start=_start));
 by group;
 retain start;
 if first.group then start=_start;
 if last.group then output;
 format start date9.;
 drop _start;
run;
lillymaginta
Lapis Lazuli | Level 10

Thank you @Ksharp

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 2098 views
  • 2 likes
  • 4 in conversation