DATA Step, Macro, Functions and more

continuous period with a gap

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 136
Accepted Solution

continuous period with a gap

I have the following data 

id   start end
1  1/1/2005  2/2/2005
1   2/2/2005  3/3/2005
1   4/4/2005  5/5/2005
2   1/2/2005  2/2/2005
2   2/2/2006  3/3/2006
2   3/3/2006   4/4/2006

I want to define continuous enrollment allowing a gap of 31 days.

output 

id start end
1 1/1/2005  5/5/2005
2 1/2/2005  2/2/2005
2 2/2/2006 4/4/2006

Accepted Solutions
Solution
‎10-03-2017 08:27 AM
PROC Star
Posts: 249

Re: continuous period with a gap

Posted in reply to lillymaginta

@lillymaginta

This should do it:

 

data have;
informat start end mmddyy10.;
format start end date9.;
input id start end;
cards;
1  1/1/2005  2/2/2005
1   2/2/2005  3/3/2005
1   4/4/2005  5/5/2005
2   1/2/2005  2/2/2005
2   2/2/2006  3/3/2006
2   3/3/2006   4/4/2006
;run;

data want;
  set have;
  by id;
  if first.id then
    start0=start; /* start a new period */
  else if start-lag_end>31 then do;
    output; /* output previous period */
    start0=start;  /* start a new period */
    end;
lag_end=end; if last.id then output; /* output last period */ retain start0 lag_end; drop start end; rename start0=start lag_end=end; format start0 lag_end date9.; run;

Basically, Start0 is used to store the beginning of the output interval, lag_end is used to store the previous value of End. These two values become the new Start and End.

It seems that your stipulated output is not correct, though, as there are 32 days from March 3rd to April 4th (so there are two ranges for ID=1). 

View solution in original post


All Replies
Trusted Advisor
Posts: 1,309

Re: continuous period with a gap

Posted in reply to lillymaginta

I think you need 2 data step2:

 

  1. From HAVE, form records, where each record has a single continuous date range, tolerating up to 31 day internal gaps, making data set NEED.  Simultaneously track the making number of ranges that any single ID requires, call it _MAX_N

    data have;
      input id start :mmddyy10. end :mmddyy10.;
      format start end date9.;
      put (_all_) (=);
    datalines;
    1  1/1/2005  2/2/2005
    1   2/2/2005  3/3/2005
    1   4/4/2005  5/5/2005
    2   1/2/2005  2/2/2005
    2   2/2/2006  3/3/2006
    2   3/3/2006   4/4/2006
    run;
    
    data need (drop=_:);
      set have nobs=nrecs end=eod1;
      by id;
    
      if _n_<nrecs then set have (firstobs=2 keep=start rename=(start=_next_start)) ;
      else _next_start=constant('big');
    
      retain _overall_start _n _max_n ;
      if first.id then do;
        _overall_start=start;
        _n=0;
      end;
    
      if last.id or _next_start-end>32 then do;
        start=_overall_start;
        _n+1;
        output;
        _overall_start=_next_start;
        _max_n=max(_max_n,_n);
      end;
      if eod1 then call symput('n',cats(_max_n));
    run;
    
  2. From NEED, make WANT, with one record per ID, but with _MAX_N pairs of start/end variables. 

    data want (drop=_:);
      do _i=1 by 1 until (last.id);
        set need;
        by id;
        array sd {&n} start1-start&n;
        array ed {&n} end1-end&n;
        format start: end: date9. ;
        sd{_i}=start;
        ed{_i}=end;
      end;
      drop start end;
    run;
    

In the first program there is a second SET statement with firstobs=2 option, so that there is always a single record lookahead to provide the value for next_start.  At the end of processing _MAX_N is assign to macrovar _N, which gets used in the DATA WANT step.  The second program has SET inside a "do ... unitl (last.id)" loop so that each sequential record can have its values ssigned to the corresponding element of the SD and ED arrays.

Solution
‎10-03-2017 08:27 AM
PROC Star
Posts: 249

Re: continuous period with a gap

Posted in reply to lillymaginta

@lillymaginta

This should do it:

 

data have;
informat start end mmddyy10.;
format start end date9.;
input id start end;
cards;
1  1/1/2005  2/2/2005
1   2/2/2005  3/3/2005
1   4/4/2005  5/5/2005
2   1/2/2005  2/2/2005
2   2/2/2006  3/3/2006
2   3/3/2006   4/4/2006
;run;

data want;
  set have;
  by id;
  if first.id then
    start0=start; /* start a new period */
  else if start-lag_end>31 then do;
    output; /* output previous period */
    start0=start;  /* start a new period */
    end;
lag_end=end; if last.id then output; /* output last period */ retain start0 lag_end; drop start end; rename start0=start lag_end=end; format start0 lag_end date9.; run;

Basically, Start0 is used to store the beginning of the output interval, lag_end is used to store the previous value of End. These two values become the new Start and End.

It seems that your stipulated output is not correct, though, as there are 32 days from March 3rd to April 4th (so there are two ranges for ID=1). 

Frequent Contributor
Posts: 136

Re: continuous period with a gap

Thank you both for the codes, the second one is much easier to follow. 

Super User
Posts: 10,679

Re: continuous period with a gap

Posted in reply to lillymaginta

Assuming there are not missing value in START or END variables.

 

data have;
informat start end mmddyy10.;
format start end date9.;
input id start end;
cards;
1  1/1/2005  2/2/2005
1   2/2/2005  3/3/2005
1   4/4/2005  5/5/2005
2   1/2/2005  2/2/2005
2   2/2/2006  3/3/2006
2   3/3/2006   4/4/2006
;
run;
data temp;
 set have;
 by id;
 if first.id or start-lag(end)>32 then group+1;
run;
data want;
 set temp(rename=(start=_start));
 by group;
 retain start;
 if first.group then start=_start;
 if last.group then output;
 format start date9.;
 drop _start;
run;
Frequent Contributor
Posts: 136

Re: continuous period with a gap

Thank you @Ksharp

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 229 views
  • 2 likes
  • 4 in conversation