BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
lillymaginta
Lapis Lazuli | Level 10

I have the following data 

id   start end
1  1/1/2005  2/2/2005
1   2/2/2005  3/3/2005
1   4/4/2005  5/5/2005
2   1/2/2005  2/2/2005
2   2/2/2006  3/3/2006
2   3/3/2006   4/4/2006

I want to define continuous enrollment allowing a gap of 31 days.

output 

id start end
1 1/1/2005  5/5/2005
2 1/2/2005  2/2/2005
2 2/2/2006 4/4/2006
1 ACCEPTED SOLUTION

Accepted Solutions
s_lassen
Meteorite | Level 14

@lillymaginta

This should do it:

 

data have;
informat start end mmddyy10.;
format start end date9.;
input id start end;
cards;
1  1/1/2005  2/2/2005
1   2/2/2005  3/3/2005
1   4/4/2005  5/5/2005
2   1/2/2005  2/2/2005
2   2/2/2006  3/3/2006
2   3/3/2006   4/4/2006
;run;

data want;
  set have;
  by id;
  if first.id then
    start0=start; /* start a new period */
  else if start-lag_end>31 then do;
    output; /* output previous period */
    start0=start;  /* start a new period */
    end;
lag_end=end; if last.id then output; /* output last period */ retain start0 lag_end; drop start end; rename start0=start lag_end=end; format start0 lag_end date9.; run;

Basically, Start0 is used to store the beginning of the output interval, lag_end is used to store the previous value of End. These two values become the new Start and End.

It seems that your stipulated output is not correct, though, as there are 32 days from March 3rd to April 4th (so there are two ranges for ID=1). 

View solution in original post

5 REPLIES 5
mkeintz
PROC Star

I think you need 2 data step2:

 

  1. From HAVE, form records, where each record has a single continuous date range, tolerating up to 31 day internal gaps, making data set NEED.  Simultaneously track the making number of ranges that any single ID requires, call it _MAX_N

    data have;
      input id start :mmddyy10. end :mmddyy10.;
      format start end date9.;
      put (_all_) (=);
    datalines;
    1  1/1/2005  2/2/2005
    1   2/2/2005  3/3/2005
    1   4/4/2005  5/5/2005
    2   1/2/2005  2/2/2005
    2   2/2/2006  3/3/2006
    2   3/3/2006   4/4/2006
    run;
    
    data need (drop=_:);
      set have nobs=nrecs end=eod1;
      by id;
    
      if _n_<nrecs then set have (firstobs=2 keep=start rename=(start=_next_start)) ;
      else _next_start=constant('big');
    
      retain _overall_start _n _max_n ;
      if first.id then do;
        _overall_start=start;
        _n=0;
      end;
    
      if last.id or _next_start-end>32 then do;
        start=_overall_start;
        _n+1;
        output;
        _overall_start=_next_start;
        _max_n=max(_max_n,_n);
      end;
      if eod1 then call symput('n',cats(_max_n));
    run;
    
  2. From NEED, make WANT, with one record per ID, but with _MAX_N pairs of start/end variables. 

    data want (drop=_:);
      do _i=1 by 1 until (last.id);
        set need;
        by id;
        array sd {&n} start1-start&n;
        array ed {&n} end1-end&n;
        format start: end: date9. ;
        sd{_i}=start;
        ed{_i}=end;
      end;
      drop start end;
    run;
    

In the first program there is a second SET statement with firstobs=2 option, so that there is always a single record lookahead to provide the value for next_start.  At the end of processing _MAX_N is assign to macrovar _N, which gets used in the DATA WANT step.  The second program has SET inside a "do ... unitl (last.id)" loop so that each sequential record can have its values ssigned to the corresponding element of the SD and ED arrays.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
s_lassen
Meteorite | Level 14

@lillymaginta

This should do it:

 

data have;
informat start end mmddyy10.;
format start end date9.;
input id start end;
cards;
1  1/1/2005  2/2/2005
1   2/2/2005  3/3/2005
1   4/4/2005  5/5/2005
2   1/2/2005  2/2/2005
2   2/2/2006  3/3/2006
2   3/3/2006   4/4/2006
;run;

data want;
  set have;
  by id;
  if first.id then
    start0=start; /* start a new period */
  else if start-lag_end>31 then do;
    output; /* output previous period */
    start0=start;  /* start a new period */
    end;
lag_end=end; if last.id then output; /* output last period */ retain start0 lag_end; drop start end; rename start0=start lag_end=end; format start0 lag_end date9.; run;

Basically, Start0 is used to store the beginning of the output interval, lag_end is used to store the previous value of End. These two values become the new Start and End.

It seems that your stipulated output is not correct, though, as there are 32 days from March 3rd to April 4th (so there are two ranges for ID=1). 

lillymaginta
Lapis Lazuli | Level 10

Thank you both for the codes, the second one is much easier to follow. 

Ksharp
Super User

Assuming there are not missing value in START or END variables.

 

data have;
informat start end mmddyy10.;
format start end date9.;
input id start end;
cards;
1  1/1/2005  2/2/2005
1   2/2/2005  3/3/2005
1   4/4/2005  5/5/2005
2   1/2/2005  2/2/2005
2   2/2/2006  3/3/2006
2   3/3/2006   4/4/2006
;
run;
data temp;
 set have;
 by id;
 if first.id or start-lag(end)>32 then group+1;
run;
data want;
 set temp(rename=(start=_start));
 by group;
 retain start;
 if first.group then start=_start;
 if last.group then output;
 format start date9.;
 drop _start;
run;
lillymaginta
Lapis Lazuli | Level 10

Thank you @Ksharp

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 1227 views
  • 2 likes
  • 4 in conversation