I have the following data
id start end
1 1/1/2005 2/2/2005
1 2/2/2005 3/3/2005
1 4/4/2005 5/5/2005
2 1/2/2005 2/2/2005
2 2/2/2006 3/3/2006
2 3/3/2006 4/4/2006
I want to define continuous enrollment allowing a gap of 31 days.
output
id start end
1 1/1/2005 5/5/2005
2 1/2/2005 2/2/2005
2 2/2/2006 4/4/2006
This should do it:
data have; informat start end mmddyy10.; format start end date9.; input id start end; cards; 1 1/1/2005 2/2/2005 1 2/2/2005 3/3/2005 1 4/4/2005 5/5/2005 2 1/2/2005 2/2/2005 2 2/2/2006 3/3/2006 2 3/3/2006 4/4/2006 ;run; data want; set have; by id; if first.id then start0=start; /* start a new period */ else if start-lag_end>31 then do; output; /* output previous period */ start0=start; /* start a new period */ end;
lag_end=end; if last.id then output; /* output last period */ retain start0 lag_end; drop start end; rename start0=start lag_end=end; format start0 lag_end date9.; run;
Basically, Start0 is used to store the beginning of the output interval, lag_end is used to store the previous value of End. These two values become the new Start and End.
It seems that your stipulated output is not correct, though, as there are 32 days from March 3rd to April 4th (so there are two ranges for ID=1).
I think you need 2 data step2:
data have;
input id start :mmddyy10. end :mmddyy10.;
format start end date9.;
put (_all_) (=);
datalines;
1 1/1/2005 2/2/2005
1 2/2/2005 3/3/2005
1 4/4/2005 5/5/2005
2 1/2/2005 2/2/2005
2 2/2/2006 3/3/2006
2 3/3/2006 4/4/2006
run;
data need (drop=_:);
set have nobs=nrecs end=eod1;
by id;
if _n_<nrecs then set have (firstobs=2 keep=start rename=(start=_next_start)) ;
else _next_start=constant('big');
retain _overall_start _n _max_n ;
if first.id then do;
_overall_start=start;
_n=0;
end;
if last.id or _next_start-end>32 then do;
start=_overall_start;
_n+1;
output;
_overall_start=_next_start;
_max_n=max(_max_n,_n);
end;
if eod1 then call symput('n',cats(_max_n));
run;
data want (drop=_:);
do _i=1 by 1 until (last.id);
set need;
by id;
array sd {&n} start1-start&n;
array ed {&n} end1-end&n;
format start: end: date9. ;
sd{_i}=start;
ed{_i}=end;
end;
drop start end;
run;
In the first program there is a second SET statement with firstobs=2 option, so that there is always a single record lookahead to provide the value for next_start. At the end of processing _MAX_N is assign to macrovar _N, which gets used in the DATA WANT step. The second program has SET inside a "do ... unitl (last.id)" loop so that each sequential record can have its values ssigned to the corresponding element of the SD and ED arrays.
This should do it:
data have; informat start end mmddyy10.; format start end date9.; input id start end; cards; 1 1/1/2005 2/2/2005 1 2/2/2005 3/3/2005 1 4/4/2005 5/5/2005 2 1/2/2005 2/2/2005 2 2/2/2006 3/3/2006 2 3/3/2006 4/4/2006 ;run; data want; set have; by id; if first.id then start0=start; /* start a new period */ else if start-lag_end>31 then do; output; /* output previous period */ start0=start; /* start a new period */ end;
lag_end=end; if last.id then output; /* output last period */ retain start0 lag_end; drop start end; rename start0=start lag_end=end; format start0 lag_end date9.; run;
Basically, Start0 is used to store the beginning of the output interval, lag_end is used to store the previous value of End. These two values become the new Start and End.
It seems that your stipulated output is not correct, though, as there are 32 days from March 3rd to April 4th (so there are two ranges for ID=1).
Thank you both for the codes, the second one is much easier to follow.
Assuming there are not missing value in START or END variables.
data have;
informat start end mmddyy10.;
format start end date9.;
input id start end;
cards;
1 1/1/2005 2/2/2005
1 2/2/2005 3/3/2005
1 4/4/2005 5/5/2005
2 1/2/2005 2/2/2005
2 2/2/2006 3/3/2006
2 3/3/2006 4/4/2006
;
run;
data temp;
set have;
by id;
if first.id or start-lag(end)>32 then group+1;
run;
data want;
set temp(rename=(start=_start));
by group;
retain start;
if first.group then start=_start;
if last.group then output;
format start date9.;
drop _start;
run;
Thank you @Ksharp
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.