I have data with many records per ID:
data have;
input ID day;
DATALINES;
1 2019/01/15
1 2019/01/20
1 2019/04/13
1 2019/04/15
1 2019/04/20
2 2019/06/20
2 2019/08/20
2 2019/08/21
2 2019/09/11
2 2019/12/12
;
I need to create episodes that start when an observation (per ID) occurs more than 25 days after the start of each episode. I am able to output the start date, but it's of the entire ID group. How do I restart the "start" date when a new episode begins? Desired output below:
ID | Date | Episode | Start_flag | time-Start |
1 | 2019/01/15 | 1 | 1 | . |
1 | 2019/01/20 | 1 | 0 | 5 |
1 | 2019/04/13 | 2 | 1 | . |
1 | 2019/04/15 | 2 | 0 | 2 |
1 | 2019/04/20 | 2 | 0 | 7 |
2 | 2019/06/20 | 1 | 1 | . |
2 | 2019/08/20 | 2 | 1 | . |
2 | 2019/08/21 | 2 | 0 | 1 |
2 | 2019/09/11 | 2 | 0 | 22 |
2 | 2019/12/12 | 3 | 1 | . |
data have;
informat id 3. day yymmdd10.;
format day date9. ;
infile datalines dlm='09'x;
input ID day;
DATALINES;
1 2019/01/15
1 2019/01/20
1 2019/04/13
1 2019/04/15
1 2019/04/20
2 2019/06/20
2 2019/08/20
2 2019/08/21
2 2019/09/11
2 2019/12/12
;
run;
data want;
format day start_dt date9. ;
set have;
by ID;
retain start_dt episode start_flag time_start;
if first.id then do;
start_dt = day;
episode=1;
start_flag=1;
time_start = .;
output;
end; else
if day - start_dt < 25 then do;
start_flag = 0;
time_start = day - start_dt;
output;
end; else do;
start_dt = day;
episode +1;
start_flag = 1;
time_start = .;
output;
end;
run;
First thing with dealing with dates is to make sure that you have date values so that you can manipulate them.
Consider:
data have; input ID day: yymmdd10.; format day yymmdd10.; DATALINES; 1 2019/01/15 1 2019/01/20 1 2019/04/13 1 2019/04/15 1 2019/04/20 2 2019/06/20 2 2019/08/20 2 2019/08/21 2 2019/09/11 2 2019/12/12 ; data want; set have; by id day; retain episode epstart; if first.id then do episode=1; start_flag=1; epstart=day; end; else do; if day-epstart>25 then do; episode+1; start_flag=1; epstart=day; end; else do; start_flag=0; time_start=day-epstart; end; end; drop epstart; run;
Assumes the data is sorted by ID and day.
The Retain allows keeping the value of a variable across iterations of the data step. The Episode is a counter updated as needed, Epstart holds the start date of the current episode.
The FIRST. ID sets starting values when the first value of an ID is encountered. The Else has the instructions for logic when the current record is not the first for a given ID. The Day-epstart gives the number of days between the current value of Day and the current episode date value. If the value is > 25 then increments the episode, sets the start flag and resets the first day of the new episode.
The DoW-loop helps make the logic fairly concise:
data have ;
input id date :yymmdd10. ;
format date yymmdd10. ;
cards ;
1 2019/01/15
1 2019/01/20
1 2019/04/13
1 2019/04/15
1 2019/04/20
2 2019/06/20
2 2019/08/20
2 2019/08/21
2 2019/09/11
2 2019/12/12
;
run ;
data want ;
do until (last.id) ;
set have ;
by id ;
if not first.id and date <= first.date + 25 then do ;
start_flag = 0 ;
time_start = date - first.date ;
end ;
else do ;
start_flag = 1 ;
time_start = . ;
first.date = date ;
end ;
episode = sum (episode, start_flag) ;
output ;
end ;
run ;
Kind regards
Paul D.
data have;
input ID day: yymmdd10.;
format day yymmdd10.;
DATALINES;
1 2019/01/15
1 2019/01/20
1 2019/04/13
1 2019/04/15
1 2019/04/20
2 2019/06/20
2 2019/08/20
2 2019/08/21
2 2019/09/11
2 2019/12/12
;
data temp;
set have;
by id;
if first.id then episode=0;
if first.id or dif(day)>25 then episode+1;
run;
data want;
set temp;
by id episode;
retain first;
if first.episode then first=day;
start_flag=first.episode;
time_start=day-first;
drop first;
run;
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.