BookmarkSubscribeRSS Feed
MB_Analyst
Obsidian | Level 7

I have data with many records per ID:

 

data have;
	input ID day;
	DATALINES;
1	2019/01/15
1	2019/01/20
1	2019/04/13
1	2019/04/15
1	2019/04/20
2	2019/06/20
2	2019/08/20
2	2019/08/21
2	2019/09/11
2	2019/12/12
;

I need to create episodes that start when an observation (per ID) occurs more than 25 days after the start of each episode. I am able to output the start date, but it's of the entire ID group. How do I restart the "start" date when a new episode begins? Desired output below:

 

 

ID Date Episode Start_flag time-Start
1 2019/01/15 1 1 .
1 2019/01/20 1 0 5
1 2019/04/13 2 1 .
1 2019/04/15 2 0 2
1 2019/04/20 2 0 7
2 2019/06/20 1 1 .
2 2019/08/20 2 1 .
2 2019/08/21 2 0 1
2 2019/09/11 2 0 22
2 2019/12/12 3 1 .
4 REPLIES 4
Shmuel
Garnet | Level 18
data have;
    informat id 3. day yymmdd10.;
    format day date9. ;    
    infile datalines dlm='09'x;
	input ID day; 
DATALINES;
1	2019/01/15
1	2019/01/20
1	2019/04/13
1	2019/04/15
1	2019/04/20
2	2019/06/20
2	2019/08/20
2	2019/08/21
2	2019/09/11
2	2019/12/12
;
run;
data want;
    format day start_dt date9. ;
 set have;
  by ID;
     retain start_dt episode start_flag time_start;
     if first.id then do; 
        start_dt = day; 
        episode=1; 
        start_flag=1; 
        time_start = .;
        output;
     end; else
     if day - start_dt < 25 then do;
        start_flag = 0;
        time_start = day - start_dt;
        output;
     end; else do;
        start_dt = day; 
        episode +1;
        start_flag = 1;
        time_start = .;
        output;
     end;
run;
ballardw
Super User

First thing with dealing with dates is to make sure that you have date values so that you can manipulate them.

Consider:

data have;
	input ID day: yymmdd10.;
   format day yymmdd10.;
	DATALINES;
1	2019/01/15
1	2019/01/20
1	2019/04/13
1	2019/04/15
1	2019/04/20
2	2019/06/20
2	2019/08/20
2	2019/08/21
2	2019/09/11
2	2019/12/12
;

data want;
   set have;
   by id day;
   retain episode epstart;
   if first.id then do 
      episode=1;
      start_flag=1;
      epstart=day;
   end;
   else do;
      if day-epstart>25 then do;
         episode+1;
         start_flag=1;
         epstart=day;
      end;
      else do;
         start_flag=0;
         time_start=day-epstart;
      end;
   end;
   drop epstart;
run;

Assumes the data is sorted by ID and day.

The Retain allows keeping the value of a variable across iterations of the data step. The Episode is a counter updated as needed, Epstart holds the start date of the current episode.

The FIRST. ID sets starting values when the first value of an ID is encountered. The Else has the instructions for logic when the current record is not the first for a given ID. The Day-epstart gives the number of days between the current value of Day and the current episode date value. If the value is > 25 then increments the episode, sets the start flag and resets the first day of the new episode.

hashman
Ammonite | Level 13

@MB_Analyst:

The DoW-loop helps make the logic fairly concise:

data have ;                                                                                                                             
  input id date :yymmdd10. ;                                                                                                            
  format date yymmdd10. ;                                                                                                               
  cards ;                                                                                                                               
1  2019/01/15                                                                                                                           
1  2019/01/20                                                                                                                           
1  2019/04/13                                                                                                                           
1  2019/04/15                                                                                                                           
1  2019/04/20                                                                                                                           
2  2019/06/20                                                                                                                           
2  2019/08/20                                                                                                                           
2  2019/08/21                                                                                                                           
2  2019/09/11                                                                                                                           
2  2019/12/12                                                                                                                           
;                                                                                                                                       
run ;                                                                                                                                   
                                                                                                                                        
data want ;                                                                                                                             
  do until (last.id) ;                                                                                                                  
    set have ;                                                                                                                          
    by id ;                                                                                                                             
    if not first.id and date <= first.date + 25 then do ;                                                                               
      start_flag = 0 ;                                                                                                                  
      time_start = date - first.date ;                                                                                                  
    end ;                                                                                                                               
    else do ;                                                                                                                           
      start_flag = 1 ;                                                                                                                  
      time_start = . ;                                                                                                                  
      first.date = date ;                                                                                                               
    end ;                                                                                                                               
    episode = sum (episode, start_flag) ;                                                                                               
    output ;                                                                                                                            
  end ;                                                                                                                                 
run ;                      

Kind regards

Paul D.

 

Ksharp
Super User
data have;
	input ID day: yymmdd10.;
   format day yymmdd10.;
	DATALINES;
1	2019/01/15
1	2019/01/20
1	2019/04/13
1	2019/04/15
1	2019/04/20
2	2019/06/20
2	2019/08/20
2	2019/08/21
2	2019/09/11
2	2019/12/12
;
data temp;
 set have;
 by id;
 if first.id then episode=0;
 if first.id or dif(day)>25 then episode+1;
run;
data want;
 set temp;
 by id episode;
 retain first;
 if first.episode then first=day;
 start_flag=first.episode;
 time_start=day-first;
 drop first;
run;

hackathon24-white-horiz.png

The 2025 SAS Hackathon Kicks Off on June 11!

Watch the live Hackathon Kickoff to get all the essential information about the SAS Hackathon—including how to join, how to participate, and expert tips for success.

YouTube LinkedIn

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 985 views
  • 0 likes
  • 5 in conversation