I have the following data:
Data sample;
	input id $ (entry) (:mmddyy10.) (discharge) (:mmddyy10.) event $ 10.;
	format entry discharge date9.;
	If discharge lt "30sep2015"d then los=discharge-entry;
	Else los="30sep2015"d-entry;
	datalines;
1 12/05/2014 12/17/2014 discharge2
1 12/24/2014 01/17/2015 discharge2
1 02/06/2015 03/10/2015 discharge1
2 08/01/2013 08/16/2013 discharge1
2 10/24/2013 11/06/2013 discharge1
3 02/10/2015 05/10/2015 discharge1
3 09/10/2015 10/18/2015 discharge2
4 09/23/2013 10/22/2013 discharge2
4 10/30/2013 07/15/2014 discharge2
4 07/25/2014 08/14/2014 death
5 08/01/2015 08/16/2015 discharge2
5 09/01/2015 11/06/2015 discharge1
6 11/05/2013 09/05/2014 discharge2
6 09/15/2014 07/08/2015 discharge2
6 07/14/2015 07/16/2015 discharge2
;
Run;
Each row represents a period of stay that starts with an entry date and a discharge date. The "los" variable represents the length of stay (i.e. the number of days from entry to discharge). The "event" variable describes the type of discharge (discharge1=return not anticipated, discharge2=return anticipated, death).
I would like to find the cumulative length of stay (los) by id, where counting stops if the event is discharge1 or death, whichever comes first, or the last of a series of discharge2 dates. So the dataset starts to look like this:
And ultimately, this is the dataset I am hoping to end up with:
Data sample1;
   Set sample;
   By id;
   If first.id then rn = 1; 
   Else rn + 1;
Run;
Proc SQL;
	Create table sample2 as
	Select ID, Entry, Discharge, Event, los, rn,
		Case 
			When Event = 'discharge1' then 'Stop'
        	When Event = 'death' then 'Stop'
        	When Discharge >=  "30sep2015"d then 'Stop'
        	Else '.' 
    	End as Include         
		From sample1;
Quit;
Proc SQL;
      Create table sample3 as
      Select ID, rn    
      From sample2
      Where Include = 'Stop';
Quit;
Proc SQL;
     Create table sample4 as
     Select sample2.ID, sum(sample2.los) as cum_los
     From sample2
     Left join sample3 on sample2.ID = sample3.ID
     Where sample2.rn <= sample3.rn or sample3.ID is null
     Group by sample2.ID;
Quit;You may need to modify the final subsetting IF.
data sample1;
set sample;
by id discharge;
if first.id then
cum_los = 0;
cum_los + los;
if event = "discharge1" or event = "death" or discharge = "20sep2015"d;
run;
@Doyinsola What are you doing with ID 2? Is the desired output supposed to be 1 row per ID? I ran @JOL's code but it gets 2 rows for ID 2 as that ID has 2 rows for discharge1. What is your requirement? Do you want to keep the 15 or the 13 for that ID or combine them to 28?
Data sample1;
   Set sample;
   By id;
   If first.id then rn = 1; 
   Else rn + 1;
Run;
Proc SQL;
	Create table sample2 as
	Select ID, Entry, Discharge, Event, los, rn,
		Case 
			When Event = 'discharge1' then 'Stop'
        	When Event = 'death' then 'Stop'
        	When Discharge >=  "30sep2015"d then 'Stop'
        	Else '.' 
    	End as Include         
		From sample1;
Quit;
Proc SQL;
      Create table sample3 as
      Select ID, rn    
      From sample2
      Where Include = 'Stop';
Quit;
Proc SQL;
     Create table sample4 as
     Select sample2.ID, sum(sample2.los) as cum_los
     From sample2
     Left join sample3 on sample2.ID = sample3.ID
     Where sample2.rn <= sample3.rn or sample3.ID is null
     Group by sample2.ID;
Quit;It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
