I have the following data:
Data sample;
input id $ (entry) (:mmddyy10.) (discharge) (:mmddyy10.) event $ 10.;
format entry discharge date9.;
If discharge lt "30sep2015"d then los=discharge-entry;
Else los="30sep2015"d-entry;
datalines;
1 12/05/2014 12/17/2014 discharge2
1 12/24/2014 01/17/2015 discharge2
1 02/06/2015 03/10/2015 discharge1
2 08/01/2013 08/16/2013 discharge1
2 10/24/2013 11/06/2013 discharge1
3 02/10/2015 05/10/2015 discharge1
3 09/10/2015 10/18/2015 discharge2
4 09/23/2013 10/22/2013 discharge2
4 10/30/2013 07/15/2014 discharge2
4 07/25/2014 08/14/2014 death
5 08/01/2015 08/16/2015 discharge2
5 09/01/2015 11/06/2015 discharge1
6 11/05/2013 09/05/2014 discharge2
6 09/15/2014 07/08/2015 discharge2
6 07/14/2015 07/16/2015 discharge2
;
Run;
Each row represents a period of stay that starts with an entry date and a discharge date. The "los" variable represents the length of stay (i.e. the number of days from entry to discharge). The "event" variable describes the type of discharge (discharge1=return not anticipated, discharge2=return anticipated, death).
I would like to find the cumulative length of stay (los) by id, where counting stops if the event is discharge1 or death, whichever comes first, or the last of a series of discharge2 dates. So the dataset starts to look like this:
And ultimately, this is the dataset I am hoping to end up with:
Data sample1;
Set sample;
By id;
If first.id then rn = 1;
Else rn + 1;
Run;
Proc SQL;
Create table sample2 as
Select ID, Entry, Discharge, Event, los, rn,
Case
When Event = 'discharge1' then 'Stop'
When Event = 'death' then 'Stop'
When Discharge >= "30sep2015"d then 'Stop'
Else '.'
End as Include
From sample1;
Quit;
Proc SQL;
Create table sample3 as
Select ID, rn
From sample2
Where Include = 'Stop';
Quit;
Proc SQL;
Create table sample4 as
Select sample2.ID, sum(sample2.los) as cum_los
From sample2
Left join sample3 on sample2.ID = sample3.ID
Where sample2.rn <= sample3.rn or sample3.ID is null
Group by sample2.ID;
Quit;
You may need to modify the final subsetting IF.
data sample1;
set sample;
by id discharge;
if first.id then
cum_los = 0;
cum_los + los;
if event = "discharge1" or event = "death" or discharge = "20sep2015"d;
run;
@Doyinsola What are you doing with ID 2? Is the desired output supposed to be 1 row per ID? I ran @JOL's code but it gets 2 rows for ID 2 as that ID has 2 rows for discharge1. What is your requirement? Do you want to keep the 15 or the 13 for that ID or combine them to 28?
Data sample1;
Set sample;
By id;
If first.id then rn = 1;
Else rn + 1;
Run;
Proc SQL;
Create table sample2 as
Select ID, Entry, Discharge, Event, los, rn,
Case
When Event = 'discharge1' then 'Stop'
When Event = 'death' then 'Stop'
When Discharge >= "30sep2015"d then 'Stop'
Else '.'
End as Include
From sample1;
Quit;
Proc SQL;
Create table sample3 as
Select ID, rn
From sample2
Where Include = 'Stop';
Quit;
Proc SQL;
Create table sample4 as
Select sample2.ID, sum(sample2.los) as cum_los
From sample2
Left join sample3 on sample2.ID = sample3.ID
Where sample2.rn <= sample3.rn or sample3.ID is null
Group by sample2.ID;
Quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.