I have the following data:
Data sample;
input id $ (entry) (:mmddyy10.) (discharge) (:mmddyy10.) event $ 10.;
format entry discharge date9.;
If discharge lt "30sep2015"d then los=discharge-entry;
Else los="30sep2015"d-entry;
datalines;
1 12/05/2014 12/17/2014 discharge2
1 12/24/2014 01/17/2015 discharge2
1 02/06/2015 03/10/2015 discharge1
2 08/01/2013 08/16/2013 discharge1
2 10/24/2013 11/06/2013 discharge1
3 02/10/2015 05/10/2015 discharge1
3 09/10/2015 10/18/2015 discharge2
4 09/23/2013 10/22/2013 discharge2
4 10/30/2013 07/15/2014 discharge2
4 07/25/2014 08/14/2014 death
5 08/01/2015 08/16/2015 discharge2
5 09/01/2015 11/06/2015 discharge1
6 11/05/2013 09/05/2014 discharge2
6 09/15/2014 07/08/2015 discharge2
6 07/14/2015 07/16/2015 discharge2
;
Run;
Each row represents a period of stay that starts with an entry date and a discharge date. The "los" variable represents the length of stay (i.e. the number of days from entry to discharge). The "event" variable describes the type of discharge (discharge1=return not anticipated, discharge2=return anticipated, death).
I would like to find the cumulative length of stay (los) by id, where counting stops if the event is discharge1 or death, whichever comes first, or the last of a series of discharge2 dates. So the dataset starts to look like this:
And ultimately, this is the dataset I am hoping to end up with:
Data sample1;
Set sample;
By id;
If first.id then rn = 1;
Else rn + 1;
Run;
Proc SQL;
Create table sample2 as
Select ID, Entry, Discharge, Event, los, rn,
Case
When Event = 'discharge1' then 'Stop'
When Event = 'death' then 'Stop'
When Discharge >= "30sep2015"d then 'Stop'
Else '.'
End as Include
From sample1;
Quit;
Proc SQL;
Create table sample3 as
Select ID, rn
From sample2
Where Include = 'Stop';
Quit;
Proc SQL;
Create table sample4 as
Select sample2.ID, sum(sample2.los) as cum_los
From sample2
Left join sample3 on sample2.ID = sample3.ID
Where sample2.rn <= sample3.rn or sample3.ID is null
Group by sample2.ID;
Quit;
You may need to modify the final subsetting IF.
data sample1;
set sample;
by id discharge;
if first.id then
cum_los = 0;
cum_los + los;
if event = "discharge1" or event = "death" or discharge = "20sep2015"d;
run;
@Doyinsola What are you doing with ID 2? Is the desired output supposed to be 1 row per ID? I ran @JOL's code but it gets 2 rows for ID 2 as that ID has 2 rows for discharge1. What is your requirement? Do you want to keep the 15 or the 13 for that ID or combine them to 28?
Data sample1;
Set sample;
By id;
If first.id then rn = 1;
Else rn + 1;
Run;
Proc SQL;
Create table sample2 as
Select ID, Entry, Discharge, Event, los, rn,
Case
When Event = 'discharge1' then 'Stop'
When Event = 'death' then 'Stop'
When Discharge >= "30sep2015"d then 'Stop'
Else '.'
End as Include
From sample1;
Quit;
Proc SQL;
Create table sample3 as
Select ID, rn
From sample2
Where Include = 'Stop';
Quit;
Proc SQL;
Create table sample4 as
Select sample2.ID, sum(sample2.los) as cum_los
From sample2
Left join sample3 on sample2.ID = sample3.ID
Where sample2.rn <= sample3.rn or sample3.ID is null
Group by sample2.ID;
Quit;
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.