Hello,
I’m looking at patient discharge records, with the ultimate goal of calculating average length of stay. Where patients were readmitted within 30 days, I would like to combine these records into a single ‘care encounter,’ so I’m trying to calculate a total length of stay for these. Here is my syntax and results:
data sample;
input Px_ID admission:DATE9. discharge:DATE9.;
format admission DATE9.;
format discharge DATE9.;
datalines;
003 01Jan2020 05Jan2020
002 06Aug2020 23Sep2020
002 30Sep2020 15Oct2020
003 05Jun2019 30Sep2019
002 17Jan2019 31Jan2019
002 29Feb2020 11Mar2020
003 24Dec2019 27Dec2019
003 10Mar2020 17Mar2020
003 21Mar2020 28Mar2020
003 04Apr2020 16Apr2020
003 10May2020 14May2020
003 01Sep2020 03Sep2020
;
*Sort the raw file by Px ID then by admission and discharge dates, respectively;
proc sort data=sample;
by Px_ID admission discharge;
run;
*Calculate total LOS for Px with readmissions <= 30 days;
data final;
set sample;
by Px_ID admission discharge;
*Add a sequence for the admissions per Px for programming purposes (this is helpful in my 6M+ record database);
retain hosp_seq;
if first.Px_ID then hosp_seq = 0;
hosp_seq = hosp_seq+1;
*Calculate length of stay - adding 1 ensures that Day 1 of admission is counted;
los_calc = discharge - admission + 1;
*Calculate the number of days between Px admissions;
ref_date = LAG(discharge);
format ref_date DATE9.;
gap = admission - ref_date;
if first.Px_ID then do;
ref_date = .;
gap = .;
end;
*Calculate total LOS;
retain total_los;
if first.Px_ID then do;
total_los = .;
end;
if 0 <= gap <= 30 then total_los + los_calc;
else total_los = .;
run;
RESULTS:
Obs | Px_ID | admission | discharge | hosp_seq | los_calc | ref_date | gap | total_los |
1 | 2 | 17JAN2019 | 31JAN2019 | 1 | 15 | . | . | . |
2 | 2 | 29FEB2020 | 11MAR2020 | 2 | 12 | 31JAN2019 | 394 | . |
3 | 2 | 06AUG2020 | 23SEP2020 | 3 | 49 | 11MAR2020 | 148 | . |
4 | 2 | 30SEP2020 | 15OCT2020 | 4 | 16 | 23SEP2020 | 7 | 16 |
5 | 3 | 05JUN2019 | 30SEP2019 | 1 | 118 | . | . | . |
6 | 3 | 24DEC2019 | 27DEC2019 | 2 | 4 | 30SEP2019 | 85 | . |
7 | 3 | 01JAN2020 | 05JAN2020 | 3 | 5 | 27DEC2019 | 5 | 5 |
8 | 3 | 10MAR2020 | 17MAR2020 | 4 | 8 | 05JAN2020 | 65 | . |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
11 | 3 | 10MAY2020 | 14MAY2020 | 7 | 5 | 16APR2020 | 24 | 26 |
12 | 3 | 01SEP2020 | 03SEP2020 | 8 | 3 | 14MAY2020 | 110 | . |
The issue is that where patients have multiple, consecutive hospitalizations with gaps <=30 days, my method of calculating the total_los, which is cumulative, is problematic because it won’t give me an accurate average. What I need to do is to somehow tell the program that I only want to retain the total_los calculation for the most recent hospitalization where there are consecutive readmissions (see the emphasized line in the results table), then create a new variable where records with a gap of more than 30 days keep the 'los_calc' value, and any records included in a total_los calculation are marked as missing…which is more challenging than it sounds because these aren’t always the most recent hospitalizations (so using LAST. in a loop, for example, won’t work).
Any suggestions?
Not too sure what you want, but something like this should point you to the right direction.
Look at the next observation to discard the current one if applicable.
data FINAL;
merge SAMPLE
SAMPLE(obs=2 keep=PX_ID ADMISSION DISCHARGE rename=(PX_ID=ID2 ADMISSION=AD2 DISCHARGE=DI2));
format REF_DATE date9.;
REF_DATE = lag(DISCHARGE);
RE2 = lag(DI2);
if PX_ID ne lag(PX_ID) then do;
HOSP_SEQ = 0;
REF_DATE = 0;
GAP = 0;
TOTAL_LOS = 0;
end;
else do;
HOSP_SEQ + 1;
LOS_CALC = DISCHARGE - ADMISSION + 1;
GAP = ADMISSION - REF_DATE;
end;
GAP2= AD2 - RE2;
if 0 <= GAP <= 30 & 0 <= GAP2 <= 30 then return;
if 0 <= GAP <= 30 then TOTAL_LOS + LOS_CALC;
else TOTAL_LOS = 0;
run;
I was looking for length of study for 30-day readmission example, and found this question is a similar needs but with one additional requirement. Here is: for px_id=3 and admission date='04APR2020', it is a valid readmission with 7 days gap. However, it is within 30 days of 17MAR2020, meaning this is a second readmission within 30 days. My readmission calculation needs to count only the first readmission within 30 days.
Any help would be greatly appreciated!
Mitch
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.