Hello all,
I have a question regarding counting days of overall treatment stays by client ID. What I would like to know is for each client what was the total length of treatment they had allowing for a 20 day gap between treatment.
For example in the data below for client 1 allowing for a 20 day gap between the first end_date and the next start date the client has 169, so I would want a column that would indicate for each row 169 days. For client 3 the difference between the end date of the first record to the next record is 33 days (more than the allowable 20 days) so the first record would be marked as 33 days, but the next three records that gaps are less than 20 days so each row would have the total 155 days labeled.
I have a sample of the data I have and what the final output I would like :
data have; informat ID $1. start_date mmddyy10. end_date mmddyy10.; input ID start_date end_date ; format start_date end_date mmddyy10.; datalines; 1 01/01/2018 01/10/2018 1 01/11/2018 01/31/2018 1 02/05/2018 02/15/2018 1 02/19/2018 05/30/2018 1 06/15/2018 06/30/2018 1 07/05/2018 07/20/2018 2 01/05/2018 01/06/2018 3 02/05/2018 02/15/2018 3 03/20/2018 04/15/2018 3 04/20/2018 07/31/2018 3 08/14/2018 09/10/2018 ; run;
diff_date is the difference between end_date and start_date on the same row.
diff_btwn_date is the difference between the end_date of the first row to the start_date of the next row by Patient ID
Total_stay is the column I would like to get to.
Patient ID | Start_date | End_date | diff_date | diff_btwn_date | Total_stay |
1 | 1/1/2018 | 1/10/2018 | 9 | 1 | 169 |
1 | 1/11/2018 | 1/31/2018 | 20 | 5 | 169 |
1 | 2/5/2018 | 2/15/2018 | 10 | 4 | 169 |
1 | 2/19/2018 | 5/30/2018 | 100 | 16 | 169 |
1 | 6/15/2018 | 6/30/2018 | 15 | 5 | 169 |
1 | 7/5/2018 | 7/20/2018 | 15 | 169 | |
2 | 1/5/2018 | 1/6/2018 | 1 | 1 | |
3 | 2/5/2018 | 2/15/2018 | 10 | 33 | 33 |
3 | 3/20/2018 | 4/15/2018 | 26 | 5 | 155 |
3 | 4/20/2018 | 7/31/2018 | 102 | 14 | 155 |
3 | 8/14/2018 | 9/10/2018 | 27 | 155 |
Thanks in advance
@andreas_lds and @ChrisNZ Thank you to you both! I used a combination of what you both had suggested to get what I needed.
This is what I got and it worked for what I needed!
Thank you again
data diff; merge have HAVE(firstobs=2 keep=ID START_DATE end_date rename=(ID=nID START_DATE=nSTART_DATE end_date=nend_date)); diff_date=end_date-start_date; if ID=NID then do; diff_btwn_date=nSTART_DATE-end_date; end; run; proc sql; Select a.*, b. total_stay From diff a join (Select distinct ID, Sum(diff_date) as total_stay From diff Where diff_btwn_date<=20 Group by ID) b on a.id=b.id and a.diff_btwn_date<=20 Union Select a.*, diff_date as total_stay From diff a Where a.diff_btwn_date>20 group By id, start_date; quit;
I don't understand how you calculate the stay length. Your numbers look wrong to me.
Here are my calculations.
Just merge back to propagate the duration to all records if needed.
data DIF;
merge HAVE
HAVE(firstobs=2 keep=ID START_DATE rename=(ID=NEXT_ID START_DATE=NEXT_DATE));
retain START; format START date9.;
if ID ne lag(ID) then START=START_DATE;
if ID = NEXT_ID then do;
DIF_DATE = END_DATE - START_DATE ;
DIFF_NEXT = NEXT_DATE - END_DATE ;
end;
if DIFF_NEXT > 20 or ID ne NEXT_ID then do;
STAY=END_DATE-START;
START=NEXT_DATE;
end;
run;
ID | start_date | end_date | NEXT_ID | NEXT_DATE | START | DIF_DATE | DIFF_NEXT | STAY |
---|---|---|---|---|---|---|---|---|
1 | 01JAN2018 | 10JAN2018 | 1 | 11JAN2018 | 01JAN2018 | 9 | 1 | . |
1 | 11JAN2018 | 31JAN2018 | 1 | 05FEB2018 | 01JAN2018 | 20 | 5 | . |
1 | 05FEB2018 | 15FEB2018 | 1 | 19FEB2018 | 01JAN2018 | 10 | 4 | . |
1 | 19FEB2018 | 30MAY2018 | 1 | 15JUN2018 | 01JAN2018 | 100 | 16 | . |
1 | 15JUN2018 | 30JUN2018 | 1 | 05JUL2018 | 01JAN2018 | 15 | 5 | . |
1 | 05JUL2018 | 20JUL2018 | 2 | 05JAN2018 | 05JAN2018 | . | . | 200 |
2 | 05JAN2018 | 06JAN2018 | 3 | 05FEB2018 | 05FEB2018 | . | . | 1 |
3 | 05FEB2018 | 15FEB2018 | 3 | 20MAR2018 | 20MAR2018 | 10 | 33 | 10 |
3 | 20MAR2018 | 15APR2018 | 3 | 20APR2018 | 20MAR2018 | 26 | 5 | . |
3 | 20APR2018 | 31JUL2018 | 3 | 14AUG2018 | 20MAR2018 | 102 | 14 | . |
3 | 14AUG2018 | 10SEP2018 | . | . | . | . | 174 |
@ChrisNZ Thank you!
The Total_stay coloum is adding the diff_date coloumn so for client 1 169 is 9+20+10+100+15+15+15. For client 3 the 155 is 26+102+27 for client 3 the should have been 10 because of the gap between the first discharge (2/15/2018) and the next admit 3/20/18 is 33 days.
Are you sure that 155 is the right result for third patient?
data want;
if 0 then set work.have;
total_stay = 0;
do until (last.Id);
set work.have;
by Id;
if first.Id then do;
total_stay = end_date - start_date;
end;
else do;
if (start_date - lastEnd) <= 20 then do;
total_stay = total_stay + end_date - start_date;
end;
end;
lastEnd = end_date;
end;
do until (last.Id);
set work.have;
by Id;
output;
end;
drop lastEnd;
run;
Thanks @andreas_lds yes the last one is correct. We are counting the last three rows (26+102+27).
I think this one is going to work. Going to apply it to the bigger dataset to check.
@andreas_lds and @ChrisNZ Thank you to you both! I used a combination of what you both had suggested to get what I needed.
This is what I got and it worked for what I needed!
Thank you again
data diff; merge have HAVE(firstobs=2 keep=ID START_DATE end_date rename=(ID=nID START_DATE=nSTART_DATE end_date=nend_date)); diff_date=end_date-start_date; if ID=NID then do; diff_btwn_date=nSTART_DATE-end_date; end; run; proc sql; Select a.*, b. total_stay From diff a join (Select distinct ID, Sum(diff_date) as total_stay From diff Where diff_btwn_date<=20 Group by ID) b on a.id=b.id and a.diff_btwn_date<=20 Union Select a.*, diff_date as total_stay From diff a Where a.diff_btwn_date>20 group By id, start_date; quit;
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.