Hi all,
I have a dataset with multiple dates of admission per person (client_id). In the example below of one client in my dataset, lines 1, 2, & 3 overlap with consecutive dates. Additionally, lines 5 &6 overlap. I'm looking to limit the dataset further to collapse the consecutive dates.
Obs | ID | admit | client_id | release | admitcount |
1 | 295815 | 13-Dec-22 | 4087673300 | 7-Jan-23 | 1 |
2 | 295936 | 7-Jan-23 | 4087673300 | 9-Jan-23 | 2 |
3 | 295950 | 9-Jan-23 | 4087673300 | 18-Jan-23 | 3 |
4 | 296219 | 21-Feb-23 | 4087673300 | 22-Feb-23 | 4 |
5 | 299130 | 26-Jun-23 | 4087673300 | 26-Jun-23 | 5 |
6 | 299132 | 26-Jun-23 | 4087673300 | 6-Jul-23 | 6 |
I tried using a do loop to sort through the consecutive dates and only keep the records that have actual admit dates.
proc sort data=morethanone; by client_id ID; run;
data want;
set morethanone;
by client_id ID;
lag_enddate=lag(release);
if not first.client_id then do;
if admit-lag_enddate=0 then delete_flag=1;
end;
if delete_flag then delete;
format lag_enddate date7.;
run;
ID | admit | client_id | release | admitcount | lag_enddate | delete_flag |
295815 | 13-Dec-22 | 4087673300 | 7-Jan-23 | 1 | 13-Feb-23 | . |
296219 | 21-Feb-23 | 4087673300 | 22-Feb-23 | 4 | 18-Jan-23 | . |
299130 | 26-Jun-23 | 4087673300 | 26-Jun-23 | 5 | 22-Feb-23 | . |
The program kept the records I wanted but I would like the lag_enddate to be the actual end date.
What I want:
ID | admit | client_id | release | admitcount | lag_enddate | delete_flag |
295815 | 13-Dec-22 | 4087673300 | 7-Jan-23 | 1 | 18-Jan-23 | . |
296219 | 21-Feb-23 | 4087673300 | 22-Feb-23 | 4 | 22-Feb-23 | . |
299130 | 26-Jun-23 | 4087673300 | 26-Jun-23 | 5 | 6-Jul-23 | . |
Please help! Tell me what I'm doing wrong.
Yes. Just make sure it is sorted.
proc sort data=have;
by client_id admit release;
run;
Result
Obs client_id admitcount start end n_admit days gap 1 4087673300 3 13DEC2022 18JAN2023 3 37 34 2 4087673300 4 21FEB2023 22FEB2023 1 2 124 3 4087673300 6 26JUN2023 06JUL2023 2 11 . 4 5265189100 3 13DEC2022 18JAN2023 3 37 34 5 5265189100 4 21FEB2023 22FEB2023 1 2 124 6 5265189100 6 26JUN2023 06JUL2023 2 11 . 7 8646854600 3 13DEC2022 18JAN2023 3 37 34 8 8646854600 4 21FEB2023 22FEB2023 1 2 124 9 8646854600 6 26JUN2023 06JUL2023 2 11 .
You don't need a DO group here. Judicious use of lag values and first.client_id inside an IFN function is is all you need to see the information you need.
data morethanone;
input
ID admit :date9. client_id release :date9. admitcount;
format admit release date9. ;
datalines;
295815 13-Dec-22 4087673300 7-Jan-23 1
295936 7-Jan-23 4087673300 9-Jan-23 2
295950 9-Jan-23 4087673300 18-Jan-23 3
296219 21-Feb-23 4087673300 22-Feb-23 4
299130 26-Jun-23 4087673300 26-Jun-23 5
299132 26-Jun-23 4087673300 6-Jul-23 6
run;
data want ;
set morethanone;
by client_id id ;
lag_enddate=ifn(first.client_id,admit,lag(release));
if first.client_id=1 or admit-1 > lag_enddate;
format lag_enddate date9.;
run;
I presume for the first record for a given client_id, you want ADMIT as the lag_enddate value.
@mkeintz thanks for your help! Not quite what I was looking for. This particular client has been admitted to the facility for a true total of 3 times even though the data shows it as 6 admits. There are instances where he was released and admitted the same day, but it should not be treated as two separte admits, rather one string of days. Ultimately, I want to collapse/limit the data so I get those three date ranges below and do anaverage length of stay.
First admit: 13DEC22 - 18JAN23
Second admit: 21FEB23 - 22FEB23
Third admit: 26JUN23 - 06JUL23
If I did the average length of stay with the orginal 6 admits and date ranges, it wouldn't be accurate because I would have to divide the number of admits by 6 rather than 3.
@mkeintz Also, with the steps you provided, it did choose the right admit dates but the wrong lag_enddate.
@amandav2107 wrote:
@mkeintz Also, with the steps you provided, it did choose the right admit dates but the wrong lag_enddate.
I'd say naming a variable lag_enddate implies looking back, which is actually the opposite of what I now see you want. You want the last_enddate (which I use in the code below) in a group of records, in which all the other data come from the first record of a sequence - i.e. you want to look forward.
The also means every obs in the dataset has to be read twice, because you have to look forward to get last_enddate. Then in the second read, keep that last_enddate and output it with all the data from the first record.
To know when to begin the re-reading, you have to know when you've reached the end of a sequence (i.e. when the next record is from a new client_id or has an admit date more than one day after the release date in hand. That's what the merge statement below supports, in combination with the firstobs=2 dataset name parameter. Note the merge statement does NOT have the usual accompanying BY statement.
data morethanone;
input
ID admit :date9. client_id release :date9. admitcount;
format admit release date9. ;
datalines;
295815 13-Dec-22 4087673300 7-Jan-23 1
295936 7-Jan-23 4087673300 9-Jan-23 2
295950 9-Jan-23 4087673300 18-Jan-23 3
296219 21-Feb-23 4087673300 22-Feb-23 4
299130 26-Jun-23 4087673300 26-Jun-23 5
299132 26-Jun-23 4087673300 6-Jul-23 6
run;
data want (drop=i j nxt_:);
do i=1 by 1 until (nxt_cid^=client_id or nxt_adm>release+1);
merge morethanone
morethanone (firstobs=2 keep=client_id admit rename=(client_id=nxt_cid admit=nxt_adm));
last_enddate=release;
end;
attrib last_enddate label='Final RELEASE date in this sequence' format=date9.;
do j=1 to i; /*Reread the sequence */
set morethanone;
if j=1 then output;
end;
run;
@mkeintz Okay, I've never used this logic before. That was really interesting to work through. What was your output? Because I ended up with one record with a last_enddate of 07JAN2023.
ID | admit | client_id | release | admitcount | last_enddate |
299132 | 26-Jun-23 | 4087673300 | 6-Jul-23 | 6 | 07-Jan-23 |
That was this client's last admit and and last release date. I am missing two more true admits.
It is probably easier to think about this as a LOOK AHEAD problem instead of a LOOK BACK problem.
So instead of the LAG of RELEASE you need to find the LEAD of ADMIT.
Here is one method using a second SET statement with dataset options to just pull in the next value of ADMIT and call it NEXT_ADMIT.
Now you just need to RETAIN the new starting date to do the comlapsing.
While we are at is let's make the minimum gap between release and the next admit that we want to collapse into a macro variable to make it easier.
First let's convert your listing into actual data.
data have;
input ID $ admit :date. client_id :$10. release :date. admitcount;
format admit release date9.;
cards;
295815 13-Dec-22 4087673300 7-Jan-23 1
295936 7-Jan-23 4087673300 9-Jan-23 2
295950 9-Jan-23 4087673300 18-Jan-23 3
296219 21-Feb-23 4087673300 22-Feb-23 4
299130 26-Jun-23 4087673300 26-Jun-23 5
299132 26-Jun-23 4087673300 6-Jul-23 6
;
Now we can process it by CLIENT_ID in the order of ADMIT.
%let gap=0;
data want;
set have ;
by client_id admit ;
set have(firstobs=2 keep=admit rename=(admit=next_admit)) have(obs=1 drop=_all_);
length start end n_admit days gap 8;
retain start;
if first.client_id then do;
start=admit;
n_admit=0;
end;
if last.client_id then next_admit = release + &gap +1;
n_admit +1;
gap =next_admit-release ;
if gap > &gap then do;
end = release;
days = release - start +1;
if last.client_id then gap=.;
output;
start=admit;
n_admit=0;
end;
format start end date9.;
drop id admit release next_admit;
run;
Result
Obs client_id admitcount start end n_admit days gap 1 4087673300 3 13DEC2022 18JAN2023 3 37 34 2 4087673300 4 09JAN2023 22FEB2023 1 45 124 3 4087673300 6 21FEB2023 06JUL2023 2 136 .
Hey @Tom ! Interesting logic, however with the start and end dates are wrong. With the results you ended up with, my days will be off. The program choose the right 1st start and 1st end but should have been n_admit 1. For the 2nd start/n_admit it chose the wrong end. It should have been 22FEB2023. As for n_admit 1, the start should have been 26JUN2023 and the end should have been 06JUL2023.
n_admit 2 and 1 overlap.
Obs client_id admitcount start end n_admit days gap
1 4087673300 3 13DEC2022 18JAN2023 3 37 34
2 4087673300 4 09JAN2023 22FEB2023 1 45 124
3 4087673300 6 21FEB2023 06JUL2023 2 136 .
Here are the dates I want to ultimately keep to do the count of days inbetween. With this result, I'd have an accurate count.
First admit: 13DEC2022 - 18JAN2023
Second admit: 21FEB2023 - 22FEB2023
Third admit: 26JUN2023 - 06JUL2023
Set the START to missing after detecting a gap.
Test for missing(START) to set a new START.
data have;
input ID $ admit :date. client_id :$10. release :date. admitcount;
format admit release date9.;
cards;
295815 13-Dec-22 4087673300 7-Jan-23 1
295936 7-Jan-23 4087673300 9-Jan-23 2
295950 9-Jan-23 4087673300 18-Jan-23 3
296219 21-Feb-23 4087673300 22-Feb-23 4
299130 26-Jun-23 4087673300 26-Jun-23 5
299132 26-Jun-23 4087673300 6-Jul-23 6
296219 21-Feb-23 999 22-Feb-23 1
299130 26-Jun-23 999 26-Jun-23 2
299132 26-Jun-23 999 6-Jul-23 3
;
%let gap=0;
data want;
set have ;
by client_id admit ;
* Add NEXT_ADMIT variable ;
set have(firstobs=2 keep=admit rename=(admit=next_admit)) have(obs=1 drop=_all_);
if last.client_id then next_admit = release + &gap +1;
length start end n_admit days gap 8;
start=coalesce(start,admit);
n_admit +1;
retain start;
gap=next_admit-release ;
if gap > &gap then do;
end = release;
days = release - start +1;
if last.client_id then gap=.;
output;
start=.;
n_admit=0;
end;
format start end date9.;
drop id admit release next_admit;
run;
Obs client_id admitcount start end n_admit days gap 1 4087673300 3 13DEC2022 18JAN2023 3 37 34 2 4087673300 4 21FEB2023 22FEB2023 1 2 124 3 4087673300 6 26JUN2023 06JUL2023 2 11 . 4 999 1 21FEB2023 22FEB2023 1 2 124 5 999 3 26JUN2023 06JUL2023 2 11 .
That worked really well! Thank you for that. Can I utilize this same program with a dataset that has more than one client_id (hundreds)?
Example below only has 3 cleints (but 2 more than the first example): 4087673300, 5265189100, & 8646854600.
data have;
input ID $ admit :date. client_id :$10. release :date. admitcount;
format admit release date9.;
cards;
295815 13-Dec-22 8646854600 7-Jan-23 1
295936 7-Jan-23 8646854600 9-Jan-23 2
295950 9-Jan-23 8646854600 18-Jan-23 3
296219 21-Feb-23 8646854600 22-Feb-23 4
299130 26-Jun-23 8646854600 26-Jun-23 5
299132 26-Jun-23 8646854600 6-Jul-23 6
295815 13-Dec-22 5265189100 7-Jan-23 1
295936 7-Jan-23 5265189100 9-Jan-23 2
295950 9-Jan-23 5265189100 18-Jan-23 3
296219 21-Feb-23 5265189100 22-Feb-23 4
299130 26-Jun-23 5265189100 26-Jun-23 5
299132 26-Jun-23 5265189100 6-Jul-23 6
295815 13-Dec-22 4087673300 7-Jan-23 1
295936 7-Jan-23 4087673300 9-Jan-23 2
295950 9-Jan-23 4087673300 18-Jan-23 3
296219 21-Feb-23 4087673300 22-Feb-23 4
299130 26-Jun-23 4087673300 26-Jun-23 5
299132 26-Jun-23 4087673300 6-Jul-23 6
;
Yes. Just make sure it is sorted.
proc sort data=have;
by client_id admit release;
run;
Result
Obs client_id admitcount start end n_admit days gap 1 4087673300 3 13DEC2022 18JAN2023 3 37 34 2 4087673300 4 21FEB2023 22FEB2023 1 2 124 3 4087673300 6 26JUN2023 06JUL2023 2 11 . 4 5265189100 3 13DEC2022 18JAN2023 3 37 34 5 5265189100 4 21FEB2023 22FEB2023 1 2 124 6 5265189100 6 26JUN2023 06JUL2023 2 11 . 7 8646854600 3 13DEC2022 18JAN2023 3 37 34 8 8646854600 4 21FEB2023 22FEB2023 1 2 124 9 8646854600 6 26JUN2023 06JUL2023 2 11 .
Thank you, Tom! I really appreciate the help.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Early bird rate extended! Save $200 when you sign up by March 31.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.