I have a tricky problem collapsing inpatient hospital admissions and haven't had success getting a solution from previous blog posts. What I would like to do, is collapse rows where the ADMISSION_START_DT = the ADMISSION_END_DT of the preceeding row. The idea is that this is really one admission. In this event I would like to keep the APR_MDC_DESC and HOSPITAL_ID from the more current, second row.
In the event that the ADMISSION_START_DT is < 15 days from the ADMISSION_END_DT from the preceeding row, I would like to populate a new variable, READMIT_FLAG_15_DAY so that we can track readmissions.
I have some sample data below. Any help would be greatly appreciated.
Have
PATIENT_ID | ADMISSION_START_DT | ADMISSION_END_DT | APR_MDC_DESC | SERVICE_CAT | HOSPITAL_ID |
ABC | 24-Nov-15 | 26-Nov-15 | Diabetes | MED | 123 |
ABC | 26-Nov-15 | 3-Dec-15 | Surgery | SURG | 456 |
ABC | 16-Jan-16 | 22-Jan-16 | Hypertension | MED | 123 |
ABC | 25-Feb-16 | 29-Feb-16 | Diabetes | MED | 123 |
ABC | 29-Feb-16 | 29-Feb-16 | Observation | MED | 123 |
ABC | 2-Mar-16 | 2-Mar-16 | Hypertension | MED | 123 |
ABC | 19-Mar-16 | 23-Mar-16 | Observation | MED | 123 |
Want
PATIENT_ID | ADMISSION_START_DT | ADMISSION_END_DT | APR_MDC_DESC | SERVICE_CAT | HOSPITAL_ID | READMIT_FLAG_15_DAY |
ABC | 24-Nov-15 | 3-Dec-15 | Surgery | MED | 456 | |
ABC | 16-Jan-16 | 22-Jan-16 | Hypertension | MED | 123 | |
ABC | 25-Feb-16 | 29-Feb-16 | Observation | MED | 123 | |
ABC | 2-Mar-16 | 2-Mar-16 | Hypertension | MED | 123 | |
ABC | 19-Mar-16 | 23-Mar-16 | Observation | MED | 123 | Y |
Why does the 5th record of your desired output have the flag='Y'? Its start date is 19-mar-16 and the enddate of the preceding record is 2-mar-16, which is more than 15 days separation. I believe you want the flag='Y' for the 4th record, yes? (2-feb-16 vs 2-mar-16). If so then this works:
data want (drop=next_stdate temp_:);
do recnum=1 by 1 until (last.patient_id or next_stdate>admission_end_dt);
set have (keep=patient_id);
by patient_id;
merge have
have(firstobs=2 keep=admission_start_dt
rename=(admission_start_dt=next_stdate));
if recnum=1 then do;
temp_date=admission_start_dt;
temp_cat=service_cat;
end;
end;
if lag(patient_id)=patient_id
and admission_start_dt-15 < lag(admission_end_dt)
then readmit_flag_15_day='Y';
admission_start_dt=temp_date;
service_cat=temp_cat;
run;
Hi
This is easily solved with the LAG function. Note that this code puts the Y in the second-last row, where I suppose it ought to be according to your spec, and not in the last.
* Just to get your data into SAS - ;
data have;
input PATIENT_ID $3. @5 ADMISSION_START_DT anydtdte9. @15 ADMISSION_END_DT anydtdte9.
@25 APR_MDC_DESC $12. @38 SERVICE_CAT $4. @43 HOSPITAL_ID 8.;
format ADMISSION_START_DT ADMISSION_END_DT ddmmyy10.;
cards;
ABC 24-Nov-15 26-Nov-15 Diabetes MED 123
ABC 26-Nov-15 3-Dec-15 Surgery SURG 456
ABC 16-Jan-16 22-Jan-16 Hypertension MED 123
ABC 25-Feb-16 29-Feb-16 Diabetes MED 123
ABC 29-Feb-16 29-Feb-16 Observation MED 123
ABC 2-Mar-16 2-Mar-16 Hypertension MED 123
ABC 19-Mar-16 23-Mar-16 Observation MED 123;
run;
* Sort in ascending date order pr patient - just to make sure..;
proc sort data=have; by PATIENT_ID ADMISSION_START_DT;
run;
* And here is where the LAG function comes in handy..;
data want (drop=Old_End); set have; by PATIENT_ID;
Old_End = lag(ADMISSION_END_DT);
if not first.PATIENT_ID then do;
if ADMISSION_START_DT = Old_End then delete;
else if (ADMISSION_START_DT - Old_End) < 15 then READMIT_FLAG_15_DAY = 'Y';
end;
run;
PATIENT_ID | ADMISSION_START_DT | ADMISSION_END_DT | APR_MDC_DESC | SERVICE_CAT | HOSPITAL_ID | READMIT_FLAG_15_DAY | |
---|---|---|---|---|---|---|---|
1 | ABC | 24/11/2015 | 26/11/2015 | Diabetes | MED | 123 | |
2 | ABC | 16/01/2016 | 22/01/2016 | Hypertension | MED | 123 | |
3 | ABC | 25/02/2016 | 29/02/2016 | Diabetes | MED | 123 | |
4 | ABC | 02/03/2016 | 02/03/2016 | Hypertension | MED | 123 | Y |
5 | ABC | 19/03/2016 | 23/03/2016 | Observation | MED | 123 |
Your program deletes the latter record and keeps the earlier record. I believe the OP wants the opposite. I don't think lag alone is sufficient. You need a way to look ahead and delete the prior record when it is known that the subsequent record has the desired data.
Why does the 5th record of your desired output have the flag='Y'? Its start date is 19-mar-16 and the enddate of the preceding record is 2-mar-16, which is more than 15 days separation. I believe you want the flag='Y' for the 4th record, yes? (2-feb-16 vs 2-mar-16). If so then this works:
data want (drop=next_stdate temp_:);
do recnum=1 by 1 until (last.patient_id or next_stdate>admission_end_dt);
set have (keep=patient_id);
by patient_id;
merge have
have(firstobs=2 keep=admission_start_dt
rename=(admission_start_dt=next_stdate));
if recnum=1 then do;
temp_date=admission_start_dt;
temp_cat=service_cat;
end;
end;
if lag(patient_id)=patient_id
and admission_start_dt-15 < lag(admission_end_dt)
then readmit_flag_15_day='Y';
admission_start_dt=temp_date;
service_cat=temp_cat;
run;
Thank you for assisting with my question. I am still testing but your solution seems to be working great! You are correct on the 'Y' flag being put on the wrong record. As I was simplifying and deidentifying the records to create an example, I let that slip. Thanks again for your assistance!
@mkeintz is right. It requires look-ahead. But I fear that using a merge inside a loop in the data step would not be very efficient with large data sets. I suggest doing it in a previous step instead:
proc sql;
create table inter as
select
a.PATIENT_ID,
a.ADMISSION_START_DT,
a.ADMISSION_END_DT,
coalesce(b.APR_MDC_DESC, a.APR_MDC_DESC) as APR_MDC_DESC,
a.SERVICE_CAT,
coalesce(b.HOSPITAL_ID, a.HOSPITAL_ID) as HOSPITAL_ID
from have as a left join have as b
on a.PATIENT_ID = b.PATIENT_ID
and a.ADMISSION_END_DT = b.ADMISSION_START_DT
order by a.PATIENT_ID, a.ADMISSION_START_DT;
quit;
data want2 (drop=Old_End); set inter; by PATIENT_ID;
Old_End = lag(ADMISSION_END_DT);
if not first.PATIENT_ID then do;
if ADMISSION_START_DT = Old_End then delete;
else if (ADMISSION_START_DT - Old_End) < 15 then READMIT_FLAG_15_DAY = 'Y';
end;
run;
Why would a self-merge inside a loop be inefficient? The two data streams (3 if you count the SET statement) are synchronzied, taking advantage of the data set order. The same disk i/o buffer would be used for all data streams, reducing disk input/output operations. Once a small group of records is completely processed, they are flushed from memory. This all makes the operating system very happy. It's effectively one pass of the data, requires no creation of an intermediate data set, and doesn't use much memory.
The alternative you propose has two unsynchronized reads of data set have in the proc sql. And SQL (apparently) does not know the data set order, so it has to hold all the keys (and data) of one side of the join in memory (or in utility files on disk) in order to have all those keys available for matching against every record in the other side of the join. It also writes and rereads an intermediate data set on the way to data set WANT.
I admit, when data set have is sorted in a useful way, I am NOT a fan of sql in these situations.
I humbly admit you are right. I tried both methods on a large unsorted data set with 1000000 Patient-ID's, and your approach is about 1.8 times faster on an empty machine and almost 4 times faster on a heavy loaded machine. Thanks for the lesson!.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.