Solved: Collapsing inpatient admissions

mikemangini · Posted 01-23-2017 12:45 PM

I have a tricky problem collapsing inpatient hospital admissions and haven't had success getting a solution from previous blog posts. What I would like to do, is collapse rows where the ADMISSION_START_DT = the ADMISSION_END_DT of the preceeding row. The idea is that this is really one admission. In this event I would like to keep the APR_MDC_DESC and HOSPITAL_ID from the more current, second row.

In the event that the ADMISSION_START_DT is < 15 days from the ADMISSION_END_DT from the preceeding row, I would like to populate a new variable, READMIT_FLAG_15_DAY so that we can track readmissions.

I have some sample data below. Any help would be greatly appreciated.

Have

PATIENT_ID	ADMISSION_START_DT	ADMISSION_END_DT	APR_MDC_DESC	SERVICE_CAT	HOSPITAL_ID
ABC	24-Nov-15	26-Nov-15	Diabetes	MED	123
ABC	26-Nov-15	3-Dec-15	Surgery	SURG	456
ABC	16-Jan-16	22-Jan-16	Hypertension	MED	123
ABC	25-Feb-16	29-Feb-16	Diabetes	MED	123
ABC	29-Feb-16	29-Feb-16	Observation	MED	123
ABC	2-Mar-16	2-Mar-16	Hypertension	MED	123
ABC	19-Mar-16	23-Mar-16	Observation	MED	123

Want

PATIENT_ID	ADMISSION_START_DT	ADMISSION_END_DT	APR_MDC_DESC	SERVICE_CAT	HOSPITAL_ID	READMIT_FLAG_15_DAY
ABC	24-Nov-15	3-Dec-15	Surgery	MED	456
ABC	16-Jan-16	22-Jan-16	Hypertension	MED	123
ABC	25-Feb-16	29-Feb-16	Observation	MED	123
ABC	2-Mar-16	2-Mar-16	Hypertension	MED	123
ABC	19-Mar-16	23-Mar-16	Observation	MED	123	Y

mkeintz · Posted 01-23-2017 01:51 PM

Why does the 5th record of your desired output have the flag='Y'? Its start date is 19-mar-16 and the enddate of the preceding record is 2-mar-16, which is more than 15 days separation. I believe you want the flag='Y' for the 4th record, yes? (2-feb-16 vs 2-mar-16). If so then this works:


data want (drop=next_stdate temp_:);

  do recnum=1 by 1 until (last.patient_id or next_stdate>admission_end_dt);
    set have (keep=patient_id);
    by patient_id;

    merge have
          have(firstobs=2 keep=admission_start_dt 
          rename=(admission_start_dt=next_stdate));
    if recnum=1 then do;
      temp_date=admission_start_dt;
      temp_cat=service_cat;
    end;
  end;

  if lag(patient_id)=patient_id 
     and admission_start_dt-15 < lag(admission_end_dt)
     then readmit_flag_15_day='Y';

  admission_start_dt=temp_date;
  service_cat=temp_cat;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

ErikLund_Jensen · Posted 01-23-2017 01:26 PM

Hi

This is easily solved with the LAG function. Note that this code puts the Y in the second-last row, where I suppose it ought to be according to your spec, and not in the last.

* Just to get your data into SAS - ;

data have;

input PATIENT_ID $3. @5 ADMISSION_START_DT anydtdte9. @15 ADMISSION_END_DT anydtdte9.

@25 APR_MDC_DESC $12. @38 SERVICE_CAT $4. @43 HOSPITAL_ID 8.;

format ADMISSION_START_DT ADMISSION_END_DT ddmmyy10.;

cards;

ABC 24-Nov-15 26-Nov-15 Diabetes MED 123

ABC 26-Nov-15 3-Dec-15 Surgery SURG 456

ABC 16-Jan-16 22-Jan-16 Hypertension MED 123

ABC 25-Feb-16 29-Feb-16 Diabetes MED 123

ABC 29-Feb-16 29-Feb-16 Observation MED 123

ABC 2-Mar-16 2-Mar-16 Hypertension MED 123

ABC 19-Mar-16 23-Mar-16 Observation MED 123;

run;

* Sort in ascending date order pr patient - just to make sure..;

proc sort data=have; by PATIENT_ID ADMISSION_START_DT;

run;

* And here is where the LAG function comes in handy..;

data want (drop=Old_End); set have; by PATIENT_ID;

Old_End = lag(ADMISSION_END_DT);

if not first.PATIENT_ID then do;

if ADMISSION_START_DT = Old_End then delete;

else if (ADMISSION_START_DT - Old_End) < 15 then READMIT_FLAG_15_DAY = 'Y';

end;

run;

	PATIENT_ID	ADMISSION_START_DT	ADMISSION_END_DT	APR_MDC_DESC	SERVICE_CAT	HOSPITAL_ID	READMIT_FLAG_15_DAY
1	ABC	24/11/2015	26/11/2015	Diabetes	MED	123
2	ABC	16/01/2016	22/01/2016	Hypertension	MED	123
3	ABC	25/02/2016	29/02/2016	Diabetes	MED	123
4	ABC	02/03/2016	02/03/2016	Hypertension	MED	123	Y
5	ABC	19/03/2016	23/03/2016	Observation	MED	123

mkeintz · Posted 01-23-2017 02:00 PM

@ErikLund_Jensen

Your program deletes the latter record and keeps the earlier record. I believe the OP wants the opposite. I don't think lag alone is sufficient. You need a way to look ahead and delete the prior record when it is known that the subsequent record has the desired data.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

ErikLund_Jensen · Posted 01-23-2017 03:07 PM

You are right. I missed that one.

mkeintz · Posted 01-23-2017 01:51 PM

Why does the 5th record of your desired output have the flag='Y'? Its start date is 19-mar-16 and the enddate of the preceding record is 2-mar-16, which is more than 15 days separation. I believe you want the flag='Y' for the 4th record, yes? (2-feb-16 vs 2-mar-16). If so then this works:


data want (drop=next_stdate temp_:);

  do recnum=1 by 1 until (last.patient_id or next_stdate>admission_end_dt);
    set have (keep=patient_id);
    by patient_id;

    merge have
          have(firstobs=2 keep=admission_start_dt 
          rename=(admission_start_dt=next_stdate));
    if recnum=1 then do;
      temp_date=admission_start_dt;
      temp_cat=service_cat;
    end;
  end;

  if lag(patient_id)=patient_id 
     and admission_start_dt-15 < lag(admission_end_dt)
     then readmit_flag_15_day='Y';

  admission_start_dt=temp_date;
  service_cat=temp_cat;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

mikemangini · Posted 01-23-2017 03:17 PM

Thank you for assisting with my question. I am still testing but your solution seems to be working great! You are correct on the 'Y' flag being put on the wrong record. As I was simplifying and deidentifying the records to create an example, I let that slip. Thanks again for your assistance!

ErikLund_Jensen · Posted 01-23-2017 03:16 PM

@mkeintz is right. It requires look-ahead. But I fear that using a merge inside a loop in the data step would not be very efficient with large data sets. I suggest doing it in a previous step instead:

proc sql;

create table inter as

select

a.PATIENT_ID,

a.ADMISSION_START_DT,

a.ADMISSION_END_DT,

coalesce(b.APR_MDC_DESC, a.APR_MDC_DESC) as APR_MDC_DESC,

a.SERVICE_CAT,

coalesce(b.HOSPITAL_ID, a.HOSPITAL_ID) as HOSPITAL_ID

from have as a left join have as b

on a.PATIENT_ID = b.PATIENT_ID

and a.ADMISSION_END_DT = b.ADMISSION_START_DT

order by a.PATIENT_ID, a.ADMISSION_START_DT;

quit;

data want2 (drop=Old_End); set inter; by PATIENT_ID;

Old_End = lag(ADMISSION_END_DT);

if not first.PATIENT_ID then do;

if ADMISSION_START_DT = Old_End then delete;

else if (ADMISSION_START_DT - Old_End) < 15 then READMIT_FLAG_15_DAY = 'Y';

end;

run;

mkeintz · Posted 01-23-2017 05:21 PM

@ErikLund_Jensen

Why would a self-merge inside a loop be inefficient? The two data streams (3 if you count the SET statement) are synchronzied, taking advantage of the data set order. The same disk i/o buffer would be used for all data streams, reducing disk input/output operations. Once a small group of records is completely processed, they are flushed from memory. This all makes the operating system very happy. It's effectively one pass of the data, requires no creation of an intermediate data set, and doesn't use much memory.

The alternative you propose has two unsynchronized reads of data set have in the proc sql. And SQL (apparently) does not know the data set order, so it has to hold all the keys (and data) of one side of the join in memory (or in utility files on disk) in order to have all those keys available for matching against every record in the other side of the join. It also writes and rereads an intermediate data set on the way to data set WANT.

I admit, when data set have is sorted in a useful way, I am NOT a fan of sql in these situations.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

ErikLund_Jensen · Posted 01-29-2017 06:33 AM

@mkeintz

I humbly admit you are right. I tried both methods on a large unsorted data set with 1000000 Patient-ID's, and your approach is about 1.8 times faster on an empty machine and almost 4 times faster on a heavy loaded machine. Thanks for the lesson!.

Collapsing inpatient admissions

Re: Collapsing inpatient admissions

Re: Collapsing inpatient admissions

Re: Collapsing inpatient admissions

Re: Collapsing inpatient admissions

Re: Collapsing inpatient admissions

Re: Collapsing inpatient admissions

Re: Collapsing inpatient admissions

Re: Collapsing inpatient admissions

Re: Collapsing inpatient admissions

Catch up on SAS Innovate 2026

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away