## DATA Step, Macro, Functions and more

Solved
Contributor
Posts: 36

I have a tricky problem collapsing inpatient hospital admissions and haven't had success getting a solution from previous blog posts. What I would like to do, is collapse rows where the ADMISSION_START_DT = the ADMISSION_END_DT of the preceeding row. The idea is that this is really one admission. In this event I would like to keep the APR_MDC_DESC and HOSPITAL_ID from the more current, second row.

In the event that the ADMISSION_START_DT is < 15 days from the ADMISSION_END_DT from the preceeding row, I would like to populate a new variable, READMIT_FLAG_15_DAY so that we can track readmissions.

I have some sample data below. Any help would be greatly appreciated.

Have

 PATIENT_ID ADMISSION_START_DT ADMISSION_END_DT APR_MDC_DESC SERVICE_CAT HOSPITAL_ID ABC 24-Nov-15 26-Nov-15 Diabetes MED 123 ABC 26-Nov-15 3-Dec-15 Surgery SURG 456 ABC 16-Jan-16 22-Jan-16 Hypertension MED 123 ABC 25-Feb-16 29-Feb-16 Diabetes MED 123 ABC 29-Feb-16 29-Feb-16 Observation MED 123 ABC 2-Mar-16 2-Mar-16 Hypertension MED 123 ABC 19-Mar-16 23-Mar-16 Observation MED 123

Want

 PATIENT_ID ADMISSION_START_DT ADMISSION_END_DT APR_MDC_DESC SERVICE_CAT HOSPITAL_ID READMIT_FLAG_15_DAY ABC 24-Nov-15 3-Dec-15 Surgery MED 456 ABC 16-Jan-16 22-Jan-16 Hypertension MED 123 ABC 25-Feb-16 29-Feb-16 Observation MED 123 ABC 2-Mar-16 2-Mar-16 Hypertension MED 123 ABC 19-Mar-16 23-Mar-16 Observation MED 123 Y

Accepted Solutions
Solution
‎01-23-2017 03:20 PM
Posts: 1,337

Why does the 5th record of your desired output have the flag='Y'?  Its start date is 19-mar-16 and the enddate of the preceding record is 2-mar-16, which is more than 15 days separation.  I believe you want the flag='Y' for the 4th record, yes?  (2-feb-16 vs 2-mar-16).  If so then this works:

``````
data want (drop=next_stdate temp_:);

do recnum=1 by 1 until (last.patient_id or next_stdate>admission_end_dt);
set have (keep=patient_id);
by patient_id;

merge have
if recnum=1 then do;
temp_cat=service_cat;
end;
end;

if lag(patient_id)=patient_id

service_cat=temp_cat;
run;
``````

All Replies
Contributor
Posts: 47

Hi

This is easily solved with the LAG function. Note that this code puts the Y in the second-last row, where I suppose it ought to be according to your spec, and not in the last.

* Just to get your data into SAS - ;

data have;

@25 APR_MDC_DESC \$12. @38 SERVICE_CAT \$4. @43 HOSPITAL_ID 8.;

cards;

ABC 24-Nov-15 26-Nov-15 Diabetes     MED  123

ABC 26-Nov-15 3-Dec-15  Surgery      SURG 456

ABC 16-Jan-16 22-Jan-16 Hypertension MED  123

ABC 25-Feb-16 29-Feb-16 Diabetes     MED  123

ABC 29-Feb-16 29-Feb-16 Observation  MED  123

ABC 2-Mar-16  2-Mar-16  Hypertension MED  123

ABC 19-Mar-16 23-Mar-16 Observation  MED  123;

run;

* Sort in ascending date order pr patient - just to make sure..;

proc sort data=have; by PATIENT_ID ADMISSION_START_DT;

run;

* And here is where the LAG function comes in handy..;

data want (drop=Old_End); set have; by PATIENT_ID;

if not first.PATIENT_ID then do;

if ADMISSION_START_DT = Old_End then delete;

end;

run;

1 ABC 24/11/2015 26/11/2015 Diabetes MED 123
2 ABC 16/01/2016 22/01/2016 Hypertension MED 123
3 ABC 25/02/2016 29/02/2016 Diabetes MED 123
4 ABC 02/03/2016 02/03/2016 Hypertension MED 123 Y
5 ABC 19/03/2016 23/03/2016 Observation MED 123

Posts: 1,337

@ErikLund_Jensen

Your program deletes the latter record and keeps the earlier record.  I believe the OP wants the opposite.  I don't think lag alone is sufficient.  You need a way to look ahead and delete the prior record when it is known that the subsequent record has the desired data.

Contributor
Posts: 47

You are right. I missed that one.
Solution
‎01-23-2017 03:20 PM
Posts: 1,337

Why does the 5th record of your desired output have the flag='Y'?  Its start date is 19-mar-16 and the enddate of the preceding record is 2-mar-16, which is more than 15 days separation.  I believe you want the flag='Y' for the 4th record, yes?  (2-feb-16 vs 2-mar-16).  If so then this works:

``````
data want (drop=next_stdate temp_:);

do recnum=1 by 1 until (last.patient_id or next_stdate>admission_end_dt);
set have (keep=patient_id);
by patient_id;

merge have
if recnum=1 then do;
temp_cat=service_cat;
end;
end;

if lag(patient_id)=patient_id

service_cat=temp_cat;
run;
``````

Contributor
Posts: 36

Thank you for assisting with my question. I am still testing but your solution seems to be working great! You are correct on the 'Y' flag being put on the wrong record. As I was simplifying and deidentifying the records to create an example, I let that slip. Thanks again for your assistance!

Contributor
Posts: 47

@mkeintz is right. It requires look-ahead. But I fear that using a merge inside a loop in the data step would not be very efficient with large data sets. I suggest doing it in a previous step instead:

proc sql;

create table inter as

select

a.PATIENT_ID,

coalesce(b.APR_MDC_DESC, a.APR_MDC_DESC) as APR_MDC_DESC,

a.SERVICE_CAT,

coalesce(b.HOSPITAL_ID, a.HOSPITAL_ID) as HOSPITAL_ID

from have as a left join have as b

on  a.PATIENT_ID = b.PATIENT_ID

quit;

data want2 (drop=Old_End); set inter; by PATIENT_ID;

if not first.PATIENT_ID then do;

if ADMISSION_START_DT = Old_End then delete;

end;

run;

Posts: 1,337

@ErikLund_Jensen

Why would a self-merge inside a loop be inefficient?  The two data streams (3 if you count the SET statement) are synchronzied, taking advantage of the data set order.  The same disk i/o buffer would be used for all data streams, reducing disk input/output operations.  Once a small group of records is completely processed, they are flushed from memory.  This all makes the operating system very happy.   It's effectively one pass of the data, requires no creation of an intermediate data set, and doesn't use much memory.

The alternative you propose has two unsynchronized reads of data set have in the proc sql.  And SQL (apparently) does not know the data set order, so it has to hold all the keys (and data) of one side of the join in memory (or in utility files on disk) in order to have all those keys available for matching against every record in the other side of the join.  It also writes and rereads an intermediate data set on the way to data set WANT.

I admit, when data set have is sorted in a useful way, I am NOT a fan of sql in these situations.

Contributor
Posts: 47

@mkeintz

I humbly admit you are right. I tried both methods on a large unsorted data set with 1000000 Patient-ID's, and your approach is about 1.8 times faster on an empty machine and almost 4 times faster on a heavy loaded machine. Thanks for the lesson!.

☑ This topic is solved.