BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
mikemangini
Obsidian | Level 7

I have a tricky problem collapsing inpatient hospital admissions and haven't had success getting a solution from previous blog posts. What I would like to do, is collapse rows where the ADMISSION_START_DT = the ADMISSION_END_DT of the preceeding row. The idea is that this is really one admission. In this event I would like to keep the APR_MDC_DESC and HOSPITAL_ID from the more current, second row.

 

In the event that the ADMISSION_START_DT is < 15 days from the ADMISSION_END_DT from the preceeding row, I would like to populate a new variable, READMIT_FLAG_15_DAY so that we can track readmissions. 

 

I have some sample data below. Any help would be greatly appreciated.

 

 

Have

PATIENT_ID ADMISSION_START_DT ADMISSION_END_DT APR_MDC_DESC SERVICE_CAT HOSPITAL_ID
ABC 24-Nov-15 26-Nov-15 Diabetes MED 123
ABC 26-Nov-15 3-Dec-15 Surgery SURG 456
ABC 16-Jan-16 22-Jan-16 Hypertension  MED 123
ABC 25-Feb-16 29-Feb-16 Diabetes MED 123
ABC 29-Feb-16 29-Feb-16 Observation MED 123
ABC 2-Mar-16 2-Mar-16 Hypertension  MED 123
ABC 19-Mar-16 23-Mar-16 Observation MED 123

 

Want

PATIENT_ID ADMISSION_START_DT ADMISSION_END_DT APR_MDC_DESC SERVICE_CAT HOSPITAL_ID READMIT_FLAG_15_DAY
ABC 24-Nov-15 3-Dec-15 Surgery MED 456  
ABC 16-Jan-16 22-Jan-16 Hypertension  MED 123  
ABC 25-Feb-16 29-Feb-16 Observation MED 123  
ABC 2-Mar-16 2-Mar-16 Hypertension  MED 123  
ABC 19-Mar-16 23-Mar-16 Observation MED 123 Y
1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

Why does the 5th record of your desired output have the flag='Y'?  Its start date is 19-mar-16 and the enddate of the preceding record is 2-mar-16, which is more than 15 days separation.  I believe you want the flag='Y' for the 4th record, yes?  (2-feb-16 vs 2-mar-16).  If so then this works:

 


data want (drop=next_stdate temp_:);

  do recnum=1 by 1 until (last.patient_id or next_stdate>admission_end_dt);
    set have (keep=patient_id);
    by patient_id;

    merge have
          have(firstobs=2 keep=admission_start_dt 
          rename=(admission_start_dt=next_stdate));
    if recnum=1 then do;
      temp_date=admission_start_dt;
      temp_cat=service_cat;
    end;
  end;

  if lag(patient_id)=patient_id 
     and admission_start_dt-15 < lag(admission_end_dt)
     then readmit_flag_15_day='Y';

  admission_start_dt=temp_date;
  service_cat=temp_cat;
run;

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

8 REPLIES 8
ErikLund_Jensen
Rhodochrosite | Level 12

Hi 

 

This is easily solved with the LAG function. Note that this code puts the Y in the second-last row, where I suppose it ought to be according to your spec, and not in the last.

 

* Just to get your data into SAS - ;

data have;

    input PATIENT_ID $3. @5 ADMISSION_START_DT anydtdte9. @15 ADMISSION_END_DT anydtdte9.

        @25 APR_MDC_DESC $12. @38 SERVICE_CAT $4. @43 HOSPITAL_ID 8.;

    format ADMISSION_START_DT ADMISSION_END_DT ddmmyy10.;

    cards;

ABC 24-Nov-15 26-Nov-15 Diabetes     MED  123

ABC 26-Nov-15 3-Dec-15  Surgery      SURG 456

ABC 16-Jan-16 22-Jan-16 Hypertension MED  123

ABC 25-Feb-16 29-Feb-16 Diabetes     MED  123

ABC 29-Feb-16 29-Feb-16 Observation  MED  123

ABC 2-Mar-16  2-Mar-16  Hypertension MED  123

ABC 19-Mar-16 23-Mar-16 Observation  MED  123;

run;

 

* Sort in ascending date order pr patient - just to make sure..;

proc sort data=have; by PATIENT_ID ADMISSION_START_DT;

run;

 

* And here is where the LAG function comes in handy..;

data want (drop=Old_End); set have; by PATIENT_ID;

    Old_End = lag(ADMISSION_END_DT);

        if not first.PATIENT_ID then do;

            if ADMISSION_START_DT = Old_End then delete;

            else if (ADMISSION_START_DT - Old_End) < 15 then READMIT_FLAG_15_DAY = 'Y';

        end;

run;

 

  PATIENT_ID ADMISSION_START_DT ADMISSION_END_DT APR_MDC_DESC SERVICE_CAT HOSPITAL_ID READMIT_FLAG_15_DAY
1 ABC 24/11/2015 26/11/2015 Diabetes MED 123  
2 ABC 16/01/2016 22/01/2016 Hypertension MED 123  
3 ABC 25/02/2016 29/02/2016 Diabetes MED 123  
4 ABC 02/03/2016 02/03/2016 Hypertension MED 123 Y
5 ABC 19/03/2016 23/03/2016 Observation MED 123  

 

 

mkeintz
PROC Star

@ErikLund_Jensen

 

Your program deletes the latter record and keeps the earlier record.  I believe the OP wants the opposite.  I don't think lag alone is sufficient.  You need a way to look ahead and delete the prior record when it is known that the subsequent record has the desired data.

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
ErikLund_Jensen
Rhodochrosite | Level 12
You are right. I missed that one.
mkeintz
PROC Star

Why does the 5th record of your desired output have the flag='Y'?  Its start date is 19-mar-16 and the enddate of the preceding record is 2-mar-16, which is more than 15 days separation.  I believe you want the flag='Y' for the 4th record, yes?  (2-feb-16 vs 2-mar-16).  If so then this works:

 


data want (drop=next_stdate temp_:);

  do recnum=1 by 1 until (last.patient_id or next_stdate>admission_end_dt);
    set have (keep=patient_id);
    by patient_id;

    merge have
          have(firstobs=2 keep=admission_start_dt 
          rename=(admission_start_dt=next_stdate));
    if recnum=1 then do;
      temp_date=admission_start_dt;
      temp_cat=service_cat;
    end;
  end;

  if lag(patient_id)=patient_id 
     and admission_start_dt-15 < lag(admission_end_dt)
     then readmit_flag_15_day='Y';

  admission_start_dt=temp_date;
  service_cat=temp_cat;
run;

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
mikemangini
Obsidian | Level 7

Thank you for assisting with my question. I am still testing but your solution seems to be working great! You are correct on the 'Y' flag being put on the wrong record. As I was simplifying and deidentifying the records to create an example, I let that slip. Thanks again for your assistance!

ErikLund_Jensen
Rhodochrosite | Level 12

@mkeintz is right. It requires look-ahead. But I fear that using a merge inside a loop in the data step would not be very efficient with large data sets. I suggest doing it in a previous step instead:

 

proc sql;

   create table inter as

       select

            a.PATIENT_ID,

            a.ADMISSION_START_DT,

            a.ADMISSION_END_DT,

            coalesce(b.APR_MDC_DESC, a.APR_MDC_DESC) as APR_MDC_DESC,

            a.SERVICE_CAT,

            coalesce(b.HOSPITAL_ID, a.HOSPITAL_ID) as HOSPITAL_ID

   from have as a left join have as b

   on  a.PATIENT_ID = b.PATIENT_ID

       and a.ADMISSION_END_DT = b.ADMISSION_START_DT

       order by a.PATIENT_ID, a.ADMISSION_START_DT;

quit;

 

data want2 (drop=Old_End); set inter; by PATIENT_ID;

   Old_End = lag(ADMISSION_END_DT);

   if not first.PATIENT_ID then do;

       if ADMISSION_START_DT = Old_End then delete;

       else if (ADMISSION_START_DT - Old_End) < 15 then READMIT_FLAG_15_DAY = 'Y';

   end;

run;

mkeintz
PROC Star

@ErikLund_Jensen

 

Why would a self-merge inside a loop be inefficient?  The two data streams (3 if you count the SET statement) are synchronzied, taking advantage of the data set order.  The same disk i/o buffer would be used for all data streams, reducing disk input/output operations.  Once a small group of records is completely processed, they are flushed from memory.  This all makes the operating system very happy.   It's effectively one pass of the data, requires no creation of an intermediate data set, and doesn't use much memory.

 

The alternative you propose has two unsynchronized reads of data set have in the proc sql.  And SQL (apparently) does not know the data set order, so it has to hold all the keys (and data) of one side of the join in memory (or in utility files on disk) in order to have all those keys available for matching against every record in the other side of the join.  It also writes and rereads an intermediate data set on the way to data set WANT.

 

I admit, when data set have is sorted in a useful way, I am NOT a fan of sql in these situations.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
ErikLund_Jensen
Rhodochrosite | Level 12

@mkeintz

 

I humbly admit you are right. I tried both methods on a large unsorted data set with 1000000 Patient-ID's, and your approach is about 1.8 times faster on an empty machine and almost 4 times faster on a heavy loaded machine. Thanks for the lesson!.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1288 views
  • 2 likes
  • 3 in conversation