Solved: Re: How do I collapse consecutive dates?

amandav2107 · Posted 09-13-2023 04:22 PM

Hi all,

I have a dataset with multiple dates of admission per person (client_id). In the example below of one client in my dataset, lines 1, 2, & 3 overlap with consecutive dates. Additionally, lines 5 &6 overlap. I'm looking to limit the dataset further to collapse the consecutive dates.

Obs	ID	admit	client_id	release	admitcount
1	295815	13-Dec-22	4087673300	7-Jan-23	1
2	295936	7-Jan-23	4087673300	9-Jan-23	2
3	295950	9-Jan-23	4087673300	18-Jan-23	3
4	296219	21-Feb-23	4087673300	22-Feb-23	4
5	299130	26-Jun-23	4087673300	26-Jun-23	5
6	299132	26-Jun-23	4087673300	6-Jul-23	6

I tried using a do loop to sort through the consecutive dates and only keep the records that have actual admit dates.

proc sort data=morethanone; by client_id ID; run;

data want;
  set morethanone;
  by client_id ID;
  lag_enddate=lag(release);
  if not first.client_id then do;
    if admit-lag_enddate=0 then delete_flag=1;
  end;
  if delete_flag then delete;
  format lag_enddate date7.;
run;

ID	admit	client_id	release	admitcount	lag_enddate	delete_flag
295815	13-Dec-22	4087673300	7-Jan-23	1	13-Feb-23	.
296219	21-Feb-23	4087673300	22-Feb-23	4	18-Jan-23	.
299130	26-Jun-23	4087673300	26-Jun-23	5	22-Feb-23	.

The program kept the records I wanted but I would like the lag_enddate to be the actual end date.

What I want:

ID	admit	client_id	release	admitcount	lag_enddate	delete_flag
295815	13-Dec-22	4087673300	7-Jan-23	1	18-Jan-23	.
296219	21-Feb-23	4087673300	22-Feb-23	4	22-Feb-23	.
299130	26-Jun-23	4087673300	26-Jun-23	5	6-Jul-23	.

Please help! Tell me what I'm doing wrong.

Tom · Posted 09-19-2023 03:28 PM

Yes. Just make sure it is sorted.

proc sort data=have;
  by client_id admit release;
run;

Result

Obs    client_id     admitcount        start          end    n_admit    days    gap

 1     4087673300         3        13DEC2022    18JAN2023       3        37      34
 2     4087673300         4        21FEB2023    22FEB2023       1         2     124
 3     4087673300         6        26JUN2023    06JUL2023       2        11       .
 4     5265189100         3        13DEC2022    18JAN2023       3        37      34
 5     5265189100         4        21FEB2023    22FEB2023       1         2     124
 6     5265189100         6        26JUN2023    06JUL2023       2        11       .
 7     8646854600         3        13DEC2022    18JAN2023       3        37      34
 8     8646854600         4        21FEB2023    22FEB2023       1         2     124
 9     8646854600         6        26JUN2023    06JUL2023       2        11       .

View solution in original post

mkeintz · Posted 09-13-2023 04:54 PM

You don't need a DO group here. Judicious use of lag values and first.client_id inside an IFN function is is all you need to see the information you need.

data morethanone;
  input
  ID       admit :date9.	client_id	release :date9.	admitcount;
  format admit release date9. ;
datalines;
295815	13-Dec-22	4087673300	7-Jan-23	1
295936	7-Jan-23	4087673300	9-Jan-23	2
295950	9-Jan-23	4087673300	18-Jan-23	3
296219	21-Feb-23	4087673300	22-Feb-23	4
299130	26-Jun-23	4087673300	26-Jun-23	5
299132	26-Jun-23	4087673300	6-Jul-23	6
run;

data want ;
  set morethanone;
  by client_id id ;
  lag_enddate=ifn(first.client_id,admit,lag(release));
  if first.client_id=1 or admit-1 > lag_enddate;
  format lag_enddate date9.;
run;

I presume for the first record for a given client_id, you want ADMIT as the lag_enddate value.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

amandav2107 · Posted 09-18-2023 08:45 AM

@mkeintz thanks for your help! Not quite what I was looking for. This particular client has been admitted to the facility for a true total of 3 times even though the data shows it as 6 admits. There are instances where he was released and admitted the same day, but it should not be treated as two separte admits, rather one string of days. Ultimately, I want to collapse/limit the data so I get those three date ranges below and do anaverage length of stay.

First admit: 13DEC22 - 18JAN23

Second admit: 21FEB23 - 22FEB23

Third admit: 26JUN23 - 06JUL23

If I did the average length of stay with the orginal 6 admits and date ranges, it wouldn't be accurate because I would have to divide the number of admits by 6 rather than 3.

amandav2107 · Posted 09-18-2023 08:47 AM

@mkeintz Also, with the steps you provided, it did choose the right admit dates but the wrong lag_enddate.

mkeintz · Posted 09-18-2023 01:28 PM

@amandav2107 wrote:

@mkeintz Also, with the steps you provided, it did choose the right admit dates but the wrong lag_enddate.

I'd say naming a variable lag_enddate implies looking back, which is actually the opposite of what I now see you want. You want the last_enddate (which I use in the code below) in a group of records, in which all the other data come from the first record of a sequence - i.e. you want to look forward.

The also means every obs in the dataset has to be read twice, because you have to look forward to get last_enddate. Then in the second read, keep that last_enddate and output it with all the data from the first record.

To know when to begin the re-reading, you have to know when you've reached the end of a sequence (i.e. when the next record is from a new client_id or has an admit date more than one day after the release date in hand. That's what the merge statement below supports, in combination with the firstobs=2 dataset name parameter. Note the merge statement does NOT have the usual accompanying BY statement.

data morethanone;
  input
  ID       admit :date9.	client_id	release :date9.	admitcount;
  format admit release date9. ;
datalines;
295815	13-Dec-22	4087673300	7-Jan-23	1
295936	7-Jan-23	4087673300	9-Jan-23	2
295950	9-Jan-23	4087673300	18-Jan-23	3
296219	21-Feb-23	4087673300	22-Feb-23	4
299130	26-Jun-23	4087673300	26-Jun-23	5
299132	26-Jun-23	4087673300	6-Jul-23	6
run;

data want (drop=i j nxt_:);
  do i=1 by 1 until (nxt_cid^=client_id or nxt_adm>release+1);
    merge morethanone 
          morethanone (firstobs=2 keep=client_id admit rename=(client_id=nxt_cid admit=nxt_adm));
    last_enddate=release;
  end;
  attrib last_enddate label='Final RELEASE date in this sequence' format=date9.;
  do j=1 to i;   /*Reread the sequence */
    set morethanone;
    if j=1 then output;
  end;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

amandav2107 · Posted 09-19-2023 02:01 PM

@mkeintz Okay, I've never used this logic before. That was really interesting to work through. What was your output? Because I ended up with one record with a last_enddate of 07JAN2023.

ID	admit	client_id	release	admitcount	last_enddate
299132	26-Jun-23	4087673300	6-Jul-23	6	07-Jan-23

That was this client's last admit and and last release date. I am missing two more true admits.

Tom · Posted 09-18-2023 07:14 PM

It is probably easier to think about this as a LOOK AHEAD problem instead of a LOOK BACK problem.

So instead of the LAG of RELEASE you need to find the LEAD of ADMIT.

Here is one method using a second SET statement with dataset options to just pull in the next value of ADMIT and call it NEXT_ADMIT.

Now you just need to RETAIN the new starting date to do the comlapsing.

While we are at is let's make the minimum gap between release and the next admit that we want to collapse into a macro variable to make it easier.

First let's convert your listing into actual data.

data have;
  input ID $ admit :date. client_id :$10. release :date. admitcount;
  format admit release date9.;
cards;
295815 13-Dec-22 4087673300  7-Jan-23 1
295936  7-Jan-23 4087673300  9-Jan-23 2
295950  9-Jan-23 4087673300 18-Jan-23 3
296219 21-Feb-23 4087673300 22-Feb-23 4
299130 26-Jun-23 4087673300 26-Jun-23 5
299132 26-Jun-23 4087673300  6-Jul-23 6
;

Now we can process it by CLIENT_ID in the order of ADMIT.

%let gap=0;
data want;
  set have ;
  by client_id admit ;
  set have(firstobs=2 keep=admit rename=(admit=next_admit)) have(obs=1 drop=_all_);
  length start end n_admit days gap 8;
  retain start;
  if first.client_id then do;
    start=admit;
    n_admit=0;
  end;
  if last.client_id then next_admit = release + &gap +1;
  n_admit +1;
  gap =next_admit-release ;
  if gap > &gap then do;
    end = release;
    days = release - start +1;
    if last.client_id then gap=.;
    output;
    start=admit;
    n_admit=0;
  end; 
  format start end date9.;
  drop id admit release next_admit;
run;

Result

Obs    client_id     admitcount        start          end    n_admit    days    gap

 1     4087673300         3        13DEC2022    18JAN2023       3         37     34
 2     4087673300         4        09JAN2023    22FEB2023       1         45    124
 3     4087673300         6        21FEB2023    06JUL2023       2        136      .

amandav2107 · Posted 09-19-2023 02:29 PM

Hey @Tom ! Interesting logic, however with the start and end dates are wrong. With the results you ended up with, my days will be off. The program choose the right 1st start and 1st end but should have been n_admit 1. For the 2nd start/n_admit it chose the wrong end. It should have been 22FEB2023. As for n_admit 1, the start should have been 26JUN2023 and the end should have been 06JUL2023.

n_admit 2 and 1 overlap.

Obs    client_id     admitcount        start          end    n_admit    days    gap

 1     4087673300         3        13DEC2022    18JAN2023       3         37     34
 2     4087673300         4        09JAN2023    22FEB2023       1         45    124
 3     4087673300         6        21FEB2023    06JUL2023       2        136      .

Here are the dates I want to ultimately keep to do the count of days inbetween. With this result, I'd have an accurate count.

First admit: 13DEC2022 - 18JAN2023

Second admit: 21FEB2023 - 22FEB2023

Third admit: 26JUN2023 - 06JUL2023

Tom · Posted 09-19-2023 02:40 PM

Set the START to missing after detecting a gap.

Test for missing(START) to set a new START.

data have;
  input ID $ admit :date. client_id :$10. release :date. admitcount;
  format admit release date9.;
cards;
295815 13-Dec-22 4087673300  7-Jan-23 1
295936  7-Jan-23 4087673300  9-Jan-23 2
295950  9-Jan-23 4087673300 18-Jan-23 3
296219 21-Feb-23 4087673300 22-Feb-23 4
299130 26-Jun-23 4087673300 26-Jun-23 5
299132 26-Jun-23 4087673300  6-Jul-23 6
296219 21-Feb-23 999        22-Feb-23 1
299130 26-Jun-23 999        26-Jun-23 2
299132 26-Jun-23 999         6-Jul-23 3
;

%let gap=0;
data want;
  set have ;
  by client_id admit ;
* Add NEXT_ADMIT variable ;
  set have(firstobs=2 keep=admit rename=(admit=next_admit)) have(obs=1 drop=_all_);
  if last.client_id then next_admit = release + &gap +1;

  length start end n_admit days gap 8;
  start=coalesce(start,admit);
  n_admit +1;
  retain start;
  gap=next_admit-release ;
  if gap > &gap then do;
    end = release;
    days = release - start +1;
    if last.client_id then gap=.;
    output;
    start=.;
    n_admit=0;
  end; 
  format start end date9.;
  drop id admit release next_admit;
run;

Obs    client_id     admitcount        start          end    n_admit    days    gap

 1     4087673300         3        13DEC2022    18JAN2023       3        37      34
 2     4087673300         4        21FEB2023    22FEB2023       1         2     124
 3     4087673300         6        26JUN2023    06JUL2023       2        11       .
 4     999                1        21FEB2023    22FEB2023       1         2     124
 5     999                3        26JUN2023    06JUL2023       2        11       .

amandav2107 · Posted 09-19-2023 03:08 PM

That worked really well! Thank you for that. Can I utilize this same program with a dataset that has more than one client_id (hundreds)?

Example below only has 3 cleints (but 2 more than the first example): 4087673300, 5265189100, & 8646854600.

data have;
  input ID $ admit :date. client_id :$10. release :date. admitcount;
  format admit release date9.;
cards;
295815 13-Dec-22 8646854600  7-Jan-23 1
295936  7-Jan-23 8646854600  9-Jan-23 2
295950  9-Jan-23 8646854600 18-Jan-23 3
296219 21-Feb-23 8646854600 22-Feb-23 4
299130 26-Jun-23 8646854600 26-Jun-23 5
299132 26-Jun-23 8646854600  6-Jul-23 6
295815 13-Dec-22 5265189100  7-Jan-23 1
295936  7-Jan-23 5265189100  9-Jan-23 2
295950  9-Jan-23 5265189100 18-Jan-23 3
296219 21-Feb-23 5265189100 22-Feb-23 4
299130 26-Jun-23 5265189100 26-Jun-23 5
299132 26-Jun-23 5265189100  6-Jul-23 6
295815 13-Dec-22 4087673300  7-Jan-23 1
295936  7-Jan-23 4087673300  9-Jan-23 2
295950  9-Jan-23 4087673300 18-Jan-23 3
296219 21-Feb-23 4087673300 22-Feb-23 4
299130 26-Jun-23 4087673300 26-Jun-23 5
299132 26-Jun-23 4087673300  6-Jul-23 6
;

Tom · Posted 09-19-2023 03:28 PM

Yes. Just make sure it is sorted.

proc sort data=have;
  by client_id admit release;
run;

Result

Obs    client_id     admitcount        start          end    n_admit    days    gap

 1     4087673300         3        13DEC2022    18JAN2023       3        37      34
 2     4087673300         4        21FEB2023    22FEB2023       1         2     124
 3     4087673300         6        26JUN2023    06JUL2023       2        11       .
 4     5265189100         3        13DEC2022    18JAN2023       3        37      34
 5     5265189100         4        21FEB2023    22FEB2023       1         2     124
 6     5265189100         6        26JUN2023    06JUL2023       2        11       .
 7     8646854600         3        13DEC2022    18JAN2023       3        37      34
 8     8646854600         4        21FEB2023    22FEB2023       1         2     124
 9     8646854600         6        26JUN2023    06JUL2023       2        11       .

amandav2107 · Posted 09-19-2023 03:30 PM

Thank you, Tom! I really appreciate the help.

SAS Innovate 2025: Call for Content

Classroom Training Available!