combining overlapping records

ChristyN · Posted 06-16-2020 04:20 PM

*Update: I am dealing with bad data and trying to clean it up. The have data represents scenarios where there were multiple admits/releases within an hour. It is assumed these were errors (e.g. released the wrong person so then readmitted them (red) or used the wrong release reason so readmitted and then re-released them with the correct reason (purple).

Hello, I am using the following program to combine records that overlap allowing 1 hour between them. I am not getting the expected results. I suspect it has to do with the datetime format. Is 3600 not the number I should be using to represent 1 hour?

data a4;
do until (last.id);
set a3;
by id;
retain _start _end;
format _start _end datetime20.;
if first.id then
do;
_start=move_datetime;
_end=next_datetime;
end;
if move_datetime > _end+3600 then
do; *cases with only one hour (3600 seconds);
output;
_start=move_datetime;
_end=next_datetime;
end;
else _end=max(_end, next_datetime);
end;
output;
*drop move_datetime next_move_datetime;
rename _start=admit_date _end=release_date;
run;

Have1:
ID	Move	move_datetime
1	Admit	01JAN2019:12:00:00
1	Release	01JAN2019:12:01:00
1	Admit	15JAN2019:12:00:00
1	Release	17JAN2019:21:56:00
1	Admit	17JAN2019:21:57:00
1	Release	20JAN2019:08:05:00
1	Admit	01FEB2019:01:20:00
1	Release	15FEB2019:21:50:00
1	Admit	15FEB2019:21:50:00
1	Release	15FEB2019:21:50:00

I created the next_datetime variable by lagging the move_datetime.
Have2:
ID	Move	move_datetime	next_datetime
1	Admit	01JAN2019:12:00:00	01JAN2019:12:01:00
1	Release	01JAN2019:12:01:00	15JAN2019:12:00:00
1	Admit	15JAN2019:12:00:00	17JAN2019:21:56:00
1	Release	17JAN2019:21:56:00	17JAN2019:21:57:00
1	Admit	17JAN2019:21:57:00	20JAN2019:08:05:00
1	Release	20JAN2019:08:05:00	01FEB2019:01:20:00
1	Admit	01FEB2019:01:20:00	15FEB2019:21:50:00
1	Release	15FEB2019:21:50:00	15FEB2019:21:50:00
1	Admit	15FEB2019:21:50:00	15FEB2019:21:50:00
1	Release	15FEB2019:21:50:00	.
*There is no next move for the last record so I replace the missing with the dummy 01JAN3000:00:00:00

Want:
ID	admit_date	release_date
1	01JAN2019:12:00:00	01JAN2019:12:01:00
1	15JAN2019:12:00:00	20JAN2019:08:05:00
1	01FEB2019:01:20:00	15FEB2019:21:50:00

Thanks in advance!

ballardw · Posted 06-16-2020 04:56 PM

Please describe what you want to do with the datetime values.

While 3600 seconds does represent one hour you may be wanting to look at the INTNX function to provide actual interval boundaries.

result = intnx('hour',_start,1,'B')

would yield a result that is the beginning of the next hour on a clock. Use an appropriate format for result to see the results. Also the companion function INTCK returns the number of intervals of a given type occur between two date, time or datetime values.

It would likely be a good idea to provide a few rows of starting data for the variables you need to use and what the result should look like.

ChristyN · Posted 06-16-2020 05:57 PM

Thank you I added sample data.

smantha · Posted 06-16-2020 05:09 PM

Sample data explaining what needs to happen and what is not happening would be good.

ChristyN · Posted 06-16-2020 05:57 PM

Thank you I added data.

ballardw · Posted 06-16-2020 06:09 PM

@ChristyN wrote:
Thank you I added data.

Your "want" data does not appear to show anything concerning either exactly one hour or hour-by-clock values. So you need to provide some clearer description about what your concern with one hour actually is and show the results in the "want" data.

Do you actually have two release times the same with an admit the same?

Here is one way to provide data step of example data:

data have;
   input ID $	Move $ 	move_datetime 	:datetime18.;
   format move_datetime datetime20.;
datalines;
1 	Admit 	01JAN2019:12:00:00 	 
1 	Release 	01JAN2019:12:01:00 	 
1 	Admit 	15JAN2019:12:00:00 	 
1 	Release 	17JAN2019:21:56:00 	 
1 	Admit 	17JAN2019:21:57:00 	 
1 	Release 	20JAN2019:08:05:00 	 
1 	Admit 	01FEB2019:01:20:00 	 
1 	Release 	15FEB2019:21:50:00 	 
1 	Admit 	15FEB2019:21:50:00 	 
1 	Release 	15FEB2019:21:50:00 	 
;

I'm not going to clean up tabs left from pasting "tables" of what ever sort.

Patrick · Posted 06-16-2020 10:24 PM

Below code creates your table Have2. It also demonstrates how you can look-ahead.

I didn't fully understand the logic which leads to the Want dataset but you could extend the look-ahead logic below to pick the row with the release date you're after.

data have;
  infile datalines truncover dsd;
  input ID Move:$20. move_datetime:datetime20.;
  format move_datetime datetime20.;
datalines;
1,Admit,01JAN2019:12:00:00
1,Release,01JAN2019:12:01:00
1,Admit,15JAN2019:12:00:00
1,Release,17JAN2019:21:56:00
1,Admit,17JAN2019:21:57:00
1,Release,20JAN2019:08:05:00
1,Admit,01FEB2019:01:20:00
1,Release,15FEB2019:21:50:00
1,Admit,15FEB2019:21:50:00
1,Release,15FEB2019:21:50:00
;

data have2;
  set have;
  by id move_datetime;
  if not last.id then
    do;
      n_obs=_n_+1;
      set have
        (keep=move_datetime rename=(move_datetime=next_datetime))
        point=n_obs;
    end;
  if last.id then call missing(next_datetime);
run;
  
proc print;
run;

smantha · Posted 06-17-2020 10:27 AM

Is this what you are looking for

DATA WORK.test;
    LENGTH
        ID               $ 14
        Move               $ 12 
         Move_datetime   6.;
    FORMAT
        ID               $CHAR14.
        Move               $CHAR12. 
         Move_datetime datetime18.;
    INFORMAT
        ID               $CHAR14.
        Move               $CHAR12. 
		Move_datetime datetime18.;
    INPUT
        ID               : $CHAR14.
        Move               : $CHAR12. 
        Move_datetime :datetime18.;
DATALINES4;
1	Admit	01JAN2019:12:00:00	 
1	Release	01JAN2019:12:01:00	 
1	Admit	15JAN2019:12:00:00	 
1	Release	17JAN2019:21:56:00	 
1	Admit	17JAN2019:21:57:00	 
1	Release	20JAN2019:08:05:00	 
1	Admit	01FEB2019:01:20:00	 
1	Release	15FEB2019:21:50:00	 
1	Admit	15FEB2019:21:50:00	 
1	Release	15FEB2019:21:50:00	 
;;;;
run;
proc sort; by ID Move_datetime; run;
data want;
set test nobs=obs;
call symputx ('nobs',obs);
stop;
run;

data want(keep= ID Admit_Date Release_Date);
set test;
by ID Move_datetime;
array Admit_date_array{&nobs} ;
array release_date_array{&nobs};
retain release_date_array: Admit_date_array: count 0;
if first.id then count=1;
if strip(Move)='Admit' then Admit_date_array[count] = Move_datetime;
if strip(Move)='Release' then do;
  Release_date_array[count] = Move_datetime;
  count=count+1;
end;
format Admit_date Release_date datetime18.;
if last.id then do;
     Admit_date = Admit_date_array[1];
	 Release_date = Release_date_array[1];
     do val = 1 to count-1;
	    
		if Admit_date_array[min(count,val+1)] - Release_date > 3600 then do;
		    Release_date = Release_date_array[Val];
			output;
			Admit_date = Admit_date_array[min(count-1,val+1)];
			Release_date = Release_date_array[min(count-1,val)];
			val=val+1;
			put '*********************************************';
		    put _all_;
			put '##############################################';
		end;
		else if val>=count-1 then do;
		   Release_date = Release_date_array[min(count-1,val)];
		   output;
		end;
		
	end;
end;
run;

combining overlapping records

Re: combining overlapping records

Re: combining overlapping records

Re: combining overlapping records

Re: combining overlapping records

Re: combining overlapping records

Re: combining overlapping records

Re: combining overlapping records

Classroom Training Available!