BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
panda
Quartz | Level 8

Hi all -

 

I have a data set looks like this, I want to calculate the interval time account for overlapping time for each id for each hour:

id   start time   stop time   interval8   overlap8  interval9 overlap9 interval10 overlap10

1    8:23          8:47            24              24                    

1    8:10          9:30            50                               30        5

1   9:25           9:40                                              15          

2    9:12          10:20                                            48                        20                                                                 

2    10:02        10:14                                                                        12                  18            

2    10:08        10:23                                                                       15          

In this example id1 has overlap from 8am-9am and from 9am-10am. Thus the interval time between 8-9am account for overlapping

time should be (24+50)-24; and the interval time between 9-10am account for overlapping should be (30+15)-5.

id2 has overlap from 10-11am, thus the interval time from 10-11am account for overlapping should be (12+15)-18.          

Right now my code looks like this:

* interval hour = 0 *;
%macro change;
%local n;
%do n = 0 %to 23;
data dpc_&n.;
   set dpc;
   if start_hour = &n.;
%end;
;
run;
%mend change;
%change;

%macro change;
%local n;
%do n = 0 %to 23;
data dpc_overlap&n.;
    set dpc_&n. (rename = (start_time = start stop_time = end));
	by participantid start;
	retain _stime _etime _gap overlaprecords; /* retain temporary variables from previous row */
	if first.participantid then do; /* reset temporary and new variables at each change in participantid */
	   _stime = start;
	   _etime = end;
	   _gap = 0;
	   _cumul_diff = 0;
	   overlaprecords = 0;
	   end;
	_cumul_diff + (end-start); /* calculate cumulative difference between start and end */
	if start > _etime then _gap + (start - _etime); /* calculate gap between start time and previous highest end time */
    if not first.participantid and start < _etime then do; /* count number of overlap records */
	       if overlaprecords = 0 then overlaprecords + 2; /* first overlap records gets count of 2 to account for previous record */
		   else overlaprecords + 1;
		   end;
	if end > _etime then _etime = end; /* update _etime if end is greater than current value */
	if last.participantid  then do; /*caculate overlap time when last record for current participantid is read */
        overlaptime&n. = intck('minute', (_etime - _stime - _gap), _cumul_diff);
		output;
		end;
	drop _: inter0-inter23; /* drop unwanted variables */
%end;
;
run;
%mend change;
%change;

The code can calculate the overlapping time for each start and stop time for each id, but the problem is I cannot split the overlapping

time into each hour (thus for id1 it can tell me the total overlapping time is 29 but can't tell me what proportion is from 8-9am and what proportion is from 9-10am).

 

Does anyone has any idea?

 

Thanks!!!

1 ACCEPTED SOLUTION

Accepted Solutions
ChrisNZ
Tourmaline | Level 20

I don't understand all the details of the calculations you want, but hopefully this is close enough, and sufficiently simple for you to alter if needed. (note: The first data step just recalculates the intervals and overlaps you already have, so just look at WANT2).

data HAVE;
  input ID START_TIME time.  STOP_TIME : time. ;   
  format   START_TIME time.  STOP_TIME : time. ;   
cards;
1    8:23   8:47 
1    8:10   9:30 
1    9:25   9:40 
2    9:12  10:20 
2    10:02 10:14 
2    10:08 10:23 
run;   
data WANT1; merge HAVE HAVE(firstobs=2 rename=(ID=IDx START_TIME=START_TIMEx STOP_TIME=STOP_TIMEx)) end=LASTOBS ; array INTERVAL[0:23] INTERVAL0-INTERVAL23; array OVERLAP [0:23] OVERLAP0-OVERLAP23; do I= 0 to 23; T1 = (I)*3600; %* start of hour ; T2 = (I+1)*3600; %* end of hour; INTERVAL[I]= ( START_TIME <= T2 & T1 <= STOP_TIME ) * ( min(STOP_TIME,T2) - max(START_TIME,T1) ) / 60; if ID = IDx then OVERLAP[I]= ( START_TIME <= T2 & T1 <= STOP_TIME ) * ( START_TIMEx <= T2 & T1 <= STOP_TIMEx ) * ( min(T2, STOP_TIME, STOP_TIMEx) - max(T1,START_TIME,START_TIMEx) ) / 60; end; run;
data WANT2; merge WANT1 WANT1(firstobs=2 keep =ID INTERVAL: rename=(ID=IDx %macro loop;%local i;%do i=0% to 23; INTERVAL&i=INTERVALx&i%end;%mend;%loop )) end=LASTOBS ; array INTERVAL[ 0:23] INTERVAL0 - INTERVAL23; array INTERVALx[0:23] INTERVALx0- INTERVALx23; array OVERLAP [0:23] OVERLAP0 - OVERLAP23; array OVERPRV [0:23] OVERPRV0 - OVERPRV23; do I= 0 to 23; if ID = IDx then OVERPRV[I]= INTERVAL[I]+INTERVALx[I]-OVERLAP[I]; end; run;
proc print data=WANT2 noobs; var ID START_TIME STOP_TIME
INTERVAL8 OVERLAP8 OVERPRV8 INTERVAL9 OVERLAP9 OVERPRV9 INTERVAL10 OVERLAP10 OVERPRV10 INTERVAL11 OVERLAP11 OVERPRV11; run;
ID START_TIME STOP_TIME INTERVAL8 OVERLAP8 OVERPRV8 INTERVAL9 OVERLAP9 OVERPRV9 INTERVAL10 OVERLAP10 OVERPRV10 INTERVAL11 OVERLAP11 OVERPRV11
1 8:23:00 8:47:00 24 24 50 0 0 30 0 0 0 0 0 0
1 8:10:00 9:30:00 50 0 50 30 5 40 0 0 0 0 0 0
1 9:25:00 9:40:00 0     15     0     0    
2 9:12:00 10:20:00 0 0 0 48 0 48 20 12 20 0 0 0
2 10:02:00 10:14:00 0 0 0 0 0 0 12 6 21 0 0 0
2 10:08:00 10:23:00 0     0     15     0    

 

View solution in original post

5 REPLIES 5
art297
Opal | Level 21

Just one question:

 

Do you only have to account for times, and do they never go over 23:59 (i.e., does your data always represent just one day)?

 

Art, CEO, AnalystFinder.com

 

panda
Quartz | Level 8

Hi - 

 

They do cross over 23:00pm to 0:00am or 2:00am next day morning. 

 

Thanks!

art297
Opal | Level 21

Glad you responded as I found the problem interesting and have been working on code to solve it. The concept of overlap makes it quite challenging and I've got a number of questions for you.

 

I haven't been trying to fix your macro but, rather, do everything in a single datastep.

 

To do that, I start by sorting the records by time within ID. The info you just shared, however, puts a twist on that .. unless your actual data has datetimes rather than just times.

 

If your data really does only have times, rather than datetime, we have to know the range of times (i.e., what are the earliest and latest times that might ever be in your data?), 

 

Please describe more about overlap as, I think, your example wasn't consistent. Should it be applied to the earlier or later record? And, if there is a double overlap, should it count for double the minutes? And, should the overlap be calculated across all records for a given ID? Such a case was present in your example ID number 2, but I was unable to justify the overlaps you showed.

 

Art, CEO, AnalystFinder.com

 

art297
Opal | Level 21

You didn't get back to me so I proceded with the info I had. I think the following either does what you want or comes close enough that you can adjust it where/if necessary:

data want (keep=id start_time stop_time inter: over:);
  set need;
  by id;
  format current_start_dt current_stop_dt datetime21.;
  format start_time stop_time time5.;
  set need ( firstobs = 2 keep = start_dt stop_dt
             rename = (start_dt = next_start_dt stop_dt=next_stop_dt) )
      need (obs = 1 drop = _all_);
  array interval(6:26) interval6-interval26;
  array overlap(6:26) overlap6-overlap26;
  start_hour=ifn(4 le hour(timepart(start_dt)),hour(timepart(start_dt)),hour(timepart(start_dt))+23);
  start_minutes=minute(timepart(start_dt));
  stop_hour=ifn(4 le hour(timepart(stop_dt)),hour(timepart(stop_dt)),hour(timepart(stop_dt))+23);
  stop_minutes=minute(timepart(stop_dt));
  if start_hour eq stop_hour then interval(start_hour)=(stop_dt-start_dt)/60;
  else do i=start_hour to stop_hour;
    if i eq start_hour then interval(i)=60-start_minutes;
    else if i eq stop_hour then interval(i)=stop_minutes;
    else interval(i)=60;
  end;
  start_time=timepart(start_dt);
  stop_time=timepart(stop_dt);
  last_stop_dt=lag(stop_dt);
  last_stop_hour=lag(stop_hour);
  last_stop_minutes=lag(stop_minutes);
  if not last.id then do;
    if not first.id then do;
      stop_dt=max(stop_dt,last_stop_dt);
      stop_hour=max(stop_hour,last_stop_hour);
      stop_minutes=max(stop_minutes,last_stop_minutes);
    end;
    next_start_dt = ifn(  last.ID, (.), next_start_dt );
    next_stop_dt = ifn(  last.ID, (.), next_stop_dt );
	if next_start_dt le stop_dt then do;
      next_start_hour=ifn(4 le hour(timepart(next_start_dt)),hour(timepart(next_start_dt)),hour(timepart(next_start_dt))+23);
      next_start_minutes=minute(timepart(next_start_dt));
      next_stop_hour=ifn(4 le hour(timepart(next_stop_dt)),hour(timepart(next_stop_dt)),hour(timepart(next_stop_dt))+23);
      next_stop_minutes=minute(timepart(next_stop_dt));
      do i=next_start_hour to min(next_stop_hour,stop_hour);
        if i eq min(next_stop_hour,stop_hour) then do;
          current_start_dt=next_start_dt;
          if stop_hour gt i then current_stop_dt=min(next_stop_dt,intnx('hour',stop_dt,0,'e'));
          else current_stop_dt=min(stop_dt,next_stop_dt);
          overlap(i)=(current_stop_dt-current_start_dt)/60;
        end;
        else if i ne min(next_stop_hour,stop_hour) then do;
          overlap(i)=(intnx('hour',next_start_dt,0,'e')-next_start_dt)/60;
        end;
        else overlap(i)=60;
      end;
    end;
  end;
run;

data want;
  retain id start_time stop_time
         interval6 overlap6
         interval7 overlap7
         interval8 overlap8
         interval9 overlap9
         interval10 overlap10
         interval11 overlap11
         interval12 overlap12
         interval13 overlap13
         interval14 overlap14
         interval15 overlap15
         interval16 overlap16
         interval17 overlap17
         interval18 overlap18
         interval19 overlap19
         interval20 overlap20
         interval21 overlap21
         interval22 overlap22
         interval23 overlap23
         interval24 overlap24
         interval25 overlap25
         interval26 overlap26;
  set want;
run;

Art, CEO, AnalystFinder.com

 

ChrisNZ
Tourmaline | Level 20

I don't understand all the details of the calculations you want, but hopefully this is close enough, and sufficiently simple for you to alter if needed. (note: The first data step just recalculates the intervals and overlaps you already have, so just look at WANT2).

data HAVE;
  input ID START_TIME time.  STOP_TIME : time. ;   
  format   START_TIME time.  STOP_TIME : time. ;   
cards;
1    8:23   8:47 
1    8:10   9:30 
1    9:25   9:40 
2    9:12  10:20 
2    10:02 10:14 
2    10:08 10:23 
run;   
data WANT1; merge HAVE HAVE(firstobs=2 rename=(ID=IDx START_TIME=START_TIMEx STOP_TIME=STOP_TIMEx)) end=LASTOBS ; array INTERVAL[0:23] INTERVAL0-INTERVAL23; array OVERLAP [0:23] OVERLAP0-OVERLAP23; do I= 0 to 23; T1 = (I)*3600; %* start of hour ; T2 = (I+1)*3600; %* end of hour; INTERVAL[I]= ( START_TIME <= T2 & T1 <= STOP_TIME ) * ( min(STOP_TIME,T2) - max(START_TIME,T1) ) / 60; if ID = IDx then OVERLAP[I]= ( START_TIME <= T2 & T1 <= STOP_TIME ) * ( START_TIMEx <= T2 & T1 <= STOP_TIMEx ) * ( min(T2, STOP_TIME, STOP_TIMEx) - max(T1,START_TIME,START_TIMEx) ) / 60; end; run;
data WANT2; merge WANT1 WANT1(firstobs=2 keep =ID INTERVAL: rename=(ID=IDx %macro loop;%local i;%do i=0% to 23; INTERVAL&i=INTERVALx&i%end;%mend;%loop )) end=LASTOBS ; array INTERVAL[ 0:23] INTERVAL0 - INTERVAL23; array INTERVALx[0:23] INTERVALx0- INTERVALx23; array OVERLAP [0:23] OVERLAP0 - OVERLAP23; array OVERPRV [0:23] OVERPRV0 - OVERPRV23; do I= 0 to 23; if ID = IDx then OVERPRV[I]= INTERVAL[I]+INTERVALx[I]-OVERLAP[I]; end; run;
proc print data=WANT2 noobs; var ID START_TIME STOP_TIME
INTERVAL8 OVERLAP8 OVERPRV8 INTERVAL9 OVERLAP9 OVERPRV9 INTERVAL10 OVERLAP10 OVERPRV10 INTERVAL11 OVERLAP11 OVERPRV11; run;
ID START_TIME STOP_TIME INTERVAL8 OVERLAP8 OVERPRV8 INTERVAL9 OVERLAP9 OVERPRV9 INTERVAL10 OVERLAP10 OVERPRV10 INTERVAL11 OVERLAP11 OVERPRV11
1 8:23:00 8:47:00 24 24 50 0 0 30 0 0 0 0 0 0
1 8:10:00 9:30:00 50 0 50 30 5 40 0 0 0 0 0 0
1 9:25:00 9:40:00 0     15     0     0    
2 9:12:00 10:20:00 0 0 0 48 0 48 20 12 20 0 0 0
2 10:02:00 10:14:00 0 0 0 0 0 0 12 6 21 0 0 0
2 10:08:00 10:23:00 0     0     15     0    

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 1033 views
  • 2 likes
  • 3 in conversation