Re: Detecting overlapping times

RVA · Posted 12-29-2019 08:43 PM

Dear all

I have the following data. What do I need to do to flag rows 1 and 2 as they indeed happen at the same days and time (overlapping)? I have been thinking but still cannot come up with SAS scripts to flag those two rows. Any help is much appreciated.

Course section	Meeting_days	Beginning_time	Ending_time
CHEM 100 001	MWF	13:00	13:45
CHEM 100 007	MWF	13:15	13:45
CHEM 100 010	MWF	14:00	16:00
CHEM 100 021	MWF	9:00	10:00

novinosrin · Posted 12-29-2019 09:38 PM

Hi @RVA Thank you for posting the input sample. What is the corresponding output sample for the input sample?

blueroad · Posted 12-29-2019 09:41 PM

brute force method - here is your algorithm (no SAS code)

make a data structure for every minute for the days of interest

mo0001-mo2359

tu0000-tu2359

if your data is limited to say 800 to 1800 your minutes array can be smaller

set all values to zero

take each row - use the start and end time to add 100 to each minute value

once you have processed all your rows, any values of your minute values GT 100 have overlaps

how many rows do you expect ? put an integer on each row and add that to your minute flags

if I get a free hour or so I may be able to come up with some code

I recommend not using "scripts" to describe your work in SAS - use programs or code.

IMHO it "cheapens" your efforts.

Bob B, Somerset NJ

KachiM · Posted 12-30-2019 02:09 AM

Hi @RVA ,

It looks that the comparison has to be made with all records to decide the match. Here is a way where the records are matched and if found to have overlapping records then their RID (Record IDs in terms of _N_) is output. I have modified your input data with some more overlapping records to verify my code. Here is the augmented data set and the program. The approach is to note the Beginning, Ending times and the RID for the first record. The subsequent records are then compared with the previous values and if there is an overlap then the corresponding RIDs are output. If there is no overlap, the current record will be stored as previous values, prevRid taking the current _N_.

CHEM100  001 MWF 13:00 13:45
CHEM100  007 MWF 13:15 13:45
CHEM100  007 MWF 13:35 13:40
CHEM100  010 MWF 14:00 16:00
CHEM100  010 MWF 15:00 15:30
CHEM100  021 MWF  9:00 10:00
CHEM100  025 MWF  9:30 10:00


data want;
   do i = 1 by 1 until(eof);
      set have end = eof;
      if prevB = . then do; 
         prevB = Beginning_time;
         prevE = Ending_time;
         prevRid = i; 
      end;
      else do;
         if prevB <= Beginning_time and prevE >= Ending_time then do;
            nextRid = i;
            output;
         end;
         else do;
            prevB = Beginning_time;
            prevE = Ending_time;
            prevRid = i;
         end;
      end;
   end;
run;

The output is:

KachiM · Posted 12-30-2019 03:21 AM

Hi @RVA,

My previous program assumes that the overlapping records are placed one-after the other. What if the overlapping records are randomly placed?

If the overlapping record happens to fall anywhere (not to next) then my previous code can be revised using POINT= option of SET statement. The algorithm is to to select the i-th record and compare it with all records starting from i+1, until the end of the data set. If there is a overlap, then output the Record IDs of both the records.

I have reordered some of the overlapping records as shown below:

CHEM100  001 MWF 13:00 13:45
CHEM100  007 MWF 13:35 13:40
CHEM100  010 MWF 14:00 16:00
CHEM100  021 MWF  9:00 10:00
CHEM100  025 MWF  9:30 10:00
CHEM100  007 MWF 13:15 13:45
CHEM100  010 MWF 15:00 15:30

The revised program:

data want;
   do i = 1 to n - 1;
      set have nobs = n point = i; 
         prevB = Beginning_time;
         prevE = Ending_time;
         prevRid = i; 

      do j = i + 1 to n;
         set have point = j;
         if prevB <= Beginning_time and prevE >= Ending_time then do;
            nextRid = j;
            output;
         end;
      end;
   end;
stop;
run;

The OUPUT Is:

ed_sas_member · Posted 12-30-2019 08:34 AM

Hi @RVA

Here is my attempt. Let me know!

data have;
	input Course_section $ 1-12 Meeting_days $ 14-16 Beginning_time Ending_time;
	informat Beginning_time Ending_time time10.;
	format Beginning_time Ending_time time10.;
	datalines;
CHEM 100 001 MWF 13:00 13:45
CHEM 100 007 MWF 13:15 13:45
CHEM 100 010 MWF 14:00 16:00
CHEM 100 021 MWF 9:00  10:00
;
run;

proc sort data=have;
	by Meeting_days Beginning_time;
run;

data have2;
	set have;
	by Meeting_days;
	format _lag time10.;
	_lag = lag(Ending_time);
	if first.Meeting_days then _lag=.;
	if _lag >= Beginning_time then flag = 1;
	else flag=0;
	drop _lag;
run;

data want;
	merge have2 have2 (firstobs=2 keep=flag rename=(flag=flag_lead));
	if flag=0 and flag_lead=1 then flag=1;
	drop flag_lead;
run;

Output:

RVA · Posted 01-07-2020 08:17 AM

Dear all

Thank you so much for your help. Let me plug your scripts in and run them on my data. I'll get back if I have any questions.

Much appreciated.

Detecting overlapping times