Dear all
I have the following data. What do I need to do to flag rows 1 and 2 as they indeed happen at the same days and time (overlapping)? I have been thinking but still cannot come up with SAS scripts to flag those two rows. Any help is much appreciated.
Course section | Meeting_days | Beginning_time | Ending_time |
CHEM 100 001 | MWF | 13:00 | 13:45 |
CHEM 100 007 | MWF | 13:15 | 13:45 |
CHEM 100 010 | MWF | 14:00 | 16:00 |
CHEM 100 021 | MWF | 9:00 | 10:00 |
Hi @RVA Thank you for posting the input sample. What is the corresponding output sample for the input sample?
brute force method - here is your algorithm (no SAS code)
make a data structure for every minute for the days of interest
mo0001-mo2359
tu0000-tu2359
if your data is limited to say 800 to 1800 your minutes array can be smaller
set all values to zero
take each row - use the start and end time to add 100 to each minute value
once you have processed all your rows, any values of your minute values GT 100 have overlaps
how many rows do you expect ? put an integer on each row and add that to your minute flags
if I get a free hour or so I may be able to come up with some code
I recommend not using "scripts" to describe your work in SAS - use programs or code.
IMHO it "cheapens" your efforts.
Bob B, Somerset NJ
Hi @RVA ,
It looks that the comparison has to be made with all records to decide the match. Here is a way where the records are matched and if found to have overlapping records then their RID (Record IDs in terms of _N_) is output. I have modified your input data with some more overlapping records to verify my code. Here is the augmented data set and the program. The approach is to note the Beginning, Ending times and the RID for the first record. The subsequent records are then compared with the previous values and if there is an overlap then the corresponding RIDs are output. If there is no overlap, the current record will be stored as previous values, prevRid taking the current _N_.
CHEM100 001 MWF 13:00 13:45
CHEM100 007 MWF 13:15 13:45
CHEM100 007 MWF 13:35 13:40
CHEM100 010 MWF 14:00 16:00
CHEM100 010 MWF 15:00 15:30
CHEM100 021 MWF 9:00 10:00
CHEM100 025 MWF 9:30 10:00
data want;
do i = 1 by 1 until(eof);
set have end = eof;
if prevB = . then do;
prevB = Beginning_time;
prevE = Ending_time;
prevRid = i;
end;
else do;
if prevB <= Beginning_time and prevE >= Ending_time then do;
nextRid = i;
output;
end;
else do;
prevB = Beginning_time;
prevE = Ending_time;
prevRid = i;
end;
end;
end;
run;
The output is:
Hi @RVA,
My previous program assumes that the overlapping records are placed one-after the other. What if the overlapping records are randomly placed?
If the overlapping record happens to fall anywhere (not to next) then my previous code can be revised using POINT= option of SET statement. The algorithm is to to select the i-th record and compare it with all records starting from i+1, until the end of the data set. If there is a overlap, then output the Record IDs of both the records.
I have reordered some of the overlapping records as shown below:
CHEM100 001 MWF 13:00 13:45
CHEM100 007 MWF 13:35 13:40
CHEM100 010 MWF 14:00 16:00
CHEM100 021 MWF 9:00 10:00
CHEM100 025 MWF 9:30 10:00
CHEM100 007 MWF 13:15 13:45
CHEM100 010 MWF 15:00 15:30
The revised program:
data want;
do i = 1 to n - 1;
set have nobs = n point = i;
prevB = Beginning_time;
prevE = Ending_time;
prevRid = i;
do j = i + 1 to n;
set have point = j;
if prevB <= Beginning_time and prevE >= Ending_time then do;
nextRid = j;
output;
end;
end;
end;
stop;
run;
The OUPUT Is:
Hi @RVA
Here is my attempt. Let me know!
data have;
input Course_section $ 1-12 Meeting_days $ 14-16 Beginning_time Ending_time;
informat Beginning_time Ending_time time10.;
format Beginning_time Ending_time time10.;
datalines;
CHEM 100 001 MWF 13:00 13:45
CHEM 100 007 MWF 13:15 13:45
CHEM 100 010 MWF 14:00 16:00
CHEM 100 021 MWF 9:00 10:00
;
run;
proc sort data=have;
by Meeting_days Beginning_time;
run;
data have2;
set have;
by Meeting_days;
format _lag time10.;
_lag = lag(Ending_time);
if first.Meeting_days then _lag=.;
if _lag >= Beginning_time then flag = 1;
else flag=0;
drop _lag;
run;
data want;
merge have2 have2 (firstobs=2 keep=flag rename=(flag=flag_lead));
if flag=0 and flag_lead=1 then flag=1;
drop flag_lead;
run;
Output:
Dear all
Thank you so much for your help. Let me plug your scripts in and run them on my data. I'll get back if I have any questions.
Much appreciated.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.