Hi Community,
Can anyone help to get only overlapping records.
data have;
input EVENT_ID 6. EVENT_START_DATE DATETIME19. EVENT_END_DATE DATETIME19. ID 2.;
FORMAT EVENT_START_DATE EVENT_END_DATE DATETIME19. ;
datalines;
048110 10Jun2021:13:00:00 10Jun2021:14:00:00 6
049076 10Jun2021:14:00:00 10Jun2021:15:00:00 6
026365 20Sep2019:14:00:00 21Sep2019:18:00:00 9
026361 20Sep2019:16:30:00 20Sep2019:19:00:00 9
037522 25Sep2020:13:00:00 26Sep2020:11:30:00 9
035367 25Sep2020:16:30:00 25Sep2020:20:00:00 9
027322 20Sep2019:13:00:00 20Sep2019:17:00:00 2
026361 20Sep2019:16:30:00 20Sep2019:19:00:00 2
;
run;
For id 6 wanted to remove those as there is no overlap between first event_start_date and second event_start_date or event_date. For id 2 there is half an hour overlap between two events (event 027322 started at 20Sep2019 13:00:00 and ended at 20Sep2019 17:00:00 and for same id started second event before first event ended(over lap of half an hour) .
Want :
EVENT_ID | EVENT_START_DATE | EVENT_END_DATE | ID |
026365 | 20Sep2019 14:00:00 | 21Sep2019 18:00:00 | 9 |
026361 | 20Sep2019 16:30:00 | 20Sep2019 19:00:00 | 9 |
037522 | 25Sep2020 13:00:00 | 26Sep2020 11:30:00 | 9 |
035367 | 25Sep2020 16:30:00 | 25Sep2020 20:00:00 | 9 |
027322 | 20Sep2019 13:00:00 | 20Sep2019 17:00:00 | 2 |
026361 | 20Sep2019 16:30:00 | 20Sep2019 19:00:00 | 2 |
A slightly different take, using the basically same "look ahead" technique:
data want;
merge
have
have (
firstobs=2
keep=id event_start_date
rename=(id=_id event_start_date=_start)
)
;
if
id = lag(id) and event_start_date < lag(event_end_date)
or
id = _id and event_end_date > _start
;
drop _:;
run;
@mkeintz if the dataset is sorted by id and event_start_date (or at least sorted by date within a id group), then an overlap in the third (or later) obs would force an overlap in the previous obs.
Please show what your desired output dataset should look like.
And please provide your available data in a SAS data step format.
Thanks for setting up the data for ready code testing. This program produces what you want:
data want (drop=_:);
set have (keep=id);
by id notsorted;
merge have
have (firstobs=2 keep=event_start_date rename=(event_start_date=_nxt_evt));
if last.id=0 and _nxt_evt<event_end_date
or
first.id=0 and event_start_date<lag(event_end_date);
run;
The program assumes:
Number 3 is important because if you had three records, in which the 2nd and 3rd records don't overlap each other, but both overlap or are contained in the first, this code would not include the third record. Is that condition possible in your data?
A slightly different take, using the basically same "look ahead" technique:
data want;
merge
have
have (
firstobs=2
keep=id event_start_date
rename=(id=_id event_start_date=_start)
)
;
if
id = lag(id) and event_start_date < lag(event_end_date)
or
id = _id and event_end_date > _start
;
drop _:;
run;
@mkeintz if the dataset is sorted by id and event_start_date (or at least sorted by date within a id group), then an overlap in the third (or later) obs would force an overlap in the previous obs.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.