DATA Step, Macro, Functions and more

finding time periods that overlap

Reply
Contributor
Posts: 23

finding time periods that overlap

Hello all!

 

I have a dataset in which I would like to be able to find time periods that overlap for each id.

 

Below is an example of the start dataset:

 

id      start_date       end_date

1       3/1/2016        6/2/2016

1       4/1/2016        7/1/2016

1       8/2/2016        12/1/2016

1       1/2/2017        3/1/2017

2       5/10/2014      12/1/2014

2       7/10/2014      10/1/2014

2       1/1/2015        3/2/2015

 

I want to be able to identify the rows for each id with the overlapping periods, like below:

 

id      start_date       end_date

1       3/1/2016        6/2/2016

1       4/1/2016        7/1/2016

 

2       5/10/2014      12/1/2014

2       7/10/2014      10/1/2014

 

Can anyone guide me on this?

Thank you!

Regular Contributor
Posts: 165

Re: finding time periods that overlap

there is discussion of overlap here: http://www2.sas.com/proceedings/sugi31/048-31.pdf

they use arrays

--------------
blog: papersandprograms.com
Respected Advisor
Posts: 4,741

Re: finding time periods that overlap

A SQL self-join makes this sort of problem often quite simple.

data have;
  infile datalines dlm=' ' truncover;
  input id start_date:mmddyy. end_date:mmddyy.;
  format start_date end_date date9.;
  datalines;
1 3/1/2016 6/2/2016
1 4/1/2016 7/1/2016
1 8/2/2016 12/1/2016
1 1/2/2017 3/1/2017
2 5/10/2014 12/1/2014
2 7/10/2014 10/1/2014
2 1/1/2015 3/2/2015
;
run;

proc sql;
create table want as select l.* from have l, have r where /* test only within same id */ l.id=r.id /* do not join a record with itself */ and not (l.start_date=r.start_date and l.end_date=r.end_date) /* find overlapping dates */ and ( l.start_date between r.start_date and r.end_date or r.start_date between l.start_date and l.end_date ) ; quit;
Super User
Posts: 10,787

Re: finding time periods that overlap

data have;
  infile datalines dlm=' ' truncover;
  input id start_date:mmddyy. end_date:mmddyy.;
  format start_date end_date date9.;
  datalines;
1 3/1/2016 6/2/2016
1 4/1/2016 7/1/2016
1 8/2/2016 12/1/2016
1 1/2/2017 3/1/2017
2 5/10/2014 12/1/2014
2 7/10/2014 10/1/2014
2 1/1/2015 3/2/2015
;
run;
data temp;
 set have;
 by id;
 if start_date <= lag(end_date) then flag=1;
 if first.id then flag=.;
run;
data temp;
 set temp;
 if missing(flag) then group+1;
run;
proc sql;
create table want as
 select *
  from temp
   group by group
    having sum(flag) > 0;
quit;
Ask a Question
Discussion stats
  • 3 replies
  • 80 views
  • 1 like
  • 4 in conversation