Help using Base SAS procedures

Identifying overlapping observations

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 7
Accepted Solution

Identifying overlapping observations

Hello, I have the following data:

Type      Start_Date          End_Date

A          14Jul2002            23Jul2002

P          10Dec2002          09Dec2003

P          26Feb2003          09Dec2003

P          04May2004          24Aug2004

R          26Sep2007          19Feb2009

Observations 2 and 3 are the same type and have overlapping date ranges. How can I identify them as 'related' based on type and overlapping date ranges?

Thanks in advance Smiley Happy


Accepted Solutions
Solution
‎02-18-2015 11:33 AM
Respected Advisor
Posts: 3,124

Re: Identifying overlapping observations

Another alternative:

data have;

     input type $ (start_date end_date) (:date9.);

     format start_date end_date date9.;

     cards;

A          14Jul2002            23Jul2002

P          10Dec2002          09Dec2003

P          26Feb2003          09Dec2003

P          11Nov2003          10Jan2004

P          04May2004          24Aug2004

R          26Sep2007          19Feb2009

R          26sep2007          01jan2010

;

proc sort data=have;

     by type start_date end_date;

run;

data want;

     do until (last.type);

           set have;

           by type;

           retain _start _end;

           format _start _end date9.;

           if first.type then

                do;

                     _start=start_date;

                     _end=end_date;

                end;

           if start_date > _end then

                do;

                     output;

                     _start=start_date;

                     _end=end_date;

                end;

           else  _end=max(_end, end_date);

     end;

     output;

     drop start_date end_date;

     rename _start=start_date _end=end_date;

run;

View solution in original post


All Replies
Super User
Posts: 10,538

Re: Identifying overlapping observations

What do you expect the output data to look like?

The most likely tool involves either LAG or DIF functions in a datastep.

What would the output look like if there were 3 or more sequential over laps such as

P  11Nov2003  10Jan2004

Is your data sorted by type and start_date?

Occasional Contributor
Posts: 7

Re: Identifying overlapping observations

At the end of the day, I would want 'overlapping' observations to appear as just one using the earliest start date and the latest end date.

If I add in your additional observation:

Original Data:

Type      Start_Date          End_Date

A          14Jul2002            23Jul2002

P          10Dec2002          09Dec2003

P          26Feb2003          09Dec2003

P          11Nov2003          10Jan2004

P          04May2004          24Aug2004

R          26Sep2007          19Feb2009

Final Result:

A          14Jul2002            23Jul2002

P          10Dec2002          10Jan2004

P          04May2004          24Aug2004

R          26Sep2007          19Feb2009

And yes it is sorted by type and start_date.

Thank-you.

PROC Star
Posts: 7,364

Re: Identifying overlapping observations

Christy: I think that the following does what you want to accomplish:

data have;

input type $ (start_date end_date) (:date9.);

format start_date end_date date9.;

cards;

A          14Jul2002            23Jul2002

P          10Dec2002          09Dec2003

P          26Feb2003          09Dec2003

P          04May2004          24Aug2004

R          26Sep2007          19Feb2009

;

data want;

   set have (rename=(start_date=s_dt end_date=e_dt));

     by type;

  retain start_date end_date;

  format start_date end_date date9.;

      if first.type then do;

        start_date=s_dt;

        end_date=e_dt;

      end;

      if s_dt <= lag(e_dt) then end_date=e_dt;

   else if not first.type then do;

     output;

     start_date=s_dt;

     end_date=e_dt;

   end;

   if last.type then output;

   drop s_dt e_dt;

run;

Solution
‎02-18-2015 11:33 AM
Respected Advisor
Posts: 3,124

Re: Identifying overlapping observations

Another alternative:

data have;

     input type $ (start_date end_date) (:date9.);

     format start_date end_date date9.;

     cards;

A          14Jul2002            23Jul2002

P          10Dec2002          09Dec2003

P          26Feb2003          09Dec2003

P          11Nov2003          10Jan2004

P          04May2004          24Aug2004

R          26Sep2007          19Feb2009

R          26sep2007          01jan2010

;

proc sort data=have;

     by type start_date end_date;

run;

data want;

     do until (last.type);

           set have;

           by type;

           retain _start _end;

           format _start _end date9.;

           if first.type then

                do;

                     _start=start_date;

                     _end=end_date;

                end;

           if start_date > _end then

                do;

                     output;

                     _start=start_date;

                     _end=end_date;

                end;

           else  _end=max(_end, end_date);

     end;

     output;

     drop start_date end_date;

     rename _start=start_date _end=end_date;

run;

Occasional Contributor
Posts: 7

Re: Identifying overlapping observations

Thank you very much for the responses! I was out of the office yesterday but will be trying out your suggestions today. I really appreciate the input.

Cheers!

Occasional Contributor
Posts: 8

Re: Identifying overlapping observations

Thank you so much for the code! But then I was testing out the code, the result I got was:

 

A 14Jul2002 23Jul2002

P 10Dec2002 24Aug2004

R 26Sep2007 01Jan2010

 

but the actual result that I want is 

 

A 14Jul2002 23Jul2002

P 10Dec2002  01Jan2004

P 04May2004 24Aug2004

R 26Sep2007 01Jan2010

 

 

Since there is no overlap between 01Jan 2004 and 04May 2004, I would like to show this gap in the final result. Will you mind letting me know how to fix this in the code? Thank you so much!

 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 1649 views
  • 0 likes
  • 5 in conversation