how to determine if one date range is a subset of another

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 5
Accepted Solution

how to determine if one date range is a subset of another

[ Edited ]

Hi, I have the following dataset:

 

Obs    type         enr_start          enr_date
1         A             14Jul2002        23Jul2002
2         A             01Jan2004      09Dec2005
3         A             26Feb2004      06Apr2004
4         B             11May2003     10Aug2003
5         B             04Dec2002     24Nov2003
6         C             26Sep2007     19Feb2009
7         C             26Sep2010     01Jan2010

 

What I want is to determine which of these observations with the same type have a date range which is a subset of another (in this instance, obs 3 is a subset of obs 2, and obs 4 is a subset of obs 5) and output them into a different dataset. How would I go about doing this? My output would look like this:

 

  type         enr_start          enr_date
  A             26Feb2004      06Apr2004
  B             11May2003     10Aug2003

 


Accepted Solutions
Solution
‎03-07-2017 01:11 PM
Super User
Posts: 10,046

Re: how to determine if one date range is a subset of another

Using Cartesian Product , not left join.


 
data have;
input type$ (enr_start enr_date) (: date9.) ;
format enr_start enr_date date9.;
cards;
A 14Jul2002 23Jul2002
A 01Jan2004 09Dec2005
A 26Feb2004 06Apr2004
B 11May2003 10Aug2003
B 04Dec2002 24Nov2003
C 26Sep2007 19Feb2009
C 26Sep2010 01Jan2010
;
run;

proc sql;
create table subsets as
select a.*
from have as a inner join have as b
on a.type=b.type
and a.enr_start gt b.enr_start and a.enr_date lt b.enr_date;
quit;

View solution in original post


All Replies
Super User
Posts: 11,343

Re: how to determine if one date range is a subset of another

[ Edited ]

I think you may need to either expand on your description or correct your example data. You say " with the same type have a date range which is a subset of another" but observations 3 and 4 are not of the same type. It looks like 3 MAY be a subset of 2 though.

By subset do you mean that the enr_start has to be after the other observation and the enr_date less than or could one or both ends be the same?

 

What should your output data set look like?

 

Here's one stab that allows the dates to be equal

proc sql;
   create table subsets as
   select a.*, b.enr_start as substart, b.enr_date as subdate
   from have as a left join have as b 
        on a.type=b.type
   where a.enr_start le b.enr_start and a.enr_date ge b.enr_date;
quit;
Occasional Contributor
Posts: 5

Re: how to determine if one date range is a subset of another

Hello, thank you for your reply! What I meant was that obs 3 is a subset of obs 2, and obs 4 is a subset of obs 5.  I need to output obs 3 and obs 4 into a new dataset. Does this make sense?

Super User
Posts: 11,343

Re: how to determine if one date range is a subset of another

Still haven't addressed the endpoints at same time issue but I think this should work:

 

proc sql;
   create table subsets as
   select a.*
   from have as a left join have as b 
        on a.type=b.type
   where a.enr_start ge b.enr_start and a.enr_date le b.enr_date;
quit;
Occasional Contributor
Posts: 5

Re: how to determine if one date range is a subset of another

The proc sql code above is giving me this output dataset:

 

 type         enr_start          enr_date

A             26Feb2004       06Apr2004
A             14Jul2002        23Jul2002
A             26Feb2004      06Apr2004

A             01Jan2004      09Dec2005

B             04Dec2002     24Nov2003
B             11May2003     10Aug2003
B             11May2003     10Aug2003

C             26Sep2010     01Jan2010
C             26Sep2007     19Feb2009

What I'm looking for is this output:

 

 type         enr_start          enr_date

A             26Feb2004     06Apr2004

B             11May2003     10Aug2003

Super User
Posts: 11,343

Re: how to determine if one date range is a subset of another

[ Edited ]

Also, I said that you need to address the endpoints. You have not explicitly said whether endpoints that are equal are considered "subset" or not. if not use GT LT instead of GE or LE. Also your example data did not include multiple identical ranges. Use SELECT DISTINCT to get single results.

 

 

proc sql;
   create table subsets as
   select distinct a.*
   from have as a left join have as b 
        on a.type=b.type
   where a.enr_start gt b.enr_start and a.enr_date lt b.enr_date;
quit;

 

Follow the instruction here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... to post your example data in the form of a datastep so we can recreate data to test code against.

 

 

Occasional Contributor
Posts: 5

Re: how to determine if one date range is a subset of another

This is the code I used, does this help?

 

data have;
input type$ (enr_start enr_date) (: date9.) ;
format enr_start enr_date date9.;
cards;
A 14Jul2002 23Jul2002
A 01Jan2004 09Dec2005
A 26Feb2004 06Apr2004
B 11May2003 10Aug2003
B 04Dec2002 24Nov2003
C 26Sep2007 19Feb2009
C 26Sep2010 01Jan2010
;
run;


proc sql;
create table subsets as
select a.*
from have as a left join have as b
on a.type=b.type
where a.enr_start ge b.enr_start and a.enr_date le b.enr_date;
quit;

Trusted Advisor
Posts: 1,022

Re: how to determine if one date range is a subset of another

The problem  is that every date range "contains" itself - so you got the entire dataset as your "subset". You avoid the problem by modifying the code submitted by @ballardw:

 

proc sql;

  create table subsets as

  select a.*

  from have as a left join have as b

  on a.type=b.type

    where (a.enr_start ge b.enr_start and a.enr_date lt b.enr_date)

       or (a.enr_start gt b.enr_start and a.enr_date le b.enr_date)

  ;

quit;

 

This effectively addresses @ballardw's question about end-points, by requiring that a "contained" date range not reach both extremes of its container. Not a problem if your dataset has no duplicate date ranges within a single TYPE.

Solution
‎03-07-2017 01:11 PM
Super User
Posts: 10,046

Re: how to determine if one date range is a subset of another

Using Cartesian Product , not left join.


 
data have;
input type$ (enr_start enr_date) (: date9.) ;
format enr_start enr_date date9.;
cards;
A 14Jul2002 23Jul2002
A 01Jan2004 09Dec2005
A 26Feb2004 06Apr2004
B 11May2003 10Aug2003
B 04Dec2002 24Nov2003
C 26Sep2007 19Feb2009
C 26Sep2010 01Jan2010
;
run;

proc sql;
create table subsets as
select a.*
from have as a inner join have as b
on a.type=b.type
and a.enr_start gt b.enr_start and a.enr_date lt b.enr_date;
quit;

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 193 views
  • 0 likes
  • 4 in conversation