turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- General Programming
- /
- how to determine if one date range is a subset of ...

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

12-19-2016 12:12 PM - edited 12-19-2016 01:10 PM

Hi, I have the following dataset:

Obs type enr_start enr_date

1 A 14Jul2002 23Jul2002

2 A 01Jan2004 09Dec2005

3 A 26Feb2004 06Apr2004

4 B 11May2003 10Aug2003

5 B 04Dec2002 24Nov2003

6 C 26Sep2007 19Feb2009

7 C 26Sep2010 01Jan2010

What I want is to determine which of these observations with the same type have a date range which is a subset of another (in this instance, obs 3 is a subset of obs 2, and obs 4 is a subset of obs 5) and output them into a different dataset. How would I go about doing this? My output would look like this:

type enr_start enr_date

A 26Feb2004 06Apr2004

B 11May2003 10Aug2003

Accepted Solutions

Solution

03-07-2017
01:11 PM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

12-20-2016 05:29 AM

All Replies

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

12-19-2016 12:47 PM - edited 12-19-2016 02:41 PM

I think you may need to either expand on your description or correct your example data. You say " with the same type have a date range which is a subset of another" but observations 3 and 4 are not of the same type. It looks like 3 MAY be a subset of **2** though.

By subset do you mean that the enr_start has to be after the other observation and the enr_date less than or could one or both ends be the same?

What should your output data set look like?

Here's one stab that allows the dates to be equal

proc sql; create table subsets as select a.*, b.enr_start as substart, b.enr_date as subdate from have as a left join have as b on a.type=b.type where a.enr_start le b.enr_start and a.enr_date ge b.enr_date; quit;

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

12-19-2016 01:07 PM

Hello, thank you for your reply! What I meant was that obs 3 is a subset of obs 2, and obs 4 is a subset of obs 5. I need to output obs 3 and obs 4 into a new dataset. Does this make sense?

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

12-19-2016 02:41 PM

Still haven't addressed the endpoints at same time issue but I think this should work:

proc sql; create table subsets as select a.* from have as a left join have as b on a.type=b.type where a.enr_start ge b.enr_start and a.enr_date le b.enr_date; quit;

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

12-19-2016 02:59 PM

The proc sql code above is giving me this output dataset:

type enr_start enr_date

A 26Feb2004 06Apr2004

A 14Jul2002 23Jul2002

A 26Feb2004 06Apr2004

A 01Jan2004 09Dec2005

B 04Dec2002 24Nov2003

B 11May2003 10Aug2003

B 11May2003 10Aug2003

C 26Sep2010 01Jan2010

C 26Sep2007 19Feb2009

What I'm looking for is this output:

type enr_start enr_date

A 26Feb2004 06Apr2004

B 11May2003 10Aug2003

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

12-19-2016 03:16 PM - edited 12-19-2016 03:22 PM

Also, I said that you need to address the endpoints. You have not explicitly said whether endpoints that are equal are considered "subset" or not. if not use GT LT instead of GE or LE. Also your example data did not include multiple identical ranges. Use SELECT DISTINCT to get single results.

proc sql; create table subsets as select distinct a.* from have as a left join have as b on a.type=b.type where a.enr_start gt b.enr_start and a.enr_date lt b.enr_date; quit;

Follow the instruction here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... to post your example data in the form of a datastep so we can recreate data to test code against.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

12-19-2016 03:27 PM

This is the code I used, does this help?

data have;

input type$ (enr_start enr_date) (: date9.) ;

format enr_start enr_date date9.;

cards;

A 14Jul2002 23Jul2002

A 01Jan2004 09Dec2005

A 26Feb2004 06Apr2004

B 11May2003 10Aug2003

B 04Dec2002 24Nov2003

C 26Sep2007 19Feb2009

C 26Sep2010 01Jan2010

;

run;

proc sql;

create table subsets as

select a.*

from have as a left join have as b

on a.type=b.type

where a.enr_start ge b.enr_start and a.enr_date le b.enr_date;

quit;

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

12-19-2016 11:36 PM

The problem is that every date range "contains" itself - so you got the entire dataset as your "subset". You avoid the problem by modifying the code submitted by @ballardw:

**proc** **sql**;

create table subsets as

select a.*

from have as a left join have as b

on a.type=b.type

where (a.enr_start ge b.enr_start and a.enr_date lt b.enr_date)

or (a.enr_start gt b.enr_start and a.enr_date le b.enr_date)

;

**quit**;

This effectively addresses @ballardw's question about end-points, by requiring that a "contained" date range not reach both extremes of its container. Not a problem if your dataset has no duplicate date ranges within a single TYPE.

Solution

03-07-2017
01:11 PM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

12-20-2016 05:29 AM