BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
mt88
Calcite | Level 5

Hello everyone,

 

I have a data set with repeated observations that have associated dates. For each ID, I'd like to either exclude observations based on pre-set date ranges or create a new variable that indicates which observations are in the pre-set date range and which are not. For example, let's say I start out with this:

IDDate
1Jan 1 2019
1Feb 3 2019
1Mar 15 2019
1Apr 22 2019
2Mar 20 2017
2Apr 1 2017
2May 15 2018
2Jun 12 2018
2Jul 23 2019
3Jan 15 2019
3Feb 12 2019
3Mar 13 2020
3May 7 2020

and I'd like to restrict the date ranges by ID:

  • For ID 1 - restricted date range: Feb 1 2019 to Mar 30 2019
  • For ID 2 - restricted date range: Mar 30 2017 to Jun 30 2018
  • For ID 3 - restricted date range: Jan 20 2019 to Mar 25 2020

I'd like to either end up with data that look like this:

IDDate
1Feb 3 2019
1Mar 15 2019
2Apr 1 2017
2May 15 2018
2Jun 12 2018
3Feb 12 2019
3Mar 13 2020

or this:

IDDateindicator
1Jan 1 20190
1Feb 3 20191
1Mar 15 20191
1Apr 22 20190
2Mar 20 20170
2Apr 1 20171
2May 15 20181
2Jun 12 20181
2Jul 23 20190
3Jan 15 20190
3Feb 12 20191
3Mar 13 20201
3May 7 20200

Any help would be appreciated. 

 

Thank you.

1 ACCEPTED SOLUTION

Accepted Solutions
PeterClemmensen
Tourmaline | Level 20

My code on your posted sample data:

 

data a;
input ID Date :mmddyy10.;
format Date mmddyy10.;
datalines;
204356 02/13/2019
204356 03/17/2019
204356 04/20/2019
204356 07/20/2019
204356 09/22/2019
204356 10/19/2019
205352 10/01/2017
205756 01/17/2018
205756 02/07/2018
205756 04/11/2018
205756 05/07/2018
205756 04/27/2018
205756 05/14/2018
205756 07/06/2018
205756 10/15/2018
205756 10/05/2018
205756 10/29/2018
205756 01/18/2019
205756 02/25/2019
205756 05/15/2019
205756 08/18/2019
205756 09/04/2019
205756 11/16/2019
;

data b;
input ID Date :mmddyy10.;
format Date mmddyy10.;
datalines;
204356 02/08/2019
204356 03/22/2019
204356 04/25/2019
204356 07/25/2019
204356 09/27/2019
204356 10/24/2019
205352 10/06/2017
205756 01/22/2018
205756 02/12/2018
205756 04/16/2018
205756 05/02/2018
205756 05/02/2018
205756 05/09/2018
205756 07/11/2018
205756 10/10/2018
205756 10/10/2018
205756 10/24/2018
205756 01/23/2019
205756 02/20/2019
205756 05/20/2019
205756 08/13/2019
205756 09/09/2019
205756 11/11/2019
;

data want (keep = ID Date);
   do until (last.ID);
      set a;
      by ID;
      if first.ID then from = Date;
      if last.ID then to = Date;
   end;

   do until (last.ID);
      set b;
      by ID;
      if from <= Date <= to then output;
   end;
run;

Result:

 

ID      Date 
204356  03/22/2019 
204356  04/25/2019 
204356  07/25/2019 
204356  09/27/2019 
205756  01/22/2018 
205756  02/12/2018 
205756  04/16/2018 
205756  05/02/2018 
205756  05/02/2018 
205756  05/09/2018 
205756  07/11/2018 
205756  10/10/2018 
205756  10/10/2018 
205756  10/24/2018 
205756  01/23/2019 
205756  02/20/2019 
205756  05/20/2019 
205756  08/13/2019 
205756  09/09/2019 
205756  11/11/2019 

View solution in original post

20 REPLIES 20
PeterClemmensen
Tourmaline | Level 20

How do you know the date ranges for each ID? Are they in a separate data set or?

PaigeMiller
Diamond | Level 26
  • For ID 1 - restricted date range: Feb 1 2019 to Mar 30 2019
  • For ID 2 - restricted date range: Mar 30 2017 to Jun 30 2018
  • For ID 3 - restricted date range: Jan 20 2019 to Mar 25 2020

What is the logic that determines these dates for these IDs? Or do you want to hard-code the dates for each ID?

--
Paige Miller
mt88
Calcite | Level 5

Hi everyone,

 

I have a separate data set (call it 'set A') for these subjects that has a different set of dates for each ID The first and last date for each ID in 'set A' is the basis for the date range). I guess another way of thinking of it is that, into 'set A', I'd like to merge in only those observations from the current dataset that are within 'set A's' first-to-last date range for each ID. 

 

Thanks

PaigeMiller
Diamond | Level 26

Something like this:

 

data want;
    merge dataset1 a;
    by id;
    if first_date<=date<=last_date;
    drop first_date last_date;
run;
--
Paige Miller
mt88
Calcite | Level 5

Thanks, but the date range is unique for each ID. It doesn't seem like this code captures that aspect.

PaigeMiller
Diamond | Level 26

@mt88 wrote:

Thanks, but the date range is unique for each ID. It doesn't seem like this code captures that aspect.


It does capture that aspect.

--
Paige Miller
mt88
Calcite | Level 5
If I have different date ranges for each ID, what date would I insert into that code?
PaigeMiller
Diamond | Level 26

@mt88 wrote:
If I have different date ranges for each ID, what date would I insert into that code?

You said earlier:

 

I have a separate data set (call it 'set A') for these subjects that has a different set of dates for each ID The first and last date for each ID in 'set A' is the basis for the date range). 

You don't insert dates into the code, they are already in the data set A.

 

Some of this confusion could be avoided if you provided a portion of the actual data as requested for both data sets, in the form requested by @Kurt_Bremser as illustrated by the data provided by @PeterClemmensen 

--
Paige Miller
Kurt_Bremser
Super User

Since you did not show data that contains these ranges, it is practically impossible to come up with code that will work for you.

Please post that data in usable form (data step with datalines, see @PeterClemmensen 's post), so we can immediately start to work with it.

PeterClemmensen
Tourmaline | Level 20

How about

 

data have;
input ID Date :date9.;
format Date date9.;
datalines;
1 01Jan2019
1 03Feb2019
1 15Mar2019
1 22Apr2019
2 20Mar2017
2 01Apr2017
2 15May2018
2 12Jun2018
2 23Jul2019
3 15Jan2019
3 12Feb2019
3 13Mar2020
3 07May2020
;

data a;
input ID Date :date9.;
format Date date9.;
datalines;
1 01Feb2019
1 02Feb2019
1 30Mar2019
2 30Mar2017
2 01Jun2018
2 30Jun2018
3 20Jan2019
3 21Jan2019
3 25Mar2020
;

data want (keep = ID Date);
   do until (last.ID);
      set a;
      by ID;
      if first.ID then from = Date;
      if last.ID then to = Date;
   end;

   do until (last.ID);
      set have;
      by ID;
      if from <= Date <= to then output;
   end;
run;

Result:

 

ID Date 
1  03FEB2019 
1  15MAR2019 
2  01APR2017 
2  15MAY2018 
2  12JUN2018 
3  12FEB2019 
3  13MAR2020 
PeterClemmensen
Tourmaline | Level 20

Or in case you want the indicator variable.

 

Btw, I assumed the form of your 'a' data set.

 

data want (keep = ID Date indicator);
   do until (last.ID);
      set a;
      by ID;
      if first.ID then from = Date;
      if last.ID then to = Date;
   end;

   do until (last.ID);
      set have;
      by ID;
      indicator = (from <= Date <= to);
      output;
   end;
run;

Result:

 

ID  Date       indicator 
1   01JAN2019  0 
1   03FEB2019  1 
1   15MAR2019  1 
1   22APR2019  0 
2   20MAR2017  0 
2   01APR2017  1 
2   15MAY2018  1 
2   12JUN2018  1 
2   23JUL2019  0 
3   15JAN2019  0 
3   12FEB2019  1 
3   13MAR2020  1 
3   07MAY2020  0 
mt88
Calcite | Level 5

Hi everyone,

 

I've attached two excel files. Each contains a list of repeated IDs and associated dates. Dataset A is the set that contains the relevant date ranges for each ID. In other words, the first and last dates for each ID would be the date ranges for each ID. Dataset B has a different set of dates for each ID. Earlier, I had said I wanted to merge, but I think that was a mistake. What I'd like is to limit or identify observations for each ID from Dataset B based on the date ranges from Dataset A. Perhaps draycut's response works for this. Let me know if you need anything else. 

 

Thanks

PaigeMiller
Diamond | Level 26

Many people, including myself, will not download attachments.

 

Furthermore, the format of the data sets should be that shown by @PeterClemmensen , documentation here

That's the best format to provide data. Help us out by providing data in this format.

--
Paige Miller
mt88
Calcite | Level 5

My datasets were too large to copy and paste so hopefully this smaller set works:

 

data A;

input ID Date :date9.;

format Date date9.;

datalines;

 

IDDate 
20435602/13/2019
20435603/17/2019
20435604/20/2019
20435607/20/2019
20435609/22/2019
20435610/19/2019
20535210/01/2017
20575601/17/2018
20575602/07/2018
20575604/11/2018
20575605/07/2018
20575604/27/2018
20575605/14/2018
20575607/06/2018
20575610/15/2018
20575610/05/2018
20575610/29/2018
20575601/18/2019
20575602/25/2019
20575605/15/2019
20575608/18/2019
20575609/04/2019
20575611/16/2019

 

data B;

input ID Date :date9.;

format Date date9.;

datalines;

 

IDDate
20435602/08/2019
20435603/22/2019
20435604/25/2019
20435607/25/2019
20435609/27/2019
20435610/24/2019
20535210/06/2017
20575601/22/2018
20575602/12/2018
20575604/16/2018
20575605/02/2018
20575605/02/2018
20575605/09/2018
20575607/11/2018
20575610/10/2018
20575610/10/2018
20575610/24/2018
20575601/23/2019
20575602/20/2019
20575605/20/2019
20575608/13/2019
20575609/09/2019
20575611/11/2019

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 20 replies
  • 2043 views
  • 1 like
  • 4 in conversation