Hello everyone,
I have a data set with repeated observations that have associated dates. For each ID, I'd like to either exclude observations based on pre-set date ranges or create a new variable that indicates which observations are in the pre-set date range and which are not. For example, let's say I start out with this:
ID | Date |
1 | Jan 1 2019 |
1 | Feb 3 2019 |
1 | Mar 15 2019 |
1 | Apr 22 2019 |
2 | Mar 20 2017 |
2 | Apr 1 2017 |
2 | May 15 2018 |
2 | Jun 12 2018 |
2 | Jul 23 2019 |
3 | Jan 15 2019 |
3 | Feb 12 2019 |
3 | Mar 13 2020 |
3 | May 7 2020 |
and I'd like to restrict the date ranges by ID:
I'd like to either end up with data that look like this:
ID | Date |
1 | Feb 3 2019 |
1 | Mar 15 2019 |
2 | Apr 1 2017 |
2 | May 15 2018 |
2 | Jun 12 2018 |
3 | Feb 12 2019 |
3 | Mar 13 2020 |
or this:
ID | Date | indicator |
1 | Jan 1 2019 | 0 |
1 | Feb 3 2019 | 1 |
1 | Mar 15 2019 | 1 |
1 | Apr 22 2019 | 0 |
2 | Mar 20 2017 | 0 |
2 | Apr 1 2017 | 1 |
2 | May 15 2018 | 1 |
2 | Jun 12 2018 | 1 |
2 | Jul 23 2019 | 0 |
3 | Jan 15 2019 | 0 |
3 | Feb 12 2019 | 1 |
3 | Mar 13 2020 | 1 |
3 | May 7 2020 | 0 |
Any help would be appreciated.
Thank you.
My code on your posted sample data:
data a;
input ID Date :mmddyy10.;
format Date mmddyy10.;
datalines;
204356 02/13/2019
204356 03/17/2019
204356 04/20/2019
204356 07/20/2019
204356 09/22/2019
204356 10/19/2019
205352 10/01/2017
205756 01/17/2018
205756 02/07/2018
205756 04/11/2018
205756 05/07/2018
205756 04/27/2018
205756 05/14/2018
205756 07/06/2018
205756 10/15/2018
205756 10/05/2018
205756 10/29/2018
205756 01/18/2019
205756 02/25/2019
205756 05/15/2019
205756 08/18/2019
205756 09/04/2019
205756 11/16/2019
;
data b;
input ID Date :mmddyy10.;
format Date mmddyy10.;
datalines;
204356 02/08/2019
204356 03/22/2019
204356 04/25/2019
204356 07/25/2019
204356 09/27/2019
204356 10/24/2019
205352 10/06/2017
205756 01/22/2018
205756 02/12/2018
205756 04/16/2018
205756 05/02/2018
205756 05/02/2018
205756 05/09/2018
205756 07/11/2018
205756 10/10/2018
205756 10/10/2018
205756 10/24/2018
205756 01/23/2019
205756 02/20/2019
205756 05/20/2019
205756 08/13/2019
205756 09/09/2019
205756 11/11/2019
;
data want (keep = ID Date);
do until (last.ID);
set a;
by ID;
if first.ID then from = Date;
if last.ID then to = Date;
end;
do until (last.ID);
set b;
by ID;
if from <= Date <= to then output;
end;
run;
Result:
ID Date 204356 03/22/2019 204356 04/25/2019 204356 07/25/2019 204356 09/27/2019 205756 01/22/2018 205756 02/12/2018 205756 04/16/2018 205756 05/02/2018 205756 05/02/2018 205756 05/09/2018 205756 07/11/2018 205756 10/10/2018 205756 10/10/2018 205756 10/24/2018 205756 01/23/2019 205756 02/20/2019 205756 05/20/2019 205756 08/13/2019 205756 09/09/2019 205756 11/11/2019
How do you know the date ranges for each ID? Are they in a separate data set or?
For ID 1 - restricted date range: Feb 1 2019 to Mar 30 2019
For ID 2 - restricted date range: Mar 30 2017 to Jun 30 2018
For ID 3 - restricted date range: Jan 20 2019 to Mar 25 2020
What is the logic that determines these dates for these IDs? Or do you want to hard-code the dates for each ID?
Hi everyone,
I have a separate data set (call it 'set A') for these subjects that has a different set of dates for each ID The first and last date for each ID in 'set A' is the basis for the date range). I guess another way of thinking of it is that, into 'set A', I'd like to merge in only those observations from the current dataset that are within 'set A's' first-to-last date range for each ID.
Thanks
Something like this:
data want;
merge dataset1 a;
by id;
if first_date<=date<=last_date;
drop first_date last_date;
run;
Thanks, but the date range is unique for each ID. It doesn't seem like this code captures that aspect.
@mt88 wrote:
Thanks, but the date range is unique for each ID. It doesn't seem like this code captures that aspect.
It does capture that aspect.
@mt88 wrote:
If I have different date ranges for each ID, what date would I insert into that code?
You said earlier:
I have a separate data set (call it 'set A') for these subjects that has a different set of dates for each ID The first and last date for each ID in 'set A' is the basis for the date range).
You don't insert dates into the code, they are already in the data set A.
Some of this confusion could be avoided if you provided a portion of the actual data as requested for both data sets, in the form requested by @Kurt_Bremser as illustrated by the data provided by @PeterClemmensen
Since you did not show data that contains these ranges, it is practically impossible to come up with code that will work for you.
Please post that data in usable form (data step with datalines, see @PeterClemmensen 's post), so we can immediately start to work with it.
How about
data have;
input ID Date :date9.;
format Date date9.;
datalines;
1 01Jan2019
1 03Feb2019
1 15Mar2019
1 22Apr2019
2 20Mar2017
2 01Apr2017
2 15May2018
2 12Jun2018
2 23Jul2019
3 15Jan2019
3 12Feb2019
3 13Mar2020
3 07May2020
;
data a;
input ID Date :date9.;
format Date date9.;
datalines;
1 01Feb2019
1 02Feb2019
1 30Mar2019
2 30Mar2017
2 01Jun2018
2 30Jun2018
3 20Jan2019
3 21Jan2019
3 25Mar2020
;
data want (keep = ID Date);
do until (last.ID);
set a;
by ID;
if first.ID then from = Date;
if last.ID then to = Date;
end;
do until (last.ID);
set have;
by ID;
if from <= Date <= to then output;
end;
run;
Result:
ID Date 1 03FEB2019 1 15MAR2019 2 01APR2017 2 15MAY2018 2 12JUN2018 3 12FEB2019 3 13MAR2020
Or in case you want the indicator variable.
Btw, I assumed the form of your 'a' data set.
data want (keep = ID Date indicator);
do until (last.ID);
set a;
by ID;
if first.ID then from = Date;
if last.ID then to = Date;
end;
do until (last.ID);
set have;
by ID;
indicator = (from <= Date <= to);
output;
end;
run;
Result:
ID Date indicator 1 01JAN2019 0 1 03FEB2019 1 1 15MAR2019 1 1 22APR2019 0 2 20MAR2017 0 2 01APR2017 1 2 15MAY2018 1 2 12JUN2018 1 2 23JUL2019 0 3 15JAN2019 0 3 12FEB2019 1 3 13MAR2020 1 3 07MAY2020 0
Hi everyone,
I've attached two excel files. Each contains a list of repeated IDs and associated dates. Dataset A is the set that contains the relevant date ranges for each ID. In other words, the first and last dates for each ID would be the date ranges for each ID. Dataset B has a different set of dates for each ID. Earlier, I had said I wanted to merge, but I think that was a mistake. What I'd like is to limit or identify observations for each ID from Dataset B based on the date ranges from Dataset A. Perhaps draycut's response works for this. Let me know if you need anything else.
Thanks
Many people, including myself, will not download attachments.
Furthermore, the format of the data sets should be that shown by @PeterClemmensen , documentation here
That's the best format to provide data. Help us out by providing data in this format.
My datasets were too large to copy and paste so hopefully this smaller set works:
data A;
input ID Date :date9.;
format Date date9.;
datalines;
ID | Date |
204356 | 02/13/2019 |
204356 | 03/17/2019 |
204356 | 04/20/2019 |
204356 | 07/20/2019 |
204356 | 09/22/2019 |
204356 | 10/19/2019 |
205352 | 10/01/2017 |
205756 | 01/17/2018 |
205756 | 02/07/2018 |
205756 | 04/11/2018 |
205756 | 05/07/2018 |
205756 | 04/27/2018 |
205756 | 05/14/2018 |
205756 | 07/06/2018 |
205756 | 10/15/2018 |
205756 | 10/05/2018 |
205756 | 10/29/2018 |
205756 | 01/18/2019 |
205756 | 02/25/2019 |
205756 | 05/15/2019 |
205756 | 08/18/2019 |
205756 | 09/04/2019 |
205756 | 11/16/2019 |
data B;
input ID Date :date9.;
format Date date9.;
datalines;
ID | Date |
204356 | 02/08/2019 |
204356 | 03/22/2019 |
204356 | 04/25/2019 |
204356 | 07/25/2019 |
204356 | 09/27/2019 |
204356 | 10/24/2019 |
205352 | 10/06/2017 |
205756 | 01/22/2018 |
205756 | 02/12/2018 |
205756 | 04/16/2018 |
205756 | 05/02/2018 |
205756 | 05/02/2018 |
205756 | 05/09/2018 |
205756 | 07/11/2018 |
205756 | 10/10/2018 |
205756 | 10/10/2018 |
205756 | 10/24/2018 |
205756 | 01/23/2019 |
205756 | 02/20/2019 |
205756 | 05/20/2019 |
205756 | 08/13/2019 |
205756 | 09/09/2019 |
205756 | 11/11/2019 |
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.