Hello Everyone,
I have a sample dataset shown below. Each ID has different dates. I want to choose the observations within 15day period from the oldest date for each ID. For the observations with only one date per ID should appear in the output. For example in the below dataset I have the dates sorted by ID and date.
Date | ID |
1/2/2013 | 1 |
1/3/2013 | 1 |
1/25/2013 | 1 |
4/25/2014 | 2 |
2/6/2014 | 3 |
2/8/2014 | 3 |
3/8/2014 | 3 |
5/7/2013 | 4 |
4/4/2013 | 5 |
4/4/2013 | 5 |
6/5/2013 | 5 |
5/9/2014 | 6 |
8/16/2014 | 6 |
I need the output as shown below.
1/2/2013 | 1 |
1/3/2013 | 1 |
4/25/2014 | 2 |
2/6/2014 | 3 |
2/8/2014 | 3 |
5/7/2013 | 4 |
4/4/2013 | 5 |
4/4/2013 | 5 |
In the above output dataset, For the first ID, the third observation is eliminated as the date difference between the oldest date and the third observation date was above 15 days (i.e 23 day gap). For the sixth ID, the date difference was above 15days so both the observations got deleted. Second and Fourth ID were not changed as it had only one date per ID.
Please guide me in coding this. Thank you in advance.
First question: Are your dates actual SAS date values or character strings?
Neither 1/2/2013 nor 1/3/2013 are within 15 days of 1/25/2013. So is your rule actually within 15 days or something else?
Note that 2/6/2014 and 2/8/2014 are also not within 15 of 3/8/2014 and 4/4/2013 is not within 15 of 6/5/2013.
Also for Id 1 and 3 you do not include the maximum valued date but for Id 2 you do. Is the rule to do that if there is only a single value?
Not all of your dates are dd/mm/yyyy
Look at:
1/25/2013 1
4/25/2014 2
25 is not a valid month.
So the first step is going to make sure your data is what you think it is.
This may give a starting point. The two SQL steps could be combined but make sure they are working as intended. The first step gets the latest date for each ID. Then match to the dates within the 15 day window. Exercise for the interested reader to find the Ids with only one record and append to the result.
proc sql;
create table work.start as
select id, date
from have
group by id
having date=Max(date);
quit;
proc sql;
create table work.most as
select b.id, b.date
from work.start as a left join have as b
on a.id = b.id and abs(a.date-b.date) lt 15;
quit;
Sorry for the confusion Ballardw. That was my mistake. The date is actually in dd/mm/yyyy format.The first step should give the oldest date (first date) for each ID. I used min instead of max. The next sept should find the dates which are within the 15days from the oldest date. Unfortunately, the second code you suggested does not give me the actual results I wanted.
The output should satisty two conditions:
1. If there is only one date per each ID then keep in the output. (As in second and fourth ID)
2. If there are two or more dates per each ID the difference between the oldest date (i.e first date) and the other dates should not exceed 15 days.
For clear understanding, the same dataset is formatted as below.
Date | ID | Date diff | Condition |
1/2/13 | 1 | 0 | satisfied |
1/3/13 | 1 | 1 | satified |
1/25/13 | 1 | 23 | not satified |
4/25/14 | 2 | 0 | satisfied |
2/6/14 | 3 | 0 | satified |
2/8/14 | 3 | 2 | satified |
3/8/14 | 3 | 2 | satified |
5/7/13 | 4 | 0 | satified |
4/4/13 | 5 | 0 | satified |
4/4/13 | 5 | 0 | satified |
6/5/13 | 5 | 31 | not satified |
5/9/14 | 6 | 0 | satified |
8/16/14 | 6 | 97 | not satified |
So, the dataset should output only the conditions which are satified. The output should look like below.
Date | ID |
1/2/13 | 1 |
1/3/13 | 1 |
4/25/14 | 2 |
2/6/14 | 3 |
2/8/14 | 3 |
3/8/14 | 3 |
5/7/13 | 4 |
4/4/13 | 5 |
4/4/13 | 5 |
5/9/14 | 6 |
Thank you
data have;
infile cards expandtabs truncover;
input Date : mmddyy10. ID;
format Date mmddyy10.;
cards;
1/2/2013 1
1/3/2013 1
1/25/2013 1
4/25/2014 2
2/6/2014 3
2/8/2014 3
3/8/2014 3
5/7/2013 4
4/4/2013 5
4/4/2013 5
6/5/2013 5
5/9/2014 6
8/16/2014 6
;
run;
data temp;
set have;
by id;
retain temp;
if first.id then do; temp=date;group=0;end;
dif=date-temp;
if dif gt 15 then group+1;
drop temp dif;
run;
data want;
set temp;
by id group;
if (first.id and last.id) or (not (first.group and last.group));
run;
Hello Keshan,
Thanks for the solution. The code is working for the dataset I provided. But when tried with a bigger dataset this is not working. I think if we create a separate variable which shows the difference between the dates of first.id and last.id would work.
The rule is
Difference between the first observation of first ID and the other observations of the first ID should not exceed 15 days.
Yes. I have already calculated that difference - DIF .
drop temp dif; ==>
drop temp;
Post the data that did not work for you and the output you want to see.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.