Contributor
Posts: 44

Date Calculations

Hello Everyone,

I have a sample dataset shown below. Each ID has different dates. I want to choose the observations within 15day period from the oldest date for each ID. For the observations with only one date per ID should appear in the output. For example in the below dataset I have the dates sorted by ID and date.

 Date ID 1/2/2013 1 1/3/2013 1 1/25/2013 1 4/25/2014 2 2/6/2014 3 2/8/2014 3 3/8/2014 3 5/7/2013 4 4/4/2013 5 4/4/2013 5 6/5/2013 5 5/9/2014 6 8/16/2014 6

I need the output as shown below.

 1/2/2013 1 1/3/2013 1 4/25/2014 2 2/6/2014 3 2/8/2014 3 5/7/2013 4 4/4/2013 5 4/4/2013 5

In the above output dataset, For the first ID, the third observation is eliminated as the date difference between the oldest date and the third observation date was  above 15 days (i.e 23 day gap). For the sixth ID, the date difference was above 15days so both the observations got deleted. Second and Fourth ID were not changed as it had only one date per ID.

Super User
Posts: 13,583

Re: Date Calculations

First question: Are your dates actual SAS date values or character strings?

Contributor
Posts: 44

Re: Date Calculations

Thanks for quick reply ballardw. The dates are actual SAS date values.
Super User
Posts: 13,583

Re: Date Calculations

[ Edited ]

Neither 1/2/2013 nor 1/3/2013 are within 15 days of 1/25/2013. So is your rule actually within 15 days or something else?

Note that 2/6/2014 and 2/8/2014 are also not within 15 of 3/8/2014 and 4/4/2013 is not within 15 of 6/5/2013.

Also for Id 1 and 3 you do not include the maximum valued date but for Id 2 you do. Is the rule to do that if there is only a single value?

Contributor
Posts: 44

Re: Date Calculations

Hello Ballardw,

Thanks for responding. The dates are in DDMMYYYY format. So for the first ID, its Jan 2,2013 and Jan 3,2013.

My intention was to include all the IDs with single date values. So, I included ID 2 and ID 4 in the output.
Super User
Posts: 13,583

Re: Date Calculations

Not all of your dates are dd/mm/yyyy

Look at:

1/25/2013 1

4/25/2014 2

25 is not a valid month.

So the first step is going to make sure your data is what you think it is.

This may give a starting point. The two SQL steps could be combined but make sure they are working as intended. The first step gets the latest date for each ID. Then match to the dates within the 15 day window. Exercise for the interested reader to find the Ids with only one record and append to the result.

``````proc sql;
create table work.start as
select id, date
from have
group by id
having date=Max(date);
quit;

proc sql;
create table work.most as
select b.id, b.date
from work.start as a left join have as b
on a.id = b.id and abs(a.date-b.date) lt 15;
quit;

``````
Contributor
Posts: 44

Re: Date Calculations

Sorry for the confusion Ballardw. That was my mistake. The date is actually in dd/mm/yyyy format.The first step should give the oldest date (first date) for each ID. I used min instead of max. The next sept should find the dates which are within the 15days from the oldest date. Unfortunately, the second code you suggested does not give me the actual results I wanted.

The output should satisty two conditions:

1. If there is only one date per each ID then keep in the output. (As in second and fourth ID)

2. If there are two or more dates per each ID the difference between the oldest date (i.e first date) and the other dates should not exceed 15 days.

For clear understanding, the same dataset is formatted as below.

 Date ID Date diff Condition 1/2/13 1 0 satisfied 1/3/13 1 1 satified 1/25/13 1 23 not satified 4/25/14 2 0 satisfied 2/6/14 3 0 satified 2/8/14 3 2 satified 3/8/14 3 2 satified 5/7/13 4 0 satified 4/4/13 5 0 satified 4/4/13 5 0 satified 6/5/13 5 31 not satified 5/9/14 6 0 satified 8/16/14 6 97 not satified

So, the dataset should output only the conditions which are satified. The output should look like below.

 Date ID 1/2/13 1 1/3/13 1 4/25/14 2 2/6/14 3 2/8/14 3 3/8/14 3 5/7/13 4 4/4/13 5 4/4/13 5 5/9/14 6

Thank you

Super User
Posts: 10,787

Re: Date Calculations

``````data have;
infile cards expandtabs truncover;
input Date : mmddyy10. ID;
format Date mmddyy10.;
cards;
1/2/2013	1
1/3/2013	1
1/25/2013	1
4/25/2014	2
2/6/2014	3
2/8/2014	3
3/8/2014	3
5/7/2013	4
4/4/2013	5
4/4/2013	5
6/5/2013	5
5/9/2014	6
8/16/2014	6
;
run;
data temp;
set have;
by id;
retain temp;
if first.id then do; temp=date;group=0;end;
dif=date-temp;
if dif gt 15 then group+1;
drop temp dif;
run;
data want;
set temp;
by id group;
if (first.id and last.id) or (not (first.group and last.group));
run;``````
Contributor
Posts: 44

Re: Date Calculations

Hello Keshan,

Thanks for the solution. The code is working for the dataset I provided. But when tried with a bigger dataset this is not working. I think if we create a separate variable which shows the difference between the dates of first.id and last.id would work.

The rule is
Difference between the first observation of first ID and the other observations of the first ID should not exceed 15 days.

Super User
Posts: 10,787

Re: Date Calculations

Yes. I have already calculated that difference - DIF .

``drop temp dif;  ==>``
``drop temp;``

Post the data that did not work for you and the output you want to see.

Discussion stats
• 9 replies
• 377 views
• 0 likes
• 3 in conversation