## selecting consequent observations by date difference

Solved
Regular Contributor
Posts: 182

# selecting consequent observations by date difference

[ Edited ]

Hi everyone

Having a mental block issue

I have a dataset that looks like this:

``````data Tabkle;
input nr subject date anydtdte23.;
format date date9.;

cards;
1	1	01Jan2013
2	1	31Jan2013
3	1	02Mar2013
4	1	01Apr2013
5	1	01May2013
6	2	31May2013
7	2	30Jun2013
8	2	30Jul2013
9	2	29Aug2013
10	2	28Sep2013
11	2	28Oct2013
12	3	27Nov2013
13	3	27Dec2013
14	3	26Jan2014
15	3	25Feb2014
16	3	27Mar2014
17	4	26Apr2014
18	4	26May2014
19	4	25Jun2014
20	4	25Jul2014
21	4	24Aug2014

;
run;``````

To make my question easier to understand i made the difference between the dates in the rows = 30 days

What I want to do is to select observations (starting with the first from each group) where the date difference is > 40 days from the previous one.

So the code will fist take the first observation

The second observation will be deleted because the date (obs2)-date(obs1) < 40

Then the third observation will be selected because date (obs3) - date (obs1) > 40 : here obs2 is deleted by previous step so comparison occurs with obs1

then obs 4 will be deleted because date(obs4)-date(obs3) <40

then obs 5 selected because ate (obs5) - date (obs3) > 40 : here obs4 is deleted by previous step so comparison occurs with obs3

This is for group 1

then it repeats for each group and so forth

I have tried the following, it didnt work because it still compares with the observation that should have been seleted:

``````data table2;
set table;
by subject;

if lag(date) ne . and  date - lag(date) <40 then delete;
run;
``````

Neither did the following work

``````

proc sql;
create table table2 as

select distinct
L.*,
R.date as next_date format=ddmmyy10.

from table as L inner join table as R
on
L.nr=R.nr
and  intck('day', L.date, R.date) > 40
order by L.nr, L.date
;
quit;
``````

Any help in solving this is appreciated

Accepted Solutions
Solution
‎06-19-2018 09:41 AM
Super User
Posts: 10,278

## Re: selecting consequent observations by date difference

Use a retained variable:

``````data have;
input nr subject date :yymmdd10.;
format date yymmddd10.;
datalines;
1 1 20130101
2 1 20130131
3 1 20130302
4 1 20130401
5 1 20130501
6 2 20130531
7 2 20130630
8 2 20130730
9 2 20130829
10 2 20130928
11 2 20131028
12 3 20131127
13 3 20131227
14 3 20140126
15 3 20140225
16 3 20140327
17 4 20140426
18 4 20140526
19 4 20140625
20 4 20140725
21 4 20140824
;
run;

data want;
set have;
by subject;
retain comp_date;
if first.subject then comp_date = date;
else do;
if date - comp_date > 40
then comp_date = date;
else delete;
end;
drop comp_date;
run;``````
---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code

All Replies
Super User
Posts: 9,599

## Re: selecting consequent observations by date difference

Here is a quick bash at it, I have to go in a meeting now.  Very hardcoded at the mo:

```data have;
input nr subject date yymmdd8.;
format date date9.;
datalines;
1 1 20130101
2 1 20130131
3 1 20130302
4 1 20130401
5 1 20130501
6 2 20130531
7 2 20130630
8 2 20130730
9 2 20130829
10 2 20130928
11 2 20131028
;
run;

proc transpose data=have out=inter;
by subject;
var date;
run;
data inter;
set inter;
array col{6};
if col{2}-col{1} < 40 then col{2}=.;
if col{3}-coalesce(col{2},col{1}) < 40 then col{3}=.;
if col{4}-coalesce(col{3},col{2},col{1}) < 40 then col{4}=.;
if col{5}-coalesce(col{4},col{3},col{2},col{1}) < 40 then col{5}=.;
if col{6}-coalesce(col{5},col{4},col{3},col{2},col{1}) < 40 then col{6}=.;
run;

proc transpose data=inter out=list;
by subject;
var col:;
run;
data list;
set list;
where date ne .;
run;

proc sql;
create table want as
select a.*
from   have a
right join list b
on     a.subject=b.subject
and    a.date=b.date;
quit;```

Basically transpose dates, do the check on array, then use that to keep only those values in the original data.

Am a bit confused by your logic though, sometimes its > 40, sometimes less.  Perhaps give a clearer example, including the why.

Regular Contributor
Posts: 182

## Re: selecting consequent observations by date difference

Thanks RW9

the problem is that it is not always 6 rows per subject, sometimes it is 50...

Thanks again

Posts: 3,054

## Re: selecting consequent observations by date difference

@ammarhm wrote:

Thanks RW9

the problem is that it is not always 6 rows per subject, sometimes it is 50...

Thanks again

Naturally, we can only write code that works on the example you gave us ...

--
Paige Miller
Posts: 3,054

## Re: selecting consequent observations by date difference

[ Edited ]

I haven't tested this, but something like this logic ought to work

``````data want;
retain startdate;
set have;
by group;
if first.group then startdate=date;
if date-startdate<40 and not first.group then delete;
else startdate=date;
run;
``` ```
--
Paige Miller
Solution
‎06-19-2018 09:41 AM
Super User
Posts: 10,278

## Re: selecting consequent observations by date difference

Use a retained variable:

``````data have;
input nr subject date :yymmdd10.;
format date yymmddd10.;
datalines;
1 1 20130101
2 1 20130131
3 1 20130302
4 1 20130401
5 1 20130501
6 2 20130531
7 2 20130630
8 2 20130730
9 2 20130829
10 2 20130928
11 2 20131028
12 3 20131127
13 3 20131227
14 3 20140126
15 3 20140225
16 3 20140327
17 4 20140426
18 4 20140526
19 4 20140625
20 4 20140725
21 4 20140824
;
run;

data want;
set have;
by subject;
retain comp_date;
if first.subject then comp_date = date;
else do;
if date - comp_date > 40
then comp_date = date;
else delete;
end;
drop comp_date;
run;``````
---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
☑ This topic is solved.