DATA Step, Macro, Functions and more

selecting consequent observations by date difference

Accepted Solution Solved
Reply
Regular Contributor
Posts: 182
Accepted Solution

selecting consequent observations by date difference

[ Edited ]

Hi everyone

Having a mental block issue 

I have a dataset that looks like this:

 

data Tabkle;
input nr subject date anydtdte23.;
format date date9.;

cards;
1	1	01Jan2013
2	1	31Jan2013
3	1	02Mar2013
4	1	01Apr2013
5	1	01May2013
6	2	31May2013
7	2	30Jun2013
8	2	30Jul2013
9	2	29Aug2013
10	2	28Sep2013
11	2	28Oct2013
12	3	27Nov2013
13	3	27Dec2013
14	3	26Jan2014
15	3	25Feb2014
16	3	27Mar2014
17	4	26Apr2014
18	4	26May2014
19	4	25Jun2014
20	4	25Jul2014
21	4	24Aug2014

;
run;



To make my question easier to understand i made the difference between the dates in the rows = 30 days

What I want to do is to select observations (starting with the first from each group) where the date difference is > 40 days from the previous one.

 

So the code will fist take the first observation

The second observation will be deleted because the date (obs2)-date(obs1) < 40

Then the third observation will be selected because date (obs3) - date (obs1) > 40 : here obs2 is deleted by previous step so comparison occurs with obs1

then obs 4 will be deleted because date(obs4)-date(obs3) <40

then obs 5 selected because ate (obs5) - date (obs3) > 40 : here obs4 is deleted by previous step so comparison occurs with obs3

This is for group 1

then it repeats for each group and so forth

I have tried the following, it didnt work because it still compares with the observation that should have been seleted:

 

data table2;
set table;
by subject;

if lag(date) ne . and  date - lag(date) <40 then delete;
run;

 

Neither did the following work



proc sql;
create table table2 as

select distinct 
	L.*,
	R.date as next_date format=ddmmyy10.

from table as L inner join table as R
on
	L.nr=R.nr
and  intck('day', L.date, R.date) > 40
order by L.nr, L.date
	;
	quit;

Any help in solving this is appreciated


Accepted Solutions
Solution
‎06-19-2018 09:41 AM
Super User
Posts: 10,278

Re: selecting consequent observations by date difference

Use a retained variable:

data have;
input nr subject date :yymmdd10.;
format date yymmddd10.;
datalines;
1 1 20130101
2 1 20130131
3 1 20130302
4 1 20130401
5 1 20130501
6 2 20130531
7 2 20130630
8 2 20130730
9 2 20130829
10 2 20130928
11 2 20131028
12 3 20131127
13 3 20131227
14 3 20140126
15 3 20140225
16 3 20140327
17 4 20140426
18 4 20140526
19 4 20140625
20 4 20140725
21 4 20140824
;
run;

data want;
set have;
by subject;
retain comp_date;
if first.subject then comp_date = date;
else do;
  if date - comp_date > 40
  then comp_date = date;
  else delete;
end;
drop comp_date;
run;
---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code

View solution in original post


All Replies
Super User
Super User
Posts: 9,599

Re: selecting consequent observations by date difference

Here is a quick bash at it, I have to go in a meeting now.  Very hardcoded at the mo:

data have;
  input nr subject date yymmdd8.;
  format date date9.;
datalines;
1 1 20130101
2 1 20130131
3 1 20130302
4 1 20130401
5 1 20130501
6 2 20130531
7 2 20130630
8 2 20130730
9 2 20130829
10 2 20130928
11 2 20131028
;
run;

proc transpose data=have out=inter;
  by subject;
  var date;
run;
data inter;
  set inter;
  array col{6};
  if col{2}-col{1} < 40 then col{2}=.;
  if col{3}-coalesce(col{2},col{1}) < 40 then col{3}=.;
  if col{4}-coalesce(col{3},col{2},col{1}) < 40 then col{4}=.;
  if col{5}-coalesce(col{4},col{3},col{2},col{1}) < 40 then col{5}=.;
  if col{6}-coalesce(col{5},col{4},col{3},col{2},col{1}) < 40 then col{6}=.;
run;

proc transpose data=inter out=list;
  by subject;
  var col:;
run;
data list;
  set list;
  where date ne .;
run;

proc sql;
  create table want as
  select a.*
  from   have a
  right join list b
  on     a.subject=b.subject
  and    a.date=b.date;
quit;

Basically transpose dates, do the check on array, then use that to keep only those values in the original data.

Am a bit confused by your logic though, sometimes its > 40, sometimes less.  Perhaps give a clearer example, including the why.  

Regular Contributor
Posts: 182

Re: selecting consequent observations by date difference

Thanks RW9

the problem is that it is not always 6 rows per subject, sometimes it is 50...

Thanks again

Respected Advisor
Posts: 3,054

Re: selecting consequent observations by date difference


@ammarhm wrote:

Thanks RW9

the problem is that it is not always 6 rows per subject, sometimes it is 50...

Thanks again


Naturally, we can only write code that works on the example you gave us ... 

--
Paige Miller
Respected Advisor
Posts: 3,054

Re: selecting consequent observations by date difference

[ Edited ]

I haven't tested this, but something like this logic ought to work

 

data want;
    retain startdate;
    set have;
    by group;
    if first.group then startdate=date;
    if date-startdate<40 and not first.group then delete;
    else startdate=date;
run;
     
--
Paige Miller
Solution
‎06-19-2018 09:41 AM
Super User
Posts: 10,278

Re: selecting consequent observations by date difference

Use a retained variable:

data have;
input nr subject date :yymmdd10.;
format date yymmddd10.;
datalines;
1 1 20130101
2 1 20130131
3 1 20130302
4 1 20130401
5 1 20130501
6 2 20130531
7 2 20130630
8 2 20130730
9 2 20130829
10 2 20130928
11 2 20131028
12 3 20131127
13 3 20131227
14 3 20140126
15 3 20140225
16 3 20140327
17 4 20140426
18 4 20140526
19 4 20140625
20 4 20140725
21 4 20140824
;
run;

data want;
set have;
by subject;
retain comp_date;
if first.subject then comp_date = date;
else do;
  if date - comp_date > 40
  then comp_date = date;
  else delete;
end;
drop comp_date;
run;
---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 89 views
  • 0 likes
  • 4 in conversation