BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ammarhm
Lapis Lazuli | Level 10

Hi everyone

Having a mental block issue 

I have a dataset that looks like this:

 

data Tabkle;
input nr subject date anydtdte23.;
format date date9.;

cards;
1	1	01Jan2013
2	1	31Jan2013
3	1	02Mar2013
4	1	01Apr2013
5	1	01May2013
6	2	31May2013
7	2	30Jun2013
8	2	30Jul2013
9	2	29Aug2013
10	2	28Sep2013
11	2	28Oct2013
12	3	27Nov2013
13	3	27Dec2013
14	3	26Jan2014
15	3	25Feb2014
16	3	27Mar2014
17	4	26Apr2014
18	4	26May2014
19	4	25Jun2014
20	4	25Jul2014
21	4	24Aug2014

;
run;



To make my question easier to understand i made the difference between the dates in the rows = 30 days

What I want to do is to select observations (starting with the first from each group) where the date difference is > 40 days from the previous one.

 

So the code will fist take the first observation

The second observation will be deleted because the date (obs2)-date(obs1) < 40

Then the third observation will be selected because date (obs3) - date (obs1) > 40 : here obs2 is deleted by previous step so comparison occurs with obs1

then obs 4 will be deleted because date(obs4)-date(obs3) <40

then obs 5 selected because ate (obs5) - date (obs3) > 40 : here obs4 is deleted by previous step so comparison occurs with obs3

This is for group 1

then it repeats for each group and so forth

I have tried the following, it didnt work because it still compares with the observation that should have been seleted:

 

data table2;
set table;
by subject;

if lag(date) ne . and  date - lag(date) <40 then delete;
run;

 

Neither did the following work



proc sql;
create table table2 as

select distinct 
	L.*,
	R.date as next_date format=ddmmyy10.

from table as L inner join table as R
on
	L.nr=R.nr
and  intck('day', L.date, R.date) > 40
order by L.nr, L.date
	;
	quit;

Any help in solving this is appreciated

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

Use a retained variable:

data have;
input nr subject date :yymmdd10.;
format date yymmddd10.;
datalines;
1 1 20130101
2 1 20130131
3 1 20130302
4 1 20130401
5 1 20130501
6 2 20130531
7 2 20130630
8 2 20130730
9 2 20130829
10 2 20130928
11 2 20131028
12 3 20131127
13 3 20131227
14 3 20140126
15 3 20140225
16 3 20140327
17 4 20140426
18 4 20140526
19 4 20140625
20 4 20140725
21 4 20140824
;
run;

data want;
set have;
by subject;
retain comp_date;
if first.subject then comp_date = date;
else do;
  if date - comp_date > 40
  then comp_date = date;
  else delete;
end;
drop comp_date;
run;

View solution in original post

5 REPLIES 5
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Here is a quick bash at it, I have to go in a meeting now.  Very hardcoded at the mo:

data have;
  input nr subject date yymmdd8.;
  format date date9.;
datalines;
1 1 20130101
2 1 20130131
3 1 20130302
4 1 20130401
5 1 20130501
6 2 20130531
7 2 20130630
8 2 20130730
9 2 20130829
10 2 20130928
11 2 20131028
;
run;

proc transpose data=have out=inter;
  by subject;
  var date;
run;
data inter;
  set inter;
  array col{6};
  if col{2}-col{1} < 40 then col{2}=.;
  if col{3}-coalesce(col{2},col{1}) < 40 then col{3}=.;
  if col{4}-coalesce(col{3},col{2},col{1}) < 40 then col{4}=.;
  if col{5}-coalesce(col{4},col{3},col{2},col{1}) < 40 then col{5}=.;
  if col{6}-coalesce(col{5},col{4},col{3},col{2},col{1}) < 40 then col{6}=.;
run;

proc transpose data=inter out=list;
  by subject;
  var col:;
run;
data list;
  set list;
  where date ne .;
run;

proc sql;
  create table want as
  select a.*
  from   have a
  right join list b
  on     a.subject=b.subject
  and    a.date=b.date;
quit;

Basically transpose dates, do the check on array, then use that to keep only those values in the original data.

Am a bit confused by your logic though, sometimes its > 40, sometimes less.  Perhaps give a clearer example, including the why.  

ammarhm
Lapis Lazuli | Level 10

Thanks RW9

the problem is that it is not always 6 rows per subject, sometimes it is 50...

Thanks again

PaigeMiller
Diamond | Level 26

@ammarhm wrote:

Thanks RW9

the problem is that it is not always 6 rows per subject, sometimes it is 50...

Thanks again


Naturally, we can only write code that works on the example you gave us ... 

--
Paige Miller
PaigeMiller
Diamond | Level 26

I haven't tested this, but something like this logic ought to work

 

data want;
    retain startdate;
    set have;
    by group;
    if first.group then startdate=date;
    if date-startdate<40 and not first.group then delete;
    else startdate=date;
run;
     
--
Paige Miller
Kurt_Bremser
Super User

Use a retained variable:

data have;
input nr subject date :yymmdd10.;
format date yymmddd10.;
datalines;
1 1 20130101
2 1 20130131
3 1 20130302
4 1 20130401
5 1 20130501
6 2 20130531
7 2 20130630
8 2 20130730
9 2 20130829
10 2 20130928
11 2 20131028
12 3 20131127
13 3 20131227
14 3 20140126
15 3 20140225
16 3 20140327
17 4 20140426
18 4 20140526
19 4 20140625
20 4 20140725
21 4 20140824
;
run;

data want;
set have;
by subject;
retain comp_date;
if first.subject then comp_date = date;
else do;
  if date - comp_date > 40
  then comp_date = date;
  else delete;
end;
drop comp_date;
run;

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 2239 views
  • 0 likes
  • 4 in conversation