BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ammarhm
Lapis Lazuli | Level 10

Hi everyone

Having a mental block issue 

I have a dataset that looks like this:

 

data Tabkle;
input nr subject date anydtdte23.;
format date date9.;

cards;
1	1	01Jan2013
2	1	31Jan2013
3	1	02Mar2013
4	1	01Apr2013
5	1	01May2013
6	2	31May2013
7	2	30Jun2013
8	2	30Jul2013
9	2	29Aug2013
10	2	28Sep2013
11	2	28Oct2013
12	3	27Nov2013
13	3	27Dec2013
14	3	26Jan2014
15	3	25Feb2014
16	3	27Mar2014
17	4	26Apr2014
18	4	26May2014
19	4	25Jun2014
20	4	25Jul2014
21	4	24Aug2014

;
run;



To make my question easier to understand i made the difference between the dates in the rows = 30 days

What I want to do is to select observations (starting with the first from each group) where the date difference is > 40 days from the previous one.

 

So the code will fist take the first observation

The second observation will be deleted because the date (obs2)-date(obs1) < 40

Then the third observation will be selected because date (obs3) - date (obs1) > 40 : here obs2 is deleted by previous step so comparison occurs with obs1

then obs 4 will be deleted because date(obs4)-date(obs3) <40

then obs 5 selected because ate (obs5) - date (obs3) > 40 : here obs4 is deleted by previous step so comparison occurs with obs3

This is for group 1

then it repeats for each group and so forth

I have tried the following, it didnt work because it still compares with the observation that should have been seleted:

 

data table2;
set table;
by subject;

if lag(date) ne . and  date - lag(date) <40 then delete;
run;

 

Neither did the following work



proc sql;
create table table2 as

select distinct 
	L.*,
	R.date as next_date format=ddmmyy10.

from table as L inner join table as R
on
	L.nr=R.nr
and  intck('day', L.date, R.date) > 40
order by L.nr, L.date
	;
	quit;

Any help in solving this is appreciated

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

Use a retained variable:

data have;
input nr subject date :yymmdd10.;
format date yymmddd10.;
datalines;
1 1 20130101
2 1 20130131
3 1 20130302
4 1 20130401
5 1 20130501
6 2 20130531
7 2 20130630
8 2 20130730
9 2 20130829
10 2 20130928
11 2 20131028
12 3 20131127
13 3 20131227
14 3 20140126
15 3 20140225
16 3 20140327
17 4 20140426
18 4 20140526
19 4 20140625
20 4 20140725
21 4 20140824
;
run;

data want;
set have;
by subject;
retain comp_date;
if first.subject then comp_date = date;
else do;
  if date - comp_date > 40
  then comp_date = date;
  else delete;
end;
drop comp_date;
run;

View solution in original post

5 REPLIES 5
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Here is a quick bash at it, I have to go in a meeting now.  Very hardcoded at the mo:

data have;
  input nr subject date yymmdd8.;
  format date date9.;
datalines;
1 1 20130101
2 1 20130131
3 1 20130302
4 1 20130401
5 1 20130501
6 2 20130531
7 2 20130630
8 2 20130730
9 2 20130829
10 2 20130928
11 2 20131028
;
run;

proc transpose data=have out=inter;
  by subject;
  var date;
run;
data inter;
  set inter;
  array col{6};
  if col{2}-col{1} < 40 then col{2}=.;
  if col{3}-coalesce(col{2},col{1}) < 40 then col{3}=.;
  if col{4}-coalesce(col{3},col{2},col{1}) < 40 then col{4}=.;
  if col{5}-coalesce(col{4},col{3},col{2},col{1}) < 40 then col{5}=.;
  if col{6}-coalesce(col{5},col{4},col{3},col{2},col{1}) < 40 then col{6}=.;
run;

proc transpose data=inter out=list;
  by subject;
  var col:;
run;
data list;
  set list;
  where date ne .;
run;

proc sql;
  create table want as
  select a.*
  from   have a
  right join list b
  on     a.subject=b.subject
  and    a.date=b.date;
quit;

Basically transpose dates, do the check on array, then use that to keep only those values in the original data.

Am a bit confused by your logic though, sometimes its > 40, sometimes less.  Perhaps give a clearer example, including the why.  

ammarhm
Lapis Lazuli | Level 10

Thanks RW9

the problem is that it is not always 6 rows per subject, sometimes it is 50...

Thanks again

PaigeMiller
Diamond | Level 26

@ammarhm wrote:

Thanks RW9

the problem is that it is not always 6 rows per subject, sometimes it is 50...

Thanks again


Naturally, we can only write code that works on the example you gave us ... 

--
Paige Miller
PaigeMiller
Diamond | Level 26

I haven't tested this, but something like this logic ought to work

 

data want;
    retain startdate;
    set have;
    by group;
    if first.group then startdate=date;
    if date-startdate<40 and not first.group then delete;
    else startdate=date;
run;
     
--
Paige Miller
Kurt_Bremser
Super User

Use a retained variable:

data have;
input nr subject date :yymmdd10.;
format date yymmddd10.;
datalines;
1 1 20130101
2 1 20130131
3 1 20130302
4 1 20130401
5 1 20130501
6 2 20130531
7 2 20130630
8 2 20130730
9 2 20130829
10 2 20130928
11 2 20131028
12 3 20131127
13 3 20131227
14 3 20140126
15 3 20140225
16 3 20140327
17 4 20140426
18 4 20140526
19 4 20140625
20 4 20140725
21 4 20140824
;
run;

data want;
set have;
by subject;
retain comp_date;
if first.subject then comp_date = date;
else do;
  if date - comp_date > 40
  then comp_date = date;
  else delete;
end;
drop comp_date;
run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 1463 views
  • 0 likes
  • 4 in conversation