Hi everyone
Having a mental block issue
I have a dataset that looks like this:
data Tabkle;
input nr subject date anydtdte23.;
format date date9.;
cards;
1 1 01Jan2013
2 1 31Jan2013
3 1 02Mar2013
4 1 01Apr2013
5 1 01May2013
6 2 31May2013
7 2 30Jun2013
8 2 30Jul2013
9 2 29Aug2013
10 2 28Sep2013
11 2 28Oct2013
12 3 27Nov2013
13 3 27Dec2013
14 3 26Jan2014
15 3 25Feb2014
16 3 27Mar2014
17 4 26Apr2014
18 4 26May2014
19 4 25Jun2014
20 4 25Jul2014
21 4 24Aug2014
;
run;
To make my question easier to understand i made the difference between the dates in the rows = 30 days
What I want to do is to select observations (starting with the first from each group) where the date difference is > 40 days from the previous one.
So the code will fist take the first observation
The second observation will be deleted because the date (obs2)-date(obs1) < 40
Then the third observation will be selected because date (obs3) - date (obs1) > 40 : here obs2 is deleted by previous step so comparison occurs with obs1
then obs 4 will be deleted because date(obs4)-date(obs3) <40
then obs 5 selected because ate (obs5) - date (obs3) > 40 : here obs4 is deleted by previous step so comparison occurs with obs3
This is for group 1
then it repeats for each group and so forth
I have tried the following, it didnt work because it still compares with the observation that should have been seleted:
data table2;
set table;
by subject;
if lag(date) ne . and date - lag(date) <40 then delete;
run;
Neither did the following work
proc sql;
create table table2 as
select distinct
L.*,
R.date as next_date format=ddmmyy10.
from table as L inner join table as R
on
L.nr=R.nr
and intck('day', L.date, R.date) > 40
order by L.nr, L.date
;
quit;
Any help in solving this is appreciated
Use a retained variable:
data have;
input nr subject date :yymmdd10.;
format date yymmddd10.;
datalines;
1 1 20130101
2 1 20130131
3 1 20130302
4 1 20130401
5 1 20130501
6 2 20130531
7 2 20130630
8 2 20130730
9 2 20130829
10 2 20130928
11 2 20131028
12 3 20131127
13 3 20131227
14 3 20140126
15 3 20140225
16 3 20140327
17 4 20140426
18 4 20140526
19 4 20140625
20 4 20140725
21 4 20140824
;
run;
data want;
set have;
by subject;
retain comp_date;
if first.subject then comp_date = date;
else do;
if date - comp_date > 40
then comp_date = date;
else delete;
end;
drop comp_date;
run;
Here is a quick bash at it, I have to go in a meeting now. Very hardcoded at the mo:
data have; input nr subject date yymmdd8.; format date date9.; datalines; 1 1 20130101 2 1 20130131 3 1 20130302 4 1 20130401 5 1 20130501 6 2 20130531 7 2 20130630 8 2 20130730 9 2 20130829 10 2 20130928 11 2 20131028 ; run; proc transpose data=have out=inter; by subject; var date; run; data inter; set inter; array col{6}; if col{2}-col{1} < 40 then col{2}=.; if col{3}-coalesce(col{2},col{1}) < 40 then col{3}=.; if col{4}-coalesce(col{3},col{2},col{1}) < 40 then col{4}=.; if col{5}-coalesce(col{4},col{3},col{2},col{1}) < 40 then col{5}=.; if col{6}-coalesce(col{5},col{4},col{3},col{2},col{1}) < 40 then col{6}=.; run; proc transpose data=inter out=list; by subject; var col:; run; data list; set list; where date ne .; run; proc sql; create table want as select a.* from have a right join list b on a.subject=b.subject and a.date=b.date; quit;
Basically transpose dates, do the check on array, then use that to keep only those values in the original data.
Am a bit confused by your logic though, sometimes its > 40, sometimes less. Perhaps give a clearer example, including the why.
Thanks RW9
the problem is that it is not always 6 rows per subject, sometimes it is 50...
Thanks again
@ammarhm wrote:
Thanks RW9
the problem is that it is not always 6 rows per subject, sometimes it is 50...
Thanks again
Naturally, we can only write code that works on the example you gave us ...
I haven't tested this, but something like this logic ought to work
data want;
retain startdate;
set have;
by group;
if first.group then startdate=date;
if date-startdate<40 and not first.group then delete;
else startdate=date;
run;
Use a retained variable:
data have;
input nr subject date :yymmdd10.;
format date yymmddd10.;
datalines;
1 1 20130101
2 1 20130131
3 1 20130302
4 1 20130401
5 1 20130501
6 2 20130531
7 2 20130630
8 2 20130730
9 2 20130829
10 2 20130928
11 2 20131028
12 3 20131127
13 3 20131227
14 3 20140126
15 3 20140225
16 3 20140327
17 4 20140426
18 4 20140526
19 4 20140625
20 4 20140725
21 4 20140824
;
run;
data want;
set have;
by subject;
retain comp_date;
if first.subject then comp_date = date;
else do;
if date - comp_date > 40
then comp_date = date;
else delete;
end;
drop comp_date;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.