Hi everyone
Having a mental block issue
I have a dataset that looks like this:
data Tabkle;
input nr subject date anydtdte23.;
format date date9.;
cards;
1 1 01Jan2013
2 1 31Jan2013
3 1 02Mar2013
4 1 01Apr2013
5 1 01May2013
6 2 31May2013
7 2 30Jun2013
8 2 30Jul2013
9 2 29Aug2013
10 2 28Sep2013
11 2 28Oct2013
12 3 27Nov2013
13 3 27Dec2013
14 3 26Jan2014
15 3 25Feb2014
16 3 27Mar2014
17 4 26Apr2014
18 4 26May2014
19 4 25Jun2014
20 4 25Jul2014
21 4 24Aug2014
;
run;
To make my question easier to understand i made the difference between the dates in the rows = 30 days
What I want to do is to select observations (starting with the first from each group) where the date difference is > 40 days from the previous one.
So the code will fist take the first observation
The second observation will be deleted because the date (obs2)-date(obs1) < 40
Then the third observation will be selected because date (obs3) - date (obs1) > 40 : here obs2 is deleted by previous step so comparison occurs with obs1
then obs 4 will be deleted because date(obs4)-date(obs3) <40
then obs 5 selected because ate (obs5) - date (obs3) > 40 : here obs4 is deleted by previous step so comparison occurs with obs3
This is for group 1
then it repeats for each group and so forth
I have tried the following, it didnt work because it still compares with the observation that should have been seleted:
data table2;
set table;
by subject;
if lag(date) ne . and date - lag(date) <40 then delete;
run;
Neither did the following work
proc sql;
create table table2 as
select distinct
L.*,
R.date as next_date format=ddmmyy10.
from table as L inner join table as R
on
L.nr=R.nr
and intck('day', L.date, R.date) > 40
order by L.nr, L.date
;
quit;
Any help in solving this is appreciated
Use a retained variable:
data have;
input nr subject date :yymmdd10.;
format date yymmddd10.;
datalines;
1 1 20130101
2 1 20130131
3 1 20130302
4 1 20130401
5 1 20130501
6 2 20130531
7 2 20130630
8 2 20130730
9 2 20130829
10 2 20130928
11 2 20131028
12 3 20131127
13 3 20131227
14 3 20140126
15 3 20140225
16 3 20140327
17 4 20140426
18 4 20140526
19 4 20140625
20 4 20140725
21 4 20140824
;
run;
data want;
set have;
by subject;
retain comp_date;
if first.subject then comp_date = date;
else do;
if date - comp_date > 40
then comp_date = date;
else delete;
end;
drop comp_date;
run;
Here is a quick bash at it, I have to go in a meeting now. Very hardcoded at the mo:
data have; input nr subject date yymmdd8.; format date date9.; datalines; 1 1 20130101 2 1 20130131 3 1 20130302 4 1 20130401 5 1 20130501 6 2 20130531 7 2 20130630 8 2 20130730 9 2 20130829 10 2 20130928 11 2 20131028 ; run; proc transpose data=have out=inter; by subject; var date; run; data inter; set inter; array col{6}; if col{2}-col{1} < 40 then col{2}=.; if col{3}-coalesce(col{2},col{1}) < 40 then col{3}=.; if col{4}-coalesce(col{3},col{2},col{1}) < 40 then col{4}=.; if col{5}-coalesce(col{4},col{3},col{2},col{1}) < 40 then col{5}=.; if col{6}-coalesce(col{5},col{4},col{3},col{2},col{1}) < 40 then col{6}=.; run; proc transpose data=inter out=list; by subject; var col:; run; data list; set list; where date ne .; run; proc sql; create table want as select a.* from have a right join list b on a.subject=b.subject and a.date=b.date; quit;
Basically transpose dates, do the check on array, then use that to keep only those values in the original data.
Am a bit confused by your logic though, sometimes its > 40, sometimes less. Perhaps give a clearer example, including the why.
Thanks RW9
the problem is that it is not always 6 rows per subject, sometimes it is 50...
Thanks again
@ammarhm wrote:
Thanks RW9
the problem is that it is not always 6 rows per subject, sometimes it is 50...
Thanks again
Naturally, we can only write code that works on the example you gave us ...
I haven't tested this, but something like this logic ought to work
data want;
retain startdate;
set have;
by group;
if first.group then startdate=date;
if date-startdate<40 and not first.group then delete;
else startdate=date;
run;
Use a retained variable:
data have;
input nr subject date :yymmdd10.;
format date yymmddd10.;
datalines;
1 1 20130101
2 1 20130131
3 1 20130302
4 1 20130401
5 1 20130501
6 2 20130531
7 2 20130630
8 2 20130730
9 2 20130829
10 2 20130928
11 2 20131028
12 3 20131127
13 3 20131227
14 3 20140126
15 3 20140225
16 3 20140327
17 4 20140426
18 4 20140526
19 4 20140625
20 4 20140725
21 4 20140824
;
run;
data want;
set have;
by subject;
retain comp_date;
if first.subject then comp_date = date;
else do;
if date - comp_date > 40
then comp_date = date;
else delete;
end;
drop comp_date;
run;
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.