I have data like this:
ID relapse_Date
1 8/28/2012
1 8/30/2012
1 11/5/2012
1 1/2/2013
1 2/13/2013
1 3/18/2013
1 4/15/2013
2 5/1/2008
2 5/20/2008
2 6/14/2008
I would like to count the number of unique dates on which each ID has a relapse, but only count relapses that are 30 days or more apart from each other. So even though ID 1 has 7 unique relapse dates, I only want to count 5 of them (8/28/2012, 11/5/2012, 1/2/2013, 2/13/2013, 3/18/2013; the relapses on 8/30/2012 and 4/15/2013 are within 30 days of other relapse dates). I've been trying to use lag, retain, and multiple set statements but can't seem to make any of those solve my problem. Thanks in advance for any help.
You don't specify what you want as output, so here's one way that you can then modify to your needs.
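data want;
    set have;
    by id relapse_date;
    /* days since the previous row's relapse date */
    dif_date = dif(relapse_date);
    retain counter;
    if first.id then do;
        /* first relapse for this ID: nothing earlier to compare against */
        dif_date = .;
        counter = 1;
    end;
    /* count a relapse only if it is 30+ days after the immediately preceding one */
    else if dif_date >= 30 then counter + 1;
run;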
Not sure what you want in your output; you should provide an output sample too. Try this and modify it as needed:
data have;
input ID relapse_Date :mmddyy10.;
format relapse_Date mmddyy10.;
datalines;
1 8/28/2012
1 8/30/2012
1 11/5/2012
1 1/2/2013
1 2/13/2013
1 3/18/2013
1 4/15/2013
2 5/1/2008
2 5/20/2008
2 6/14/2008
;
data want;
merge have have(firstobs=2 rename=(relapse_Date=_relapse_Date));
by id;
if first.id then count=1;
if intck('days',relapse_Date,_relapse_Date)>=30 then count+1;
drop _:;
run;
Thank you! This is close but here is an example of a tricky situation. After I run Reeza's code, I have this:
ID | Relapse_date | dif_date | counter
33 | 5/8/2007 | . | 1
33 | 6/5/2007 | 28 | 1
33 | 7/7/2007 | 32 | 2
33 | 8/7/2007 | 31 | 3
33 | 9/11/2007 | 35 | 4
33 | 10/10/2007 | 29 | 4
33 | 11/8/2007 | 29 | 4
33 | 12/4/2007 | 26 | 4
33 | 12/27/2007 | 23 | 4
And what I would like to have is this:
ID | Relapse_date | dif_date | counter
33 | 5/8/2007 | . | 1
33 | 6/5/2007 | 28 | 1
33 | 7/7/2007 | 32 | 2
33 | 8/7/2007 | 31 | 3
33 | 9/11/2007 | 35 | 4
33 | 10/10/2007 | 29 | 4
33 | 11/8/2007 | 29 | 5
33 | 12/4/2007 | 26 | 5
33 | 12/27/2007 | 23 | 6
I would like the program to ignore the relapse on 10/10/2007, because it's less than 30 days after the relapse on 9/11/2007, but then take into account that the relapse on 11/8/2007 IS more than 30 days after the 9/11/2007 relapse and count it. Same issue with the relapse on 12/27/2007: it's less than 30 days after the one on 12/4/2007 but more than 30 days after the one on 11/8/2007. Thanks so much in advance!
I deleted my first reply because I hadn't read the problem correctly. But you apparently need to retain a cutoff date, which is updated only when a relapse date is more than 30 days after the START of the previous relapse regime:
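data want;
    set have;
    by id;
    /* days since the previous row; IFN evaluates both branches, so DIF is called on every row */
    difdate = ifn(first.id, ., dif(relapse_date));
    /* open a new 30-day window on the first row of an ID, or once a date falls past the cutoff */
    if first.id or relapse_date > cutoff then do;
        counter = ifn(first.id, 1, counter + 1);
        cutoff = relapse_date + 30;
    end;
    retain cutoff counter;
    format cutoff date9.;
run;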
But what if you have a series of relapse_dates on, say, 6 consecutive Wednesdays? Do you really want the 5th Wednesday (=original Wed plus 35 days) to increment the COUNTER, even though it trails the preceding Wed only by 7 days? That's what I understand your request to mean.
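To see the consecutive-Wednesdays case concretely, here is a small sketch you could run. The dataset names (wednesdays, check), the ID value 99, and the start date are made up for illustration; the second step is the same cutoff logic as above, minus the difdate column.

```sas
/* Hypothetical test data: one ID with relapses on six consecutive Wednesdays */
data wednesdays;
    id = 99;                                    /* made-up ID */
    format relapse_date date9.;
    do week = 0 to 5;
        relapse_date = '02JAN2013'd + 7*week;   /* 02JAN2013 was a Wednesday */
        output;
    end;
    drop week;
run;

/* Same cutoff logic as the step above, applied to the test data */
data check;
    set wednesdays;
    by id;
    if first.id or relapse_date > cutoff then do;
        counter = ifn(first.id, 1, counter + 1);
        cutoff = relapse_date + 30;
    end;
    retain cutoff counter;
    format cutoff date9.;
run;

proc print data=check noobs;
run;
```

With this data, counter should stay at 1 for the first five dates (cutoff 01FEB2013) and move to 2 on 06FEB2013, which is 35 days after the first relapse but only 7 days after the one before it.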
Thank you so much, that worked! To answer your question, yes, I would only want the 5th Wednesday (that was 35 days since the original relapse) to trigger the counter, which is what your code does. I can't thank you enough! 🙂
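If what you ultimately need is one row per ID with the final count (the original question asked for a count per ID), a minimal sketch along the same lines might look like this. The output names counts and n_relapses are my own, and it assumes have is already sorted by id and relapse_date, as in the sample data posted earlier in the thread.

```sas
/* Sketch: keep only the final count for each ID.
   Assumes HAVE is sorted by id and relapse_date. */
data counts (keep=id n_relapses);
    set have;
    by id;
    retain cutoff n_relapses;
    /* Same windowing rule as the accepted code. Note that ">" skips a relapse
       landing exactly 30 days after the window start; switch to ">=" if
       "30 days or more" should count it (that is a guess at the intent). */
    if first.id or relapse_date > cutoff then do;
        n_relapses = ifn(first.id, 1, n_relapses + 1);
        cutoff = relapse_date + 30;
    end;
    if last.id then output;   /* one row per ID */
run;
```

On the sample data from the first post this should give 5 for ID 1, matching the count asked for in the question, and 2 for ID 2.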