Lefty
Obsidian | Level 7

I have data like this:

 

ID   relapse_Date
1    8/28/2012
1    8/30/2012
1    11/5/2012
1    1/2/2013
1    2/13/2013
1    3/18/2013
1    4/15/2013
2    5/1/2008
2    5/20/2008
2    6/14/2008

 

I would like to count the number of unique dates on which each ID has a relapse, but only count relapses that are 30 days or more apart from each other. So even though ID 1 has 7 unique relapse dates, I only want to count 5 of them (8/28/2012, 11/5/2012, 1/2/2013, 2/13/2013, 3/18/2013; the relapses on 8/30/2012 and 4/15/2013 are within 30 days of other relapse dates). I've been trying to use lag, retain, and multiple set statements but can't seem to make any of those solve my problem. Thanks in advance for any help.

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

I deleted my first reply because I hadn't read the problem correctly.  But you apparently need to retain a cutoff date, which is updated only when a relapse date is more than 30 days after the START of the previous relapse regime:

 

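data want;
  set have;
  by id;
  /* Days since the previous row. IFN evaluates both branches, so DIF runs on every row. */
  difdate=ifn(first.id,.,dif(relapse_date));
  /* Start a new regime on the first row of an ID or once the date passes the cutoff. */
  /* Note: > counts only dates strictly more than 30 days after the regime start. */
  if first.id or relapse_date>cutoff then do;
    counter=ifn(first.id,1,counter+1);
    cutoff=relapse_date+30;
  end;
  retain cutoff counter;
  format cutoff date9.;
run;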

 

 

But what if you have a series of relapse_dates on, say, 6 consecutive Wednesdays?  Do you really want the 5th Wednesday (=original Wed plus 35 days) to increment the COUNTER, even though it trails the preceding Wed only by 7 days?  That's what I understand your request to mean.
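
To make the consecutive-Wednesdays question concrete, here is a minimal sketch that runs the same cutoff logic on made-up data. The dataset names `wednesdays` and `want_wed` and the specific dates are assumptions chosen for illustration; they are not from the original question.

```sas
/* Hypothetical test data: one ID with relapses on six consecutive Wednesdays. */
data wednesdays;
  id = 1;
  do relapse_date = '02JAN2013'd to '06FEB2013'd by 7;
    output;
  end;
  format relapse_date mmddyy10.;
run;

/* Same cutoff logic as in the step above. */
data want_wed;
  set wednesdays;
  by id;
  if first.id or relapse_date > cutoff then do;
    counter = ifn(first.id, 1, counter + 1);
    cutoff  = relapse_date + 30;
  end;
  retain cutoff counter;
  format cutoff date9.;
run;

proc print data=want_wed noobs;
run;
/* COUNTER stays 1 from 02JAN2013 through 30JAN2013 (cutoff is 01FEB2013) */
/* and becomes 2 on 06FEB2013, which is 35 days after the first date.     */
```

The last date increments the counter even though it is only 7 days after the date before it, because the comparison is always against the start of the current regime rather than against the previous row.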

5 REPLIES
Reeza
Super User

You don't specify what you want as output, so here's one way that you can then modify to your needs.

 

data want;
  set have;
  by id relapse_date;

  dif_date = dif(relapse_date);
  retain counter;

  if first.id then do;
    dif_date = .;
    counter = 1;
  end;
  else if dif_date >= 30 then counter + 1;
run;
novinosrin
Tourmaline | Level 20

Not sure what you want in your output; you should provide an output sample too. Try this and modify it as needed:

 

data have;
input ID        relapse_Date :mmddyy10.;
format relapse_Date mmddyy10.;
datalines;
1     8/28/2012
1     8/30/2012
1      11/5/2012
1        1/2/2013
1        2/13/2013
1        3/18/2013
1        4/15/2013
2        5/1/2008
2       5/20/2008
2        6/14/2008
;

 

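data want;
/* One-to-one merge (no BY statement) pairs each row with the next row. */
merge have
      have(firstobs=2 keep=ID relapse_Date
           rename=(ID=_ID relapse_Date=_relapse_Date));
/* First row of each ID. */
if ID ne lag(ID) then count=1;
/* Compare with the next date only when it belongs to the same ID. */
if ID=_ID and intck('day',relapse_Date,_relapse_Date)>=30 then count+1;
drop _:;
run;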

Lefty
Obsidian | Level 7

Thank you! This is close but here is an example of a tricky situation. After I run Reeza's code, I have this:

 

ID   Relapse_date   dif_date   counter
33   5/8/2007       .          1
33   6/5/2007       28         1
33   7/7/2007       32         2
33   8/7/2007       31         3
33   9/11/2007      35         4
33   10/10/2007     29         4
33   11/8/2007      29         4
33   12/4/2007      26         4
33   12/27/2007     23         4

 

And what I would like to have is this:

ID   Relapse_date   dif_date   counter
33   5/8/2007       .          1
33   6/5/2007       28         1
33   7/7/2007       32         2
33   8/7/2007       31         3
33   9/11/2007      35         4
33   10/10/2007     29         4
33   11/8/2007      29         5
33   12/4/2007      26         5
33   12/27/2007     23         6


I would like the program to ignore the relapse on 10/10/2007, because it's less than 30 days after the relapse on 9/11/2007, but then take into account that the relapse on 11/8/2007 IS more than 30 days after the 9/11/2007 relapse and count it. Same issue with the relapse on 12/27/2007: it's less than 30 days after the one on 12/4/2007 but more than 30 days after the one on 11/8/2007. Thanks so much in advance!
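
The difference between the two tables comes down to what each date is compared against: the previous row, or the last relapse that was actually counted. A rough sketch of both rules side by side on the ID 33 rows follows. The names `id33`, `compare`, `consecutive`, `anchored`, and `anchor` are made up for illustration, and `>= 30` is an assumed reading of "30 days or more".

```sas
/* Test data: the ID 33 rows from the tables above. */
data id33;
  input id relapse_date :mmddyy10.;
  format relapse_date mmddyy10.;
  datalines;
33 5/8/2007
33 6/5/2007
33 7/7/2007
33 8/7/2007
33 9/11/2007
33 10/10/2007
33 11/8/2007
33 12/4/2007
33 12/27/2007
;

data compare;
  set id33;
  by id;
  retain anchor;                   /* date of the last counted relapse */
  dif_date = dif(relapse_date);    /* gap to the previous row */
  if first.id then do;
    dif_date    = .;
    consecutive = 1;
    anchored    = 1;
    anchor      = relapse_date;
  end;
  else do;
    /* Rule 1: compare with the previous row. */
    if dif_date >= 30 then consecutive + 1;
    /* Rule 2: compare with the last counted relapse, then move the anchor. */
    if relapse_date - anchor >= 30 then do;
      anchored + 1;
      anchor = relapse_date;
    end;
  end;
  format anchor mmddyy10.;
run;
/* consecutive: 1 1 2 3 4 4 4 4 4  (first table)  */
/* anchored:    1 1 2 3 4 4 5 5 6  (second table) */
```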

Lefty
Obsidian | Level 7

Thank you so much, that worked! To answer your question, yes, I would only want the 5th Wednesday (that was 35 days since the original relapse) to trigger the counter, which is what your code does. I can't thank you enough! 🙂
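
For anyone who only needs the final count per ID, one possible follow-up is to keep just the last row of each BY group. This is a sketch under assumptions: the output name `counts` is hypothetical, and the `have` step simply re-enters the sample data from the original question.

```sas
/* Sample data from the original question. */
data have;
  input id relapse_date :mmddyy10.;
  format relapse_date mmddyy10.;
  datalines;
1 8/28/2012
1 8/30/2012
1 11/5/2012
1 1/2/2013
1 2/13/2013
1 3/18/2013
1 4/15/2013
2 5/1/2008
2 5/20/2008
2 6/14/2008
;

/* Cutoff logic from the accepted solution, reduced to one row per ID. */
data counts(keep=id counter);
  set have;
  by id;
  if first.id or relapse_date > cutoff then do;
    counter = ifn(first.id, 1, counter + 1);
    cutoff  = relapse_date + 30;
  end;
  retain cutoff counter;
  if last.id;    /* subsetting IF: output only the final row of each ID */
run;
/* Result: id=1 counter=5, id=2 counter=2 */
```

The count of 5 for ID 1 matches the number given in the original question.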
