Solved: Re: Count instances at least 30 days apart

Lefty · Posted 09-06-2017 07:01 PM

I have data like this:

ID relapse_Date
1 8/28/2012
1 8/30/2012
1 11/5/2012
1 1/2/2013
1 2/13/2013
1 3/18/2013
1 4/15/2013
2 5/1/2008
2 5/20/2008
2 6/14/2008

I would like to count the number of unique dates on which each ID has a relapse, but only count relapses that are 30 days or more apart from each other. So even though ID 1 has 7 unique relapse dates, I only want to count 5 of them (8/28/2012, 11/5/2012, 1/2/2013, 2/13/2013, 3/18/2013; the relapses on 8/30/2012 and 4/15/2013 are within 30 days of other relapse dates). I've been trying to use lag, retain, and multiple set statements but can't seem to make any of those solve my problem. Thanks in advance for any help.

Reeza · Posted 09-06-2017 07:15 PM

You don't specify what you want as output, so here's one way that you can then modify to your needs.

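data want;
  set have;
  by id relapse_date;

  /* days since the previous record; DIF runs on every row so its queue stays in sync */
  dif_date = dif(relapse_date);
  retain counter;

  if first.id then do;
    dif_date = .;   /* no previous relapse within this ID */
    counter = 1;
  end;
  else if dif_date >= 30 then counter + 1;
run;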

novinosrin · Posted 09-06-2017 07:24 PM

I'm not sure what you want in your output; you should provide a sample of the expected output too. Try this and modify it as needed:

data have;
  input ID relapse_Date :mmddyy10.;
  format relapse_Date mmddyy10.;
  datalines;
1 8/28/2012
1 8/30/2012
1 11/5/2012
1 1/2/2013
1 2/13/2013
1 3/18/2013
1 4/15/2013
2 5/1/2008
2 5/20/2008
2 6/14/2008
;

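data want;
  /* Look-ahead merge: pair each row with the next row. No BY statement here,
     because with BY the FIRSTOBS=2 offset applies only to the first ID. */
  merge have
        have(firstobs=2 keep=ID relapse_Date
             rename=(ID=_ID relapse_Date=_relapse_Date));
  if ID ne lag(ID) then count=1;   /* first row of each ID */
  /* add 1 when the next relapse for the same ID is 30 or more days later */
  if ID=_ID and intck('day',relapse_Date,_relapse_Date)>=30 then count+1;
  drop _:;
run;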

Lefty · Posted 09-07-2017 10:29 AM

Thank you! This is close but here is an example of a tricky situation. After I run Reeza's code, I have this:

ID	Relapse_date	dif_date	counter
33	5/8/2007	.	1
33	6/5/2007	28	1
33	7/7/2007	32	2
33	8/7/2007	31	3
33	9/11/2007	35	4
33	10/10/2007	29	4
33	11/8/2007	29	4
33	12/4/2007	26	4
33	12/27/2007	23	4

And what I would like to have is this:

ID	Relapse_date	dif_date	counter
33	5/8/2007	.	1
33	6/5/2007	28	1
33	7/7/2007	32	2
33	8/7/2007	31	3
33	9/11/2007	35	4
33	10/10/2007	29	4
33	11/8/2007	29	5
33	12/4/2007	26	5
33	12/27/2007	23	6

I would like the program to ignore the relapse at 10/10/2007, because it's less than 30 days after the relapse on 9/11/2007, but then take into account that the relapse on 11/8/2007 IS more than 30 days after the 9/11/2007 relapse and count it. Same issue with the relapse on 12/27/2007: it's less than 30 days after the one on 12/4/2007 but more than 30 days after the one on 11/8/2007. Thanks so much in advance!

mkeintz · Posted 09-07-2017 11:35 AM

I deleted my first reply because I hadn't read the problem correctly. But you apparently need to retain a cutoff date, which is updated only when a relapse date is more than 30 days after the START of the previous relapse regime:

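data want;
  set have;                 /* assumes HAVE is sorted by id and relapse_date */
  by id;
  /* days since the previous record; IFN evaluates DIF on every row, keeping its queue in sync */
  difdate=ifn(first.id,.,dif(relapse_date));
  /* count a new relapse on the first record of an ID, or once the date passes the cutoff */
  if first.id or relapse_date>cutoff then do;
    counter=ifn(first.id,1,counter+1);
    cutoff=relapse_date+30;   /* 30 days after the start of the current counted relapse */
  end;
  /* use >= in the test above if a relapse exactly 30 days later should also count */
  retain cutoff counter;
  format cutoff date9.;
run;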

But what if you have a series of relapse_dates on, say, 6 consecutive Wednesdays? Do you really want the 5th Wednesday (=original Wed plus 35 days) to increment the COUNTER, even though it trails the preceding Wed only by 7 days? That's what I understand your request to mean.
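For anyone who wants to see the Wednesday scenario in action, here is a minimal sketch. The data set names WEDNESDAYS and WANT_WED and the start date are made up for illustration; the counting logic is the same as in the step above.

```sas
/* Hypothetical test data: six consecutive Wednesdays for a single ID */
data wednesdays;
  id = 1;
  do week = 0 to 5;
    relapse_date = '06SEP2017'd + 7*week;  /* 06SEP2017 was a Wednesday */
    output;
  end;
  format relapse_date date9.;
  drop week;
run;

/* Same cutoff logic as above, applied to the test data */
data want_wed;
  set wednesdays;
  by id;
  if first.id or relapse_date > cutoff then do;
    counter = ifn(first.id, 1, counter + 1);
    cutoff  = relapse_date + 30;
  end;
  retain cutoff counter;
  format cutoff date9.;
run;

proc print data=want_wed noobs;
run;
```

COUNTER should stay at 1 for the first five records (days 0 through 28) and move to 2 only on the sixth record, which is 35 days after the first Wednesday.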

Lefty · Posted 09-07-2017 11:50 AM

Thank you so much, that worked! To answer your question, yes, I would only want the 5th Wednesday (that was 35 days since the original relapse) to trigger the counter, which is what your code does. I can't thank you enough! 🙂
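If the goal is one row per ID with the final count, as in the original question, a subsetting IF on last.id should be enough. This is only a sketch: the output name COUNTS is made up, and it assumes HAVE is sorted by id and relapse_date.

```sas
/* One row per ID holding the final counter value */
data counts (keep=id counter);
  set have;
  by id;
  if first.id or relapse_date > cutoff then do;
    counter = ifn(first.id, 1, counter + 1);
    cutoff  = relapse_date + 30;
  end;
  retain cutoff counter;
  if last.id;   /* subsetting IF: keep only the last record of each ID */
run;
```

With the sample data from the first post this gives 5 for ID 1, matching the count asked for there, and 2 for ID 2 (6/14/2008 is more than 30 days after 5/1/2008).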
