Solved: Re: Counting the number of event trigger over a pre-defined period

victor1893 · Posted 11-20-2024 09:28 AM

Hi sas experts,

I am looking for a solution that starts from a dummy variable and counts the number of changes from value 0 to 1 during a pre-defined time window (in my case: 6 months) for each subject (id column). The below example shows how the data that I have looks like, (including the col_have) and the outcome column that I want (col_want):

data data;
input id time date9. col_have col_want;
format time date9.;
datalines;
1 01DEC2012 1 2
1 01NOV2012 0 1
1 01OCT2012 1 2
1 01SEP2012 0 1
1 01AUG2012 1 1
1 01JUL2012 1 1
1 01JUN2012 1 2
1 01MAY2012 0 1
1 01APR2012 0 1
1 01MAR2012 1 1
1 01FEB2012 1 1
1 01JAN2012 0 0
2 01DEC2013 0 0
2 01NOV2013 0 0
2 01OCT2013 0 1
2 01SEP2013 0 1
2 01AUG2013 0 1
2 01JUL2013 0 2
2 01JUN2013 1 2
2 01MAY2013 0 1
2 01APR2013 1 1
2 01MAR2013 1 1
2 01FEB2013 0 0
2 01JAN2013 1 0
;
run;

To give a bit of details, the algo should take each date and count the number of times for which the col_have changes values from 0 to 1 during the last 6 months.

Thank you very much in advance for effort and support!

FreelanceReinh · Posted 11-20-2024 11:48 AM

Hi @victor1893,

Here's an algorithm using an arbitrary number of months (macro variable n), &n>1, as the length of the time window:

proc sort data=data(drop=col_want) out=have;
by id time;
run;

%let n=6; /* length of time window in months */

data want(drop=_:);
array _c[0:%eval(&n-2)] _temporary_;
set have;
by id;
_prev=lag(col_have);
if first.id then call missing(of _c[*]);
else do;
  _m+1;
  _c[mod(_m-1,%eval(&n-1))]=ifn(_prev=0 & col_have=1,1,.);
end;
col_want=n(of _c[*]);
run;

/* Optional: */

proc sort data=want;
by id descending time;
run;

It still assumes that the values of variable TIME are a contiguous series of months for each ID, so that counting observations is equivalent to counting months.

View solution in original post

PaigeMiller · Posted 11-20-2024 10:04 AM

proc sort data=data;
    by id time;
run;
data want;
    set data(drop=col_want);
    by id;
    if first.id then seq=0;
    seq+1;
    prev=ifn(seq>1,lag(col_have),.); 
    prev2=ifn(seq>2,lag2(col_have),.); 
    prev3=ifn(seq>3,lag3(col_have),.); 
    prev4=ifn(seq>4,lag4(col_have),.); 
    prev5=ifn(seq>5,lag5(col_have),.); 
    col_want=(prev=0 and col_have=1) + (prev2=0 and prev=1) + (prev3=0 and prev2=1) + 
        (prev4=0 and prev3=1) + (prev5=0 and prev4=1); 
    drop seq prev:;
run;

--
Paige Miller

victor1893 · Posted 11-20-2024 10:52 AM

Thanks for the solution. I am looking for something more automatic that can be adjusted to different time windows (I.e. 1 year, 2 year). I am confident that sas allows for a more efficient approach. Problem is still open for suggestions.

PaigeMiller · Posted 11-20-2024 11:21 AM

You can turn this into a macro where you specify the number of months.

--
Paige Miller

FreelanceReinh · Posted 11-20-2024 11:48 AM

Hi @victor1893,

Here's an algorithm using an arbitrary number of months (macro variable n), &n>1, as the length of the time window:

proc sort data=data(drop=col_want) out=have;
by id time;
run;

%let n=6; /* length of time window in months */

data want(drop=_:);
array _c[0:%eval(&n-2)] _temporary_;
set have;
by id;
_prev=lag(col_have);
if first.id then call missing(of _c[*]);
else do;
  _m+1;
  _c[mod(_m-1,%eval(&n-1))]=ifn(_prev=0 & col_have=1,1,.);
end;
col_want=n(of _c[*]);
run;

/* Optional: */

proc sort data=want;
by id descending time;
run;

It still assumes that the values of variable TIME are a contiguous series of months for each ID, so that counting observations is equivalent to counting months.

victor1893 · Posted 11-21-2024 03:28 AM

Thanks for the solution. Works as expected.

FreelanceReinh · Posted 11-21-2024 03:53 AM

You're welcome. I've just noticed that, for a very minor performance improvement and code simplification, you can replace the _m-1 in the first argument of the MOD function by simply _m without changing the results:

_c[mod(_m,%eval(&n-1))]=ifn(_prev=0 & col_have=1,1,.);

Ksharp · Posted 11-20-2024 09:14 PM

data data;
input id time date9. col_have ;
format time date9.;
datalines;
1 01DEC2012 1 2
1 01NOV2012 0 1
1 01OCT2012 1 2
1 01SEP2012 0 1
1 01AUG2012 1 1
1 01JUL2012 1 1
1 01JUN2012 1 2
1 01MAY2012 0 1
1 01APR2012 0 1
1 01MAR2012 1 1
1 01FEB2012 1 1
1 01JAN2012 0 0
2 01DEC2013 0 0
2 01NOV2013 0 0
2 01OCT2013 0 1
2 01SEP2013 0 1
2 01AUG2013 0 1
2 01JUL2013 0 2
2 01JUN2013 1 2
2 01MAY2013 0 1
2 01APR2013 1 1
2 01MAR2013 1 1
2 01FEB2013 0 0
2 01JAN2013 1 0
;
run;
proc sort data=data out=temp;
by id time;
run;
data temp2;
 set temp;
 flag=( id=lag(id) and lag(col_have)=0 and col_have=1 );
run;
proc sql;
create table want as
select id,time,col_have,(select sum(flag) from temp2 where id=a.id and time between intnx('month',a.time,-5,'s') and a.time) as want
 from temp2 as a
  order by id,time desc;
quit;

victor1893 · Posted 11-21-2024 03:30 AM

As much as I wished that this simpler approach would work as good as the accepted solution, the result is not correct. I appreciate your effort! Thank you!

Ksharp · Posted 11-23-2024 01:53 AM

I misunderstood what you mean. Here is another Hash Table way.

data have;
input id time date9. col_have col_want;
format time date9.;
datalines;
1 01DEC2012 1 2
1 01NOV2012 0 1
1 01OCT2012 1 2
1 01SEP2012 0 1
1 01AUG2012 1 1
1 01JUL2012 1 1
1 01JUN2012 1 2
1 01MAY2012 0 1
1 01APR2012 0 1
1 01MAR2012 1 1
1 01FEB2012 1 1
1 01JAN2012 0 0
2 01DEC2013 0 0
2 01NOV2013 0 0
2 01OCT2013 0 1
2 01SEP2013 0 1
2 01AUG2013 0 1
2 01JUL2013 0 2
2 01JUN2013 1 2
2 01MAY2013 0 1
2 01APR2013 1 1
2 01MAR2013 1 1
2 01FEB2013 0 0
2 01JAN2013 1 0
;
run;
%let n=6;
data want;
if _n_=1 then do;
 if 0 then set have(rename=(col_have=_col_have));
 declare hash h(dataset:'data(rename=(col_have=_col_have))',hashexp:20);
 h.definekey('id','time');
 h.definedata('_col_have');
 h.definedone();
end;
set have;
want=0;
do i=-%eval(&n.-2) to 0;
 call missing(lag,current);
 if h.find(key:id,key:intnx('month',time,i-1))=0 then lag=_col_have;
 if h.find(key:id,key:intnx('month',time,i))=0 then current=_col_have;
 if lag=0 and current=1 then want+1;
end;
drop i lag current _col_have;
run;

victor1893 · Posted 11-25-2024 03:42 AM

I confirm that this solution is correct. I still prefer the initial chosen solution as it is more close to the traditional sas language that I (at least) am more familiar with. Nonetheless, I want to thank you for the effort which is much appreciated. I will try to read more about these hash approaches. Cheers!

Tom · Posted 11-23-2024 11:03 PM

Your results seem to be for a 5 month window, not 6.

Is it on purpose? Why do you want to exclude events that occur in the first month of the window?

victor1893 · Posted 11-25-2024 03:35 AM

The time windows starts with the current month + 5 additional months from the past which results in 6.

Tom · Posted 11-25-2024 07:58 AM

@victor1893 wrote:

The time windows starts with the current month + 5 additional months from the past which results in 6.

So your posted expected values are wrong???

Let's simplify your variable names, so the date value is in a variable named DATE and expected result is in a variable named EXPECT.

data have;
  input id date :date. have expect;
  format date date9.;
datalines;
1 01DEC2012 1 2
1 01NOV2012 0 1
1 01OCT2012 1 2
1 01SEP2012 0 1
1 01AUG2012 1 1
1 01JUL2012 1 1
1 01JUN2012 1 2
1 01MAY2012 0 1
1 01APR2012 0 1
1 01MAR2012 1 1
1 01FEB2012 1 1
1 01JAN2012 0 0
2 01DEC2013 0 0
2 01NOV2013 0 0
2 01OCT2013 0 1
2 01SEP2013 0 1
2 01AUG2013 0 1
2 01JUL2013 0 2
2 01JUN2013 1 2
2 01MAY2013 0 1
2 01APR2013 1 1
2 01MAR2013 1 1
2 01FEB2013 0 0
2 01JAN2013 1 0
;

proc sort;
  by id date;
run;

Here is a method to finds the events and uses a simple series of assignments to create the events from this month EVENT1 to 5 mounts ago EVENT6 on every observation. Then let's use the SUM() function to find the count over 6 months and 5 months and compare that to the expected count.

data want;
  do month=1 by 1 until (last.id);
    set have;
    by id;
    length sum5 sum6 event1-event6 8;
    event1=have and not lag(have) and not first.id;
    sum5=sum(of event1-event5);
    sum6=sum(of event1-event6);
    output;
    event6=event5;
    event5=event4;
    event4=event3;
    event3=event2;
    event2=event1;
  end;
run;

proc print;
 where sum6 ne expect;
run;

Results show that the expected count is one less than the number of events when there is an event in that 6th month.

full results

victor1893 · Posted 11-25-2024 08:24 AM

Hi Tom, apologies for my limited sas knowledge as I find it difficult to understand the logic of the comparison table you produced. But I had a look at the observations highlighted the result not being correct, I can confirm that this is not the case. For example in case of id 1 in 01JUL2012, we look from 01JUL2012 to 01FEB2012 including. The have column changed from 0 to 1 only once in 01JUN2012.

In the second case for id 1 in 01NOV2012 we look from 01NOV2012 to 01JUN2012 including. The have column change from 0 to 1 once in 01OCT2012. I see no issue with the provided expected column.

SAS Innovate 2025: Register Now

SAS Training: Just a Click Away