Solved: Counting the number of event trigger over a pre-defined period - Page 2

Tom · Posted 11-25-2024 08:35 AM

So let's look at the first 7 observations for ID=1.

You can see there are two events where the HAVE flag transitioned from 0 to 1.

The first one on 01FEB2012 and the second on 01JUN2012.

Both are within the 6 month window for the 7th observation on 01JUL2012.

So I repeat my question from before:

Your results seem to be for a 5 month window, not 6.

Is it on purpose? Why do you want to exclude events that occur in the first month of the window?

victor1893 · Posted 11-25-2024 09:06 AM

I think I understand your judgement, and if not, then please correct me.

You are saying that for 01JUL2012 you should have 2 change events but you only see 1 because the observation in 01JAN2012 is not included in the 6 months window of 01JUL2012. The change from 0 to 1 must capture both 0 and 1 in the same window to be accounted for, whereas for 01JUL2012 the id is seen as "starting" with 1 in 01FEB2012. I hope this makes it clear and apologies for the confusion.

mkeintz · Posted 11-24-2024 07:47 PM

I realize you already have a designated solution.

But because it utilizes two proc sorts plus a data step, here is a single data step that avoids the sorts. Instead, it reads each ID group twice, first to generate the count of events over 6 months windows, and the second time to assign those counts to the appropriate observation:

data data;
input id time date9. col_have col_want;
format time date9.;
datalines;
1 01DEC2012 1 2
1 01NOV2012 0 1
1 01OCT2012 1 2
1 01SEP2012 0 1
1 01AUG2012 1 1
1 01JUL2012 1 1
1 01JUN2012 1 2
1 01MAY2012 0 1
1 01APR2012 0 1
1 01MAR2012 1 1
1 01FEB2012 1 1
1 01JAN2012 0 0
2 01DEC2013 0 0
2 01NOV2013 0 0
2 01OCT2013 0 1
2 01SEP2013 0 1
2 01AUG2013 0 1
2 01JUL2013 0 2
2 01JUN2013 1 2
2 01MAY2013 0 1
2 01APR2013 1 1
2 01MAR2013 1 1
2 01FEB2013 0 0
2 01JAN2013 1 0
run;


data want (drop=_: i);
  set data (in=firstpass)  data (in=secondpass);
  by id;

  array ev_count{24} _temporary_;  /*Let array dimension accommodate longest ID sequence*/

  if first.id then call missing (_n1,_n2,of ev_count{*});

  if firstpass then do;
    _n1+1;
    if first.id=0 and col_have=0 and lag(col_have)=1 then do i=max(1,_n1-5) to _n1-1;
      ev_count{i}=sum(ev_count{i},1);
    end;
  end;

  if secondpass;
  _n2+1;
  n_event=sum(0,ev_count{_n2});
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

victor1893 · Posted 11-25-2024 03:48 AM

Hi, thanks for the solution. Result is correct. Even thought the chosen solution uses 2 additional proc sorts, i find it more straightforward and easy to understand. Of course this is a subjective judgement based on my individual sas experience, but in such cases I would prefer a solution that Is more easily understandable and debug-able if of course the difference in performance is not significant. Nonetheless, I very much appreciate your effort for the solution. Thank you!

Re: Counting the number of event trigger over a pre-defined period

Re: Counting the number of event trigger over a pre-defined period

Re: Counting the number of event trigger over a pre-defined period

Re: Counting the number of event trigger over a pre-defined period

Registration is open

Re: Counting the number of event trigger over a pre-defined period

Re: Counting the number of event trigger over a pre-defined period

Re: Counting the number of event trigger over a pre-defined period

Re: Counting the number of event trigger over a pre-defined period

Registration is open

SAS Training: Just a Click Away