So let's look at the first 7 observations for ID=1.
You can see there are two events where the HAVE flag transitioned from 0 to 1.
The first one on 01FEB2012 and the second on 01JUN2012.
Both are within the 6 month window for the 7th observation on 01JUL2012.
So I repeat my question from before:
Your results seem to be for a 5 month window, not 6.
Is it on purpose? Why do you want to exclude events that occur in the first month of the window?
I think I understand your judgement, and if not, then please correct me.
You are saying that for 01JUL2012 you should have 2 change events but you only see 1 because the observation in 01JAN2012 is not included in the 6 months window of 01JUL2012. The change from 0 to 1 must capture both 0 and 1 in the same window to be accounted for, whereas for 01JUL2012 the id is seen as "starting" with 1 in 01FEB2012. I hope this makes it clear and apologies for the confusion.
I realize you already have a designated solution.
But because it utilizes two proc sorts plus a data step, here is a single data step that avoids the sorts. Instead, it reads each ID group twice, first to generate the count of events over 6 months windows, and the second time to assign those counts to the appropriate observation:
data data;
input id time date9. col_have col_want;
format time date9.;
datalines;
1 01DEC2012 1 2
1 01NOV2012 0 1
1 01OCT2012 1 2
1 01SEP2012 0 1
1 01AUG2012 1 1
1 01JUL2012 1 1
1 01JUN2012 1 2
1 01MAY2012 0 1
1 01APR2012 0 1
1 01MAR2012 1 1
1 01FEB2012 1 1
1 01JAN2012 0 0
2 01DEC2013 0 0
2 01NOV2013 0 0
2 01OCT2013 0 1
2 01SEP2013 0 1
2 01AUG2013 0 1
2 01JUL2013 0 2
2 01JUN2013 1 2
2 01MAY2013 0 1
2 01APR2013 1 1
2 01MAR2013 1 1
2 01FEB2013 0 0
2 01JAN2013 1 0
run;
data want (drop=_: i);
set data (in=firstpass) data (in=secondpass);
by id;
array ev_count{24} _temporary_; /*Let array dimension accommodate longest ID sequence*/
if first.id then call missing (_n1,_n2,of ev_count{*});
if firstpass then do;
_n1+1;
if first.id=0 and col_have=0 and lag(col_have)=1 then do i=max(1,_n1-5) to _n1-1;
ev_count{i}=sum(ev_count{i},1);
end;
end;
if secondpass;
_n2+1;
n_event=sum(0,ev_count{_n2});
run;
Hi, thanks for the solution. Result is correct. Even thought the chosen solution uses 2 additional proc sorts, i find it more straightforward and easy to understand. Of course this is a subjective judgement based on my individual sas experience, but in such cases I would prefer a solution that Is more easily understandable and debug-able if of course the difference in performance is not significant. Nonetheless, I very much appreciate your effort for the solution. Thank you!
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.