So let's look at the first 7 observations for ID=1.
You can see there are two events where the HAVE flag transitioned from 0 to 1.
The first one on 01FEB2012 and the second on 01JUN2012.
Both are within the 6 month window for the 7th observation on 01JUL2012.
So I repeat my question from before:
Your results seem to be for a 5 month window, not 6.
Is it on purpose? Why do you want to exclude events that occur in the first month of the window?
I think I understand your judgement, and if not, then please correct me.
You are saying that for 01JUL2012 you should have 2 change events but you only see 1 because the observation in 01JAN2012 is not included in the 6 months window of 01JUL2012. The change from 0 to 1 must capture both 0 and 1 in the same window to be accounted for, whereas for 01JUL2012 the id is seen as "starting" with 1 in 01FEB2012. I hope this makes it clear and apologies for the confusion.
I realize you already have a designated solution.
But because it utilizes two proc sorts plus a data step, here is a single data step that avoids the sorts. Instead, it reads each ID group twice, first to generate the count of events over 6 months windows, and the second time to assign those counts to the appropriate observation:
data data;
input id time date9. col_have col_want;
format time date9.;
datalines;
1 01DEC2012 1 2
1 01NOV2012 0 1
1 01OCT2012 1 2
1 01SEP2012 0 1
1 01AUG2012 1 1
1 01JUL2012 1 1
1 01JUN2012 1 2
1 01MAY2012 0 1
1 01APR2012 0 1
1 01MAR2012 1 1
1 01FEB2012 1 1
1 01JAN2012 0 0
2 01DEC2013 0 0
2 01NOV2013 0 0
2 01OCT2013 0 1
2 01SEP2013 0 1
2 01AUG2013 0 1
2 01JUL2013 0 2
2 01JUN2013 1 2
2 01MAY2013 0 1
2 01APR2013 1 1
2 01MAR2013 1 1
2 01FEB2013 0 0
2 01JAN2013 1 0
run;
data want (drop=_: i);
set data (in=firstpass) data (in=secondpass);
by id;
array ev_count{24} _temporary_; /*Let array dimension accommodate longest ID sequence*/
if first.id then call missing (_n1,_n2,of ev_count{*});
if firstpass then do;
_n1+1;
if first.id=0 and col_have=0 and lag(col_have)=1 then do i=max(1,_n1-5) to _n1-1;
ev_count{i}=sum(ev_count{i},1);
end;
end;
if secondpass;
_n2+1;
n_event=sum(0,ev_count{_n2});
run;
Hi, thanks for the solution. Result is correct. Even thought the chosen solution uses 2 additional proc sorts, i find it more straightforward and easy to understand. Of course this is a subjective judgement based on my individual sas experience, but in such cases I would prefer a solution that Is more easily understandable and debug-able if of course the difference in performance is not significant. Nonetheless, I very much appreciate your effort for the solution. Thank you!
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.