BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Tom
Super User Tom
Super User

So let's look at the first 7 observations for ID=1.

Tom_0-1732541639977.png

You can see there are two events where the HAVE flag transitioned from 0 to 1.

The first one on 01FEB2012 and the second on 01JUN2012. 

Both are within the 6 month window for the 7th observation on 01JUL2012.

 

So I repeat my question from before:

Your results seem to be for a 5 month window, not 6. 

Is it on purpose?   Why do you want to exclude events that occur in the first month of the window?

victor1893
Fluorite | Level 6

I think I understand your judgement, and if not, then please correct me.

You are saying that for 01JUL2012 you should have 2 change events but you only see 1 because the observation in 01JAN2012 is not included in the 6 months window of 01JUL2012. The change from 0 to 1 must capture both 0 and 1 in the same window to be accounted for, whereas for 01JUL2012 the id is seen as "starting" with 1 in 01FEB2012. I hope this makes it clear and apologies for the confusion.

mkeintz
PROC Star

I realize you already have a designated solution.

 

But because it utilizes two proc sorts plus a data step, here is a single data step that avoids the sorts.  Instead, it reads each ID group twice, first to generate the count of events over 6 months windows, and the second time to assign those counts to the appropriate observation:

 

data data;
input id time date9. col_have col_want;
format time date9.;
datalines;
1 01DEC2012 1 2
1 01NOV2012 0 1
1 01OCT2012 1 2
1 01SEP2012 0 1
1 01AUG2012 1 1
1 01JUL2012 1 1
1 01JUN2012 1 2
1 01MAY2012 0 1
1 01APR2012 0 1
1 01MAR2012 1 1
1 01FEB2012 1 1
1 01JAN2012 0 0
2 01DEC2013 0 0
2 01NOV2013 0 0
2 01OCT2013 0 1
2 01SEP2013 0 1
2 01AUG2013 0 1
2 01JUL2013 0 2
2 01JUN2013 1 2
2 01MAY2013 0 1
2 01APR2013 1 1
2 01MAR2013 1 1
2 01FEB2013 0 0
2 01JAN2013 1 0
run;


data want (drop=_: i);
  set data (in=firstpass)  data (in=secondpass);
  by id;

  array ev_count{24} _temporary_;  /*Let array dimension accommodate longest ID sequence*/

  if first.id then call missing (_n1,_n2,of ev_count{*});

  if firstpass then do;
    _n1+1;
    if first.id=0 and col_have=0 and lag(col_have)=1 then do i=max(1,_n1-5) to _n1-1;
      ev_count{i}=sum(ev_count{i},1);
    end;
  end;

  if secondpass;
  _n2+1;
  n_event=sum(0,ev_count{_n2});
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
victor1893
Fluorite | Level 6

Hi, thanks for the solution. Result is correct. Even thought the chosen solution uses 2 additional proc sorts, i find it more straightforward and easy to understand. Of course this is a subjective judgement based on my individual sas experience, but in such cases I would prefer a solution that Is more easily understandable and debug-able if of course the difference in performance is not significant. Nonetheless, I very much appreciate your effort for the solution. Thank you!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 18 replies
  • 2947 views
  • 3 likes
  • 6 in conversation