BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Tom
Super User Tom
Super User

So let's look at the first 7 observations for ID=1.

Tom_0-1732541639977.png

You can see there are two events where the HAVE flag transitioned from 0 to 1.

The first one on 01FEB2012 and the second on 01JUN2012. 

Both are within the 6 month window for the 7th observation on 01JUL2012.

 

So I repeat my question from before:

Your results seem to be for a 5 month window, not 6. 

Is it on purpose?   Why do you want to exclude events that occur in the first month of the window?

victor1893
Fluorite | Level 6

I think I understand your judgement, and if not, then please correct me.

You are saying that for 01JUL2012 you should have 2 change events but you only see 1 because the observation in 01JAN2012 is not included in the 6 months window of 01JUL2012. The change from 0 to 1 must capture both 0 and 1 in the same window to be accounted for, whereas for 01JUL2012 the id is seen as "starting" with 1 in 01FEB2012. I hope this makes it clear and apologies for the confusion.

mkeintz
PROC Star

I realize you already have a designated solution.

 

But because it utilizes two proc sorts plus a data step, here is a single data step that avoids the sorts.  Instead, it reads each ID group twice, first to generate the count of events over 6 months windows, and the second time to assign those counts to the appropriate observation:

 

data data;
input id time date9. col_have col_want;
format time date9.;
datalines;
1 01DEC2012 1 2
1 01NOV2012 0 1
1 01OCT2012 1 2
1 01SEP2012 0 1
1 01AUG2012 1 1
1 01JUL2012 1 1
1 01JUN2012 1 2
1 01MAY2012 0 1
1 01APR2012 0 1
1 01MAR2012 1 1
1 01FEB2012 1 1
1 01JAN2012 0 0
2 01DEC2013 0 0
2 01NOV2013 0 0
2 01OCT2013 0 1
2 01SEP2013 0 1
2 01AUG2013 0 1
2 01JUL2013 0 2
2 01JUN2013 1 2
2 01MAY2013 0 1
2 01APR2013 1 1
2 01MAR2013 1 1
2 01FEB2013 0 0
2 01JAN2013 1 0
run;


data want (drop=_: i);
  set data (in=firstpass)  data (in=secondpass);
  by id;

  array ev_count{24} _temporary_;  /*Let array dimension accommodate longest ID sequence*/

  if first.id then call missing (_n1,_n2,of ev_count{*});

  if firstpass then do;
    _n1+1;
    if first.id=0 and col_have=0 and lag(col_have)=1 then do i=max(1,_n1-5) to _n1-1;
      ev_count{i}=sum(ev_count{i},1);
    end;
  end;

  if secondpass;
  _n2+1;
  n_event=sum(0,ev_count{_n2});
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
victor1893
Fluorite | Level 6

Hi, thanks for the solution. Result is correct. Even thought the chosen solution uses 2 additional proc sorts, i find it more straightforward and easy to understand. Of course this is a subjective judgement based on my individual sas experience, but in such cases I would prefer a solution that Is more easily understandable and debug-able if of course the difference in performance is not significant. Nonetheless, I very much appreciate your effort for the solution. Thank you!

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 18 replies
  • 2689 views
  • 3 likes
  • 6 in conversation