BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
victor1893
Fluorite | Level 6

Hi sas experts,

 

I am looking for a solution that starts from a dummy variable and counts the number of changes from value 0 to 1 during a pre-defined time window (in my case: 6 months) for each subject (id column).  The below example shows how the data that I have looks like, (including the col_have) and the outcome column that I want (col_want):

 

data data;
input id time date9. col_have col_want;
format time date9.;
datalines;
1 01DEC2012 1 2
1 01NOV2012 0 1
1 01OCT2012 1 2
1 01SEP2012 0 1
1 01AUG2012 1 1
1 01JUL2012 1 1
1 01JUN2012 1 2
1 01MAY2012 0 1
1 01APR2012 0 1
1 01MAR2012 1 1
1 01FEB2012 1 1
1 01JAN2012 0 0
2 01DEC2013 0 0
2 01NOV2013 0 0
2 01OCT2013 0 1
2 01SEP2013 0 1
2 01AUG2013 0 1
2 01JUL2013 0 2
2 01JUN2013 1 2
2 01MAY2013 0 1
2 01APR2013 1 1
2 01MAR2013 1 1
2 01FEB2013 0 0
2 01JAN2013 1 0
;
run;

 

To give a bit of details, the algo should take each date and count the number of times for which the col_have changes values from 0 to 1 during the last 6 months.

 

Thank you very much in advance for effort and support!

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

Hi @victor1893,

 

Here's an algorithm using an arbitrary number of months (macro variable n), &n>1, as the length of the time window:

proc sort data=data(drop=col_want) out=have;
by id time;
run;

%let n=6; /* length of time window in months */

data want(drop=_:);
array _c[0:%eval(&n-2)] _temporary_;
set have;
by id;
_prev=lag(col_have);
if first.id then call missing(of _c[*]);
else do;
  _m+1;
  _c[mod(_m-1,%eval(&n-1))]=ifn(_prev=0 & col_have=1,1,.);
end;
col_want=n(of _c[*]);
run;

/* Optional: */

proc sort data=want;
by id descending time;
run;

It still assumes that the values of variable TIME are a contiguous series of months for each ID, so that counting observations is equivalent to counting months.

View solution in original post

18 REPLIES 18
PaigeMiller
Diamond | Level 26
proc sort data=data;
    by id time;
run;
data want;
    set data(drop=col_want);
    by id;
    if first.id then seq=0;
    seq+1;
    prev=ifn(seq>1,lag(col_have),.); 
    prev2=ifn(seq>2,lag2(col_have),.); 
    prev3=ifn(seq>3,lag3(col_have),.); 
    prev4=ifn(seq>4,lag4(col_have),.); 
    prev5=ifn(seq>5,lag5(col_have),.); 
    col_want=(prev=0 and col_have=1) + (prev2=0 and prev=1) + (prev3=0 and prev2=1) + 
        (prev4=0 and prev3=1) + (prev5=0 and prev4=1); 
    drop seq prev:;
run;
--
Paige Miller
victor1893
Fluorite | Level 6

Thanks for the solution. I am looking for something more automatic that can be adjusted to different time windows (I.e. 1 year, 2 year). I am confident that sas allows for a more efficient approach. Problem is still open for suggestions.

PaigeMiller
Diamond | Level 26

You can turn this into a macro where you specify the number of months.

--
Paige Miller
FreelanceReinh
Jade | Level 19

Hi @victor1893,

 

Here's an algorithm using an arbitrary number of months (macro variable n), &n>1, as the length of the time window:

proc sort data=data(drop=col_want) out=have;
by id time;
run;

%let n=6; /* length of time window in months */

data want(drop=_:);
array _c[0:%eval(&n-2)] _temporary_;
set have;
by id;
_prev=lag(col_have);
if first.id then call missing(of _c[*]);
else do;
  _m+1;
  _c[mod(_m-1,%eval(&n-1))]=ifn(_prev=0 & col_have=1,1,.);
end;
col_want=n(of _c[*]);
run;

/* Optional: */

proc sort data=want;
by id descending time;
run;

It still assumes that the values of variable TIME are a contiguous series of months for each ID, so that counting observations is equivalent to counting months.

victor1893
Fluorite | Level 6

Thanks for the solution. Works as expected.

FreelanceReinh
Jade | Level 19

You're welcome. I've just noticed that, for a very minor performance improvement and code simplification, you can replace the _m-1 in the first argument of the MOD function by simply _m without changing the results:

_c[mod(_m,%eval(&n-1))]=ifn(_prev=0 & col_have=1,1,.);
Ksharp
Super User
data data;
input id time date9. col_have ;
format time date9.;
datalines;
1 01DEC2012 1 2
1 01NOV2012 0 1
1 01OCT2012 1 2
1 01SEP2012 0 1
1 01AUG2012 1 1
1 01JUL2012 1 1
1 01JUN2012 1 2
1 01MAY2012 0 1
1 01APR2012 0 1
1 01MAR2012 1 1
1 01FEB2012 1 1
1 01JAN2012 0 0
2 01DEC2013 0 0
2 01NOV2013 0 0
2 01OCT2013 0 1
2 01SEP2013 0 1
2 01AUG2013 0 1
2 01JUL2013 0 2
2 01JUN2013 1 2
2 01MAY2013 0 1
2 01APR2013 1 1
2 01MAR2013 1 1
2 01FEB2013 0 0
2 01JAN2013 1 0
;
run;
proc sort data=data out=temp;
by id time;
run;
data temp2;
 set temp;
 flag=( id=lag(id) and lag(col_have)=0 and col_have=1 );
run;
proc sql;
create table want as
select id,time,col_have,(select sum(flag) from temp2 where id=a.id and time between intnx('month',a.time,-5,'s') and a.time) as want
 from temp2 as a
  order by id,time desc;
quit;
victor1893
Fluorite | Level 6

As much as I wished that this simpler approach would work as good as the accepted solution, the result is not correct. I appreciate your effort! Thank you!

Ksharp
Super User

I misunderstood what you mean. Here is another Hash Table way.

 

data have;
input id time date9. col_have col_want;
format time date9.;
datalines;
1 01DEC2012 1 2
1 01NOV2012 0 1
1 01OCT2012 1 2
1 01SEP2012 0 1
1 01AUG2012 1 1
1 01JUL2012 1 1
1 01JUN2012 1 2
1 01MAY2012 0 1
1 01APR2012 0 1
1 01MAR2012 1 1
1 01FEB2012 1 1
1 01JAN2012 0 0
2 01DEC2013 0 0
2 01NOV2013 0 0
2 01OCT2013 0 1
2 01SEP2013 0 1
2 01AUG2013 0 1
2 01JUL2013 0 2
2 01JUN2013 1 2
2 01MAY2013 0 1
2 01APR2013 1 1
2 01MAR2013 1 1
2 01FEB2013 0 0
2 01JAN2013 1 0
;
run;
%let n=6;
data want;
if _n_=1 then do;
 if 0 then set have(rename=(col_have=_col_have));
 declare hash h(dataset:'data(rename=(col_have=_col_have))',hashexp:20);
 h.definekey('id','time');
 h.definedata('_col_have');
 h.definedone();
end;
set have;
want=0;
do i=-%eval(&n.-2) to 0;
 call missing(lag,current);
 if h.find(key:id,key:intnx('month',time,i-1))=0 then lag=_col_have;
 if h.find(key:id,key:intnx('month',time,i))=0 then current=_col_have;
 if lag=0 and current=1 then want+1;
end;
drop i lag current _col_have;
run;
victor1893
Fluorite | Level 6

I confirm that this solution is correct. I still prefer the initial chosen solution as it is more close to the traditional sas language that I (at least) am more familiar with. Nonetheless, I want to thank you for the effort which is much appreciated. I will try to read more about these hash approaches. Cheers!

Tom
Super User Tom
Super User

Your results seem to be for a 5 month window, not 6. 

Is it on purpose?   Why do you want to exclude events that occur in the first month of the window?

 

 

 

victor1893
Fluorite | Level 6

The time windows starts with the current month + 5 additional months from the past which results in 6.

Tom
Super User Tom
Super User

@victor1893 wrote:

The time windows starts with the current month + 5 additional months from the past which results in 6.


So your posted expected values are wrong???

Let's simplify your variable names, so the date value is in a variable named DATE and expected result is in a variable named EXPECT.

data have;
  input id date :date. have expect;
  format date date9.;
datalines;
1 01DEC2012 1 2
1 01NOV2012 0 1
1 01OCT2012 1 2
1 01SEP2012 0 1
1 01AUG2012 1 1
1 01JUL2012 1 1
1 01JUN2012 1 2
1 01MAY2012 0 1
1 01APR2012 0 1
1 01MAR2012 1 1
1 01FEB2012 1 1
1 01JAN2012 0 0
2 01DEC2013 0 0
2 01NOV2013 0 0
2 01OCT2013 0 1
2 01SEP2013 0 1
2 01AUG2013 0 1
2 01JUL2013 0 2
2 01JUN2013 1 2
2 01MAY2013 0 1
2 01APR2013 1 1
2 01MAR2013 1 1
2 01FEB2013 0 0
2 01JAN2013 1 0
;

proc sort;
  by id date;
run;

Here is a method to finds the events and uses a simple series of assignments to create the events from this month EVENT1 to 5 mounts ago EVENT6 on every observation.  Then let's use the SUM() function to find the count over 6 months and 5 months and compare that to the expected count.

data want;
  do month=1 by 1 until (last.id);
    set have;
    by id;
    length sum5 sum6 event1-event6 8;
    event1=have and not lag(have) and not first.id;
    sum5=sum(of event1-event5);
    sum6=sum(of event1-event6);
    output;
    event6=event5;
    event5=event4;
    event4=event3;
    event3=event2;
    event2=event1;
  end;
run;

proc print;
 where sum6 ne expect;
run;

Results show that the expected count is one less than the number of events when there is an event in that 6th month.

Tom_0-1732539443813.png

full results

Tom_1-1732539527488.png

 

 

victor1893
Fluorite | Level 6

Hi Tom, apologies for my limited sas knowledge as I find it difficult to understand the logic of the comparison table you produced. But I had a look at the observations highlighted the result not being correct, I can confirm that this is not the case. For example in case of id 1 in 01JUL2012, we look from 01JUL2012 to 01FEB2012 including. The have column changed from 0 to 1 only once in 01JUN2012.

In the second case for id 1 in 01NOV2012 we look from 01NOV2012 to 01JUN2012 including. The have column change from 0 to 1 once in 01OCT2012. I see no issue with the provided expected column.

hackathon24-white-horiz.png

The 2025 SAS Hackathon Kicks Off on June 11!

Watch the live Hackathon Kickoff to get all the essential information about the SAS Hackathon—including how to join, how to participate, and expert tips for success.

YouTube LinkedIn

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 18 replies
  • 2213 views
  • 3 likes
  • 6 in conversation