Hi sas experts,
I am looking for a solution that starts from a dummy variable and counts the number of changes from value 0 to 1 during a pre-defined time window (in my case: 6 months) for each subject (id column). The below example shows how the data that I have looks like, (including the col_have) and the outcome column that I want (col_want):
data data;
input id time date9. col_have col_want;
format time date9.;
datalines;
1 01DEC2012 1 2
1 01NOV2012 0 1
1 01OCT2012 1 2
1 01SEP2012 0 1
1 01AUG2012 1 1
1 01JUL2012 1 1
1 01JUN2012 1 2
1 01MAY2012 0 1
1 01APR2012 0 1
1 01MAR2012 1 1
1 01FEB2012 1 1
1 01JAN2012 0 0
2 01DEC2013 0 0
2 01NOV2013 0 0
2 01OCT2013 0 1
2 01SEP2013 0 1
2 01AUG2013 0 1
2 01JUL2013 0 2
2 01JUN2013 1 2
2 01MAY2013 0 1
2 01APR2013 1 1
2 01MAR2013 1 1
2 01FEB2013 0 0
2 01JAN2013 1 0
;
run;
To give a bit of details, the algo should take each date and count the number of times for which the col_have changes values from 0 to 1 during the last 6 months.
Thank you very much in advance for effort and support!
Hi @victor1893,
Here's an algorithm using an arbitrary number of months (macro variable n), &n>1, as the length of the time window:
proc sort data=data(drop=col_want) out=have;
by id time;
run;
%let n=6; /* length of time window in months */
data want(drop=_:);
array _c[0:%eval(&n-2)] _temporary_;
set have;
by id;
_prev=lag(col_have);
if first.id then call missing(of _c[*]);
else do;
_m+1;
_c[mod(_m-1,%eval(&n-1))]=ifn(_prev=0 & col_have=1,1,.);
end;
col_want=n(of _c[*]);
run;
/* Optional: */
proc sort data=want;
by id descending time;
run;
It still assumes that the values of variable TIME are a contiguous series of months for each ID, so that counting observations is equivalent to counting months.
proc sort data=data;
by id time;
run;
data want;
set data(drop=col_want);
by id;
if first.id then seq=0;
seq+1;
prev=ifn(seq>1,lag(col_have),.);
prev2=ifn(seq>2,lag2(col_have),.);
prev3=ifn(seq>3,lag3(col_have),.);
prev4=ifn(seq>4,lag4(col_have),.);
prev5=ifn(seq>5,lag5(col_have),.);
col_want=(prev=0 and col_have=1) + (prev2=0 and prev=1) + (prev3=0 and prev2=1) +
(prev4=0 and prev3=1) + (prev5=0 and prev4=1);
drop seq prev:;
run;
Thanks for the solution. I am looking for something more automatic that can be adjusted to different time windows (I.e. 1 year, 2 year). I am confident that sas allows for a more efficient approach. Problem is still open for suggestions.
You can turn this into a macro where you specify the number of months.
Hi @victor1893,
Here's an algorithm using an arbitrary number of months (macro variable n), &n>1, as the length of the time window:
proc sort data=data(drop=col_want) out=have;
by id time;
run;
%let n=6; /* length of time window in months */
data want(drop=_:);
array _c[0:%eval(&n-2)] _temporary_;
set have;
by id;
_prev=lag(col_have);
if first.id then call missing(of _c[*]);
else do;
_m+1;
_c[mod(_m-1,%eval(&n-1))]=ifn(_prev=0 & col_have=1,1,.);
end;
col_want=n(of _c[*]);
run;
/* Optional: */
proc sort data=want;
by id descending time;
run;
It still assumes that the values of variable TIME are a contiguous series of months for each ID, so that counting observations is equivalent to counting months.
Thanks for the solution. Works as expected.
You're welcome. I've just noticed that, for a very minor performance improvement and code simplification, you can replace the _m-1 in the first argument of the MOD function by simply _m without changing the results:
_c[mod(_m,%eval(&n-1))]=ifn(_prev=0 & col_have=1,1,.);
data data;
input id time date9. col_have ;
format time date9.;
datalines;
1 01DEC2012 1 2
1 01NOV2012 0 1
1 01OCT2012 1 2
1 01SEP2012 0 1
1 01AUG2012 1 1
1 01JUL2012 1 1
1 01JUN2012 1 2
1 01MAY2012 0 1
1 01APR2012 0 1
1 01MAR2012 1 1
1 01FEB2012 1 1
1 01JAN2012 0 0
2 01DEC2013 0 0
2 01NOV2013 0 0
2 01OCT2013 0 1
2 01SEP2013 0 1
2 01AUG2013 0 1
2 01JUL2013 0 2
2 01JUN2013 1 2
2 01MAY2013 0 1
2 01APR2013 1 1
2 01MAR2013 1 1
2 01FEB2013 0 0
2 01JAN2013 1 0
;
run;
proc sort data=data out=temp;
by id time;
run;
data temp2;
set temp;
flag=( id=lag(id) and lag(col_have)=0 and col_have=1 );
run;
proc sql;
create table want as
select id,time,col_have,(select sum(flag) from temp2 where id=a.id and time between intnx('month',a.time,-5,'s') and a.time) as want
from temp2 as a
order by id,time desc;
quit;
As much as I wished that this simpler approach would work as good as the accepted solution, the result is not correct. I appreciate your effort! Thank you!
I misunderstood what you mean. Here is another Hash Table way.
data have;
input id time date9. col_have col_want;
format time date9.;
datalines;
1 01DEC2012 1 2
1 01NOV2012 0 1
1 01OCT2012 1 2
1 01SEP2012 0 1
1 01AUG2012 1 1
1 01JUL2012 1 1
1 01JUN2012 1 2
1 01MAY2012 0 1
1 01APR2012 0 1
1 01MAR2012 1 1
1 01FEB2012 1 1
1 01JAN2012 0 0
2 01DEC2013 0 0
2 01NOV2013 0 0
2 01OCT2013 0 1
2 01SEP2013 0 1
2 01AUG2013 0 1
2 01JUL2013 0 2
2 01JUN2013 1 2
2 01MAY2013 0 1
2 01APR2013 1 1
2 01MAR2013 1 1
2 01FEB2013 0 0
2 01JAN2013 1 0
;
run;
%let n=6;
data want;
if _n_=1 then do;
if 0 then set have(rename=(col_have=_col_have));
declare hash h(dataset:'data(rename=(col_have=_col_have))',hashexp:20);
h.definekey('id','time');
h.definedata('_col_have');
h.definedone();
end;
set have;
want=0;
do i=-%eval(&n.-2) to 0;
call missing(lag,current);
if h.find(key:id,key:intnx('month',time,i-1))=0 then lag=_col_have;
if h.find(key:id,key:intnx('month',time,i))=0 then current=_col_have;
if lag=0 and current=1 then want+1;
end;
drop i lag current _col_have;
run;
I confirm that this solution is correct. I still prefer the initial chosen solution as it is more close to the traditional sas language that I (at least) am more familiar with. Nonetheless, I want to thank you for the effort which is much appreciated. I will try to read more about these hash approaches. Cheers!
Your results seem to be for a 5 month window, not 6.
Is it on purpose? Why do you want to exclude events that occur in the first month of the window?
The time windows starts with the current month + 5 additional months from the past which results in 6.
@victor1893 wrote:
The time windows starts with the current month + 5 additional months from the past which results in 6.
So your posted expected values are wrong???
Let's simplify your variable names, so the date value is in a variable named DATE and expected result is in a variable named EXPECT.
data have;
input id date :date. have expect;
format date date9.;
datalines;
1 01DEC2012 1 2
1 01NOV2012 0 1
1 01OCT2012 1 2
1 01SEP2012 0 1
1 01AUG2012 1 1
1 01JUL2012 1 1
1 01JUN2012 1 2
1 01MAY2012 0 1
1 01APR2012 0 1
1 01MAR2012 1 1
1 01FEB2012 1 1
1 01JAN2012 0 0
2 01DEC2013 0 0
2 01NOV2013 0 0
2 01OCT2013 0 1
2 01SEP2013 0 1
2 01AUG2013 0 1
2 01JUL2013 0 2
2 01JUN2013 1 2
2 01MAY2013 0 1
2 01APR2013 1 1
2 01MAR2013 1 1
2 01FEB2013 0 0
2 01JAN2013 1 0
;
proc sort;
by id date;
run;
Here is a method to finds the events and uses a simple series of assignments to create the events from this month EVENT1 to 5 mounts ago EVENT6 on every observation. Then let's use the SUM() function to find the count over 6 months and 5 months and compare that to the expected count.
data want;
do month=1 by 1 until (last.id);
set have;
by id;
length sum5 sum6 event1-event6 8;
event1=have and not lag(have) and not first.id;
sum5=sum(of event1-event5);
sum6=sum(of event1-event6);
output;
event6=event5;
event5=event4;
event4=event3;
event3=event2;
event2=event1;
end;
run;
proc print;
where sum6 ne expect;
run;
Results show that the expected count is one less than the number of events when there is an event in that 6th month.
full results
Hi Tom, apologies for my limited sas knowledge as I find it difficult to understand the logic of the comparison table you produced. But I had a look at the observations highlighted the result not being correct, I can confirm that this is not the case. For example in case of id 1 in 01JUL2012, we look from 01JUL2012 to 01FEB2012 including. The have column changed from 0 to 1 only once in 01JUN2012.
In the second case for id 1 in 01NOV2012 we look from 01NOV2012 to 01JUN2012 including. The have column change from 0 to 1 once in 01OCT2012. I see no issue with the provided expected column.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.