- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I was trying to calculate 12 month moving average for each individual clients. I have the following simplified data in a table named individual_moving_avg with various users, dates and their incident count:
SAP | OpenDate | Count |
100004 | Aug-13 | 30 |
100004 | Sep-13 | 49 |
100004 | Oct-13 | 42 |
100004 | Nov-13 | 44 |
100004 | Dec-13 | 37 |
… | … | … |
100005 | Aug-13 | 28 |
100005 | Sep-13 | 17 |
100005 | Oct-13 | 29 |
100005 | Nov-13 | 15 |
100005 | Dec-13 | 13 |
… | … | … |
100007 | Aug-13 | 9 |
100007 | Sep-13 | 12 |
100007 | Oct-13 | 14 |
100007 | Nov-13 | 16 |
100007 | Dec-13 | 23 |
… | … | … |
100011 | Aug-13 | 3 |
100011 | Sep-13 | 3 |
100011 | Oct-13 | 3 |
100011 | Nov-13 | 0 |
100011 | Dec-13 | 3 |
… | … | … |
100012 | Aug-13 | 0 |
100012 | Sep-13 | 1 |
100012 | Oct-13 | 1 |
100012 | Nov-13 | 1 |
100012 | Dec-13 | 3 |
...
Below is my code:
DATA Individual_moving_avg;
SET All_new;
BY SAP OpenDate;
length SAP_Recs 3
Average 8;
retain SAP_Recs;
If first.SAP=1 then SAP_Recs=0;
SAP_Recs= SAP_Recs+1;
if SAP_Recs>11 then do;
Average=(lag11(Count)+lag10(Count)+lag9(Count)+lag8(Count)+lag7(Count)+lag6(Count)+lag5(Count)+
lag4(Count)+lag3(Count)+lag2(Count)+lag(Count)+Count)/12;
end;
drop SAP_Recs;
RUN;
However, my result does not come out accurately. My correct count of 12 mo moving average does not show until the 23th month. Can anyone possibly help me out on this? Thanks.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
1. If you have SAS/ETS you can use Proc Expand
proc sort data=sashelp.stocks out=stocks;
by stock date;
run;
proc expand data=stocks out=stocks2;
id date;
by stock;
convert open=open_avg/ transformout=(movave 12 trimleft 11);
run;
proc print data=stocks2 (obs=20);
run;
2. Or you can use an array method with Base SAS
proc sort data=sashelp.stocks out=stocks;
by stock date;
run;
data want;
set stocks;
by stock;
retain pre1-pre12;
array pre(12);
if first.stock then
do;
call missing(of pre(*));
count=0;
end;
count+1;
index=mod(count, 12)+1;
pre(index)=open;
if count>=12 then
moving_average=mean(of pre(*));
drop count index pre1-pre12;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you for the code, Reez. Unfortunately I don thave SAS/ETS and can only use data step. I tried your array metho but it still didn't give me the right result. It that becuase there are gaps between my each ID?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi Reeza,
I want to calcultae moving avg for last 30 days from the current day,
how i can implement your given code.
Thanks for the help in advance.
Regards,
Kaushal
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@kaushalsolanki please start a new thread with your question, and include sample data.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks for this Reeza, I made this work for what I need.
Used your code with the array method to calculate running daily average.
I had table SUM_PER_DAY_ initially, and need to get the daily average computed from the start of the month.
This is my code modified from yours:
data _null_;
set sum_per_day_;
call symput('max_day',strip(max(day(final_date))));
run;
%put &=max_day;
data want_sum_per_day_;
set sum_per_day_;
retain pre1-pre&max_day.;
array pre(&max_day.);
if first.final_date then
do;
call missing(of pre(*));
count=0;
end;
count+1;
index=mod(count, &max_day.)+1;
pre(index)=subtotal;
moving_average=mean(of pre(*))/1000000;
drop count index pre1-pre&max_day.;
run;
Output is as below, with moving average by the millions.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Do you have a big table ? If it was ,try Reeza's code . NOTE: Reeza's code would work only if there are not gap within each ID .
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi Ksharp,
Yes I do have a large table. It's a 3*117k table orignally. I tried Reeza's array but didn't work. Any suggestions? Thanks.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Less than a million isn't a big table. The array method should work fine, please post your non working code.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi Reeza,
My three vairalbes are:
SAP | OpenDate | Count |
I'm counting the 12-moving average for count by SAP ID. Below is my array code:
data Individual_moving_avg;
set All_new2;
by SAP;
retain pre1-pre12;
array pre(12);
if first.SAP then
do;
call missing(of pre(*));
count=0;
end;
count+1;
index=mod(count, 12)+1;
pre(index)=OpenDate;
if count>=12 then
moving_average=mean(of pre(*));
drop count index pre1-pre12;
run;
Thanks!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
In addition, there are gaps between my SAP ID. They are not countining number.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Unless you want the average of your dates this line is your biggest issue:
pre(index)=OpenDate;
Assign the value you want to average.
pre(index)=Count;
Because you're looking at month, you could put the month in. You'd have to test it though, not sure how it work with missing months.
pre(month(openDate))=Count;
Otherwise look at proc timeseries or a similar proc to add in your missing data points if you need to average over 12 months with missing dates. You need to decide how to account for it as well, if missing is it an average over 11 months or less, basically available data? Or do you need to modify it in some manner.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I'm not sure what you mean by gaps in SAP ID. Each ID is its own, it doesn't matter. The months/dates need to be continuous.
If you're missing months is where you'll run into issues.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Reeza, Yes. I mean gap is missing month . If you have big table try Hash Table: data have; do sap=1 to 100; do year=1990 to 2016; do month=1 to 12; opendate=mdy(month,1,year); count=ceil(100*ranuni(1234)); output; end; end; end; format opendate monyy7.; drop year month; run; data want; if _n_=1 then do; if 0 then set have(rename=(count=_count)); declare hash h(dataset:'have(rename=(count=_count))',hashexp:20); h.definekey('sap','opendate'); h.definedata('_count'); h.definedone(); end; set have; sum=0;n=0; do i=intnx('month',opendate,-11) to opendate ; if h.find(key:sap,key:i)=0 then do; sum+_count; n+1; end; end; mean=sum/n; drop i sum n _count; run;