Hello
I found some SAS code online that calculates a moving average (I don't have SAS/ETS). It works well, but I have some new data that is structured vertically, with five variables: Date, Zipcode, PCT, COUNT, and TOTAL. I want to run the code through each group of zipcodes and calculate a separate moving average for each, so that when it reaches 60602 a new moving average starts. I have a feeling it's something simple, but I can't think of it. Thank you.
data zips1;
infile datalines truncover;
input Date:DATE9. zipcode:32. pct:32. count:32. total:32.;
format Date DATE9.;
datalines;
01JAN2020 60601 16.666667 1 6
02JAN2020 60601 0 0 8
03JAN2020 60601 14.285714 1 7
04JAN2020 60601 0 0 5
05JAN2020 60601 0 0 7
06JAN2020 60601 0 0 8
07JAN2020 60601 0 0 6
08JAN2020 60601 0 0 8
09JAN2020 60601 20 1 5
10JAN2020 60601 0 0 6
11JAN2020 60601 0 0 8
12JAN2020 60601 0 0 4
13JAN2020 60601 0 0 8
14JAN2020 60601 0 0 10
15JAN2020 60601 0 0 9
16JAN2020 60601 25 1 4
17JAN2020 60601 0 0 4
18JAN2020 60601 0 0 6
19JAN2020 60601 0 0 4
20JAN2020 60601 0 0 6
;
run;
data zips2;
keep date zipcode pct count total n meanxi sumxi;
set zips1;
if missing(count) then do;
  OBS = 0;
  count = 0.0;
end;
else OBS = 1;
XI7 = lag7(count);
OBS7 = lag7(obs);
if missing(xi7) then xi7 = 0.0;
if missing(obs7) then obs7 = 0;
LDATE = lag2(date);
format ldate date9.;
if _N_ = 1 then do;
  SUMXI = 0.0;
  N = 0;
end;
else;
sumxi = sumxi + count - xi7;
n = n + obs - obs7;
MEANXI = sumxi / n;
retain sumxi n;
run;
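A minimal sketch of the same LAG7 running-sum idea restarted at each zipcode, assuming the data are sorted by ZIPCODE and DATE with one row per day. Two points matter: LAG7 is called on every row (calling it inside an IF-THEN would only update its queue on rows where the condition is true), and a per-zipcode counter, SEQ (a name introduced here), tells the step when the lagged value still belongs to the previous zipcode. DEMO and DEMO2 are hypothetical dataset names, and DEMO is synthetic data in which COUNT equals the day of the month, so the expected means are easy to check by hand.

```sas
/* Hypothetical demo data: two zipcodes, 10 days each, COUNT = day of month */
data demo;
  do zipcode = 60601, 60602;
    do date = '01JAN2020'd to '10JAN2020'd;
      count = day(date);
      output;
    end;
  end;
  format date date9.;
run;

data demo2;
  set demo;
  by zipcode;
  retain sumxi n;
  /* OBS flags a non-missing COUNT; a missing COUNT then contributes 0 */
  obs = not missing(count);
  count = coalesce(count, 0);
  /* Call LAG7 on every row so its queue stays in step with the data */
  xi7  = lag7(count);
  obs7 = lag7(obs);
  /* Restart the running totals at each new zipcode */
  if first.zipcode then do;
    sumxi = 0;
    n = 0;
    seq = 0;
  end;
  seq + 1;
  /* In the first 7 rows of a zipcode the lagged values belong to the
     previous zipcode (or are missing), so nothing leaves the window yet */
  if seq <= 7 then do;
    xi7 = 0;
    obs7 = 0;
  end;
  sumxi = sumxi + count - xi7;
  n = n + obs - obs7;
  if n > 0 then meanxi = sumxi / n;
  drop obs xi7 obs7 seq;
run;
```

With this demo data, MEANXI for 60602 on day 1 is 1, not a value mixed with 60601's counts, and N reaches 7 on the 7th row of each zipcode, so an N>6 filter still marks the full windows.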
Here’s a quick example.
https://gist.github.com/statgeek/27e23c015eae7953eff2
Change the min/max to mean/median, or whatever statistic you’re calculating, and of course adjust the array lengths as needed.
What do you want the output to look like? Since you are doing 7-day rolling statistics, do you want to start each zip code with the 7th observation, such that it is the first with a completely populated 7-day window?
Hello
Thanks for the reply. The results can start with the 7th observation if that helps. The variable N indicates where the 7th observation begins, so when I overlay the MEANXI values (line) on COUNT (bars), I can use N>6 to exclude the first six values.
Thanks
Enrique
Thank you, this also works and is useful for other stats!
In cases like this, I would suggest maintaining an array containing the most recent 7 values of the variables in question. Here's an example getting the 7-day rolling mean of COUNT:
data zips1;
infile datalines truncover;
input Date:DATE9. zipcode:32. pct:32. count:32. total:32.;
format Date DATE9.;
datalines;
01JAN2020 60601 16.666667 1 6
02JAN2020 60601 0 0 8
03JAN2020 60601 14.285714 1 7
04JAN2020 60601 0 0 5
05JAN2020 60601 0 0 7
06JAN2020 60601 0 0 8
07JAN2020 60601 0 0 6
08JAN2020 60601 0 0 8
09JAN2020 60601 20 1 5
10JAN2020 60601 0 0 6
11JAN2020 60601 0 0 8
12JAN2020 60601 0 0 4
13JAN2020 60601 0 0 8
14JAN2020 60601 0 0 10
15JAN2020 60601 0 0 9
16JAN2020 60601 25 1 4
17JAN2020 60601 0 0 4
18JAN2020 60601 0 0 6
19JAN2020 60601 0 0 4
20JAN2020 60601 0 0 6
run;
data zips2;
set zips1;
by zipcode;
obs+1;
if first.zipcode then obs=1;
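array _cntarray {0:6} _temporary_;   /* circular buffer holding the 7 most recent COUNTs */
_cntarray{mod(obs,7)}=count;         /* overwrite the oldest slot */
if obs>=7;                           /* keep only rows whose window is full */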
mean_count=mean(of _cntarray{*});
run;
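The same temporary-array window works for statistics other than the mean, since MEDIAN, MIN, MAX and STD all accept an OF array{*} argument list. A minimal sketch, using hypothetical synthetic data (DEMO, with COUNT equal to the day of the month) and an illustrative output name DEMO_STATS:

```sas
/* Hypothetical demo data: two zipcodes, 10 days each, COUNT = day of month */
data demo;
  do zipcode = 60601, 60602;
    do date = '01JAN2020'd to '10JAN2020'd;
      count = day(date);
      output;
    end;
  end;
  format date date9.;
run;

data demo_stats;
  set demo;
  by zipcode;
  obs + 1;
  if first.zipcode then obs = 1;
  /* Circular buffer holding the 7 most recent COUNT values */
  array _cntarray {0:6} _temporary_;
  _cntarray{mod(obs, 7)} = count;
  /* Keep only rows whose 7-row window lies entirely within one zipcode */
  if obs >= 7;
  mean_count   = mean(of _cntarray{*});
  median_count = median(of _cntarray{*});
  min_count    = min(of _cntarray{*});
  max_count    = max(of _cntarray{*});
  std_count    = std(of _cntarray{*});
run;
```

The array is not cleared at a new zipcode; that is safe here because all 7 slots have been overwritten by the time OBS reaches 7.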
Now, if you really want to maintain a rolling sum to which you add the current COUNT and subtract lag7(count), then divide by 7, you could do this:
data zips2;
set zips1;
by zipcode date ;
obs+1;
if first.zipcode then obs=1;
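/* running 7-obs sum: add today's COUNT, subtract the COUNT from 7 obs back.
   It is not reset at FIRST.ZIPCODE; by the 7th obs of a zipcode all 7 terms belong to that zipcode. */
sum_count + count + -coalesce(lag7(count),0);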
if obs>=7;
mean_count=sum_count/7;
run;
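The problem with the second approach is that, for long time series, it can accumulate small floating-point rounding errors, so that toward the end of the series the result might not exactly equal one-seventh of the sum of the last 7 obs. (With integer values such as these counts, the running sum itself stays exact; the concern is non-integer data such as PCT.)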
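If you want to check whether this drift matters for your own data, one option is to compute both versions side by side and look at the largest difference. A minimal sketch on hypothetical non-integer demo values (COUNT = day of month / 10, since non-integer values are where rounding can occur); DEMO, DRIFT_CHECK and DIFF are illustrative names:

```sas
/* Hypothetical demo data with non-integer values: COUNT = day / 10 */
data demo;
  do zipcode = 60601, 60602;
    do date = '01JAN2020'd to '10JAN2020'd;
      count = day(date) / 10;
      output;
    end;
  end;
  format date date9.;
run;

data drift_check;
  set demo;
  by zipcode;
  obs + 1;
  if first.zipcode then obs = 1;
  /* Approach 1: the exact window held in a temporary array */
  array _cntarray {0:6} _temporary_;
  _cntarray{mod(obs, 7)} = count;
  /* Approach 2: running sum; LAG7 runs on every row, before the subsetting IF */
  sum_count + (count - coalesce(lag7(count), 0));
  if obs >= 7;
  diff = sum_count / 7 - mean(of _cntarray{*});
run;

/* Largest and smallest difference between the two approaches */
proc means data=drift_check n max min;
  var diff;
run;
```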
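On the other hand, the second approach could be a bit faster, especially for a longer window (say, 40 days instead of 7): it does a fixed amount of work per observation, whereas MEAN(OF _CNTARRAY{*}) has to scan the whole array on every observation.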
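To change the window length without editing several numbers by hand, the rolling-sum approach can be parameterized with a macro variable. A minimal sketch, assuming the window is counted in rows (one row per zipcode per day, no gaps); W is a hypothetical macro variable name, set to 5 here only so the 10-day synthetic DEMO data produce output. Setting %let w = 40; gives a 40-day window, provided each zipcode has at least 40 rows.

```sas
%let w = 5;  /* window length in rows; hypothetical value for the demo */

/* Hypothetical demo data: two zipcodes, 10 days each, COUNT = day of month */
data demo;
  do zipcode = 60601, 60602;
    do date = '01JAN2020'd to '10JAN2020'd;
      count = day(date);
      output;
    end;
  end;
  format date date9.;
run;

data demo_roll;
  set demo;
  by zipcode date;
  obs + 1;
  if first.zipcode then obs = 1;
  /* Add today's COUNT and drop the one from W rows back;
     lag&w.(count) resolves to lag5(count) and must run on every row */
  sum_count + (count - coalesce(lag&w.(count), 0));
  if obs >= &w;
  mean_count = sum_count / &w;
run;
```

For the array approach, the same macro variable would go in two places: the bounds {0:%eval(&w-1)} and the index mod(obs, &w).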
Thank you so much! The first approach works great. I may have long time series data, so thank you for the second method too. I will keep them both handy.