eramirez
Fluorite | Level 6

Hello

 

I found some SAS code online that calculates a moving average.  I don't have SAS/ETS.  It works well, but I have some new data that is structured vertically. It consists of five variables: Date, Zipcode, PCT, COUNT, and TOTAL.  I want to run the code through each group of zipcodes, calculating a new moving average for each.  Once it encounters 60602, a new moving average should be calculated. I have a feeling it's something simple, but I can't think of it.  Thank you.

data zips1;
  infile datalines truncover;
  input Date:DATE9. zipcode:32. pct:32. count:32. total:32.;
  format Date DATE9.;
  datalines;
01JAN2020 60601 16.666667 1 6
02JAN2020 60601 0 0 8
03JAN2020 60601 14.285714 1 7
04JAN2020 60601 0 0 5
05JAN2020 60601 0 0 7
06JAN2020 60601 0 0 8
07JAN2020 60601 0 0 6
08JAN2020 60601 0 0 8
09JAN2020 60601 20 1 5
10JAN2020 60601 0 0 6
11JAN2020 60601 0 0 8
12JAN2020 60601 0 0 4
13JAN2020 60601 0 0 8
14JAN2020 60601 0 0 10
15JAN2020 60601 0 0 9
16JAN2020 60601 25 1 4
17JAN2020 60601 0 0 4
18JAN2020 60601 0 0 6
19JAN2020 60601 0 0 4
20JAN2020 60601 0 0 6
;
run;

data zips2 ; keep date zipcode pct count total n meanxi sumxi;
set zips1;
 if missing(count ) then
 do;
 OBS = 0;
 count = 0.0;
 end;
 else OBS = 1;
 XI7 = lag7(count );
 OBS7 = lag7(obs);
 if missing(xi7) then xi7 = 0.0;
 if missing(obs7) then obs7 = 0;
 LDATE = lag2(date);
 format ldate date9. ; 

 if _N_ = 1 then
 do;
 SUMXI = 0.0;
 N = 0;
 end;
 else;
 sumxi = sumxi + count - xi7;
 n = n + obs - obs7;
 MEANXI = sumxi / n ;
 retain sumxi n;
run;
Accepted Solution
Reeza
Super User

Here’s a quick example. 
https://gist.github.com/statgeek/27e23c015eae7953eff2

 

Change the min/max to mean/median or whatever stat you’re calculating and of course the array lengths as needed. 



6 REPLIES
mkeintz
PROC Star

What do you want the output to look like?  Since you are doing 7-day rolling statistics, do you want to start each zip code with the 7th observation, such that it is the first with a completely populated 7-day window?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
eramirez
Fluorite | Level 6

Hello

 

Thanks for the reply.  The results can start with the 7th observation if that helps. The variable N indicates when the 7th observation begins, so when I overlay the MEANXI (line) values on the COUNT (bar) values, I can use N>6 to exclude the first six values.

 

Thanks

Enrique


eramirez
Fluorite | Level 6

Thank you, this also works and is useful for other stats!

 

 

mkeintz
PROC Star

In cases like this, I would suggest maintaining an array containing the most recent 7 values of the variables in question.  Here's an example getting the 7-day rolling mean of COUNT:

 

data zips1;
infile datalines truncover;
input Date:DATE9. zipcode:32. pct:32. count:32. total:32.;
format Date DATE9.;
datalines;
01JAN2020 60601 16.666667 1 6
02JAN2020 60601 0 0 8
03JAN2020 60601 14.285714 1 7
04JAN2020 60601 0 0 5
05JAN2020 60601 0 0 7
06JAN2020 60601 0 0 8
07JAN2020 60601 0 0 6
08JAN2020 60601 0 0 8
09JAN2020 60601 20 1 5
10JAN2020 60601 0 0 6
11JAN2020 60601 0 0 8
12JAN2020 60601 0 0 4
13JAN2020 60601 0 0 8
14JAN2020 60601 0 0 10
15JAN2020 60601 0 0 9
16JAN2020 60601 25 1 4
17JAN2020 60601 0 0 4
18JAN2020 60601 0 0 6
19JAN2020 60601 0 0 4
20JAN2020 60601 0 0 6
run;
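data zips2;
  set zips1;
  by zipcode;                          /* zips1 must be sorted by zipcode (and date) */
  obs+1;                               /* running row counter, retained across rows */
  if first.zipcode then obs=1;         /* restart the counter for each zip code */
  array _cntarray {0:6} _temporary_;   /* circular buffer of the 7 most recent counts */
  _cntarray{mod(obs,7)}=count;         /* overwrite the oldest slot */
  if obs>=7;                           /* keep only rows with a full 7-day window */
  mean_count=mean(of _cntarray{*});
run;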
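
The sample data above holds only zip code 60601, so it cannot show the window restarting. Below is a minimal sketch of the same array technique on a made-up two-zip data set. The names zips_demo, roll_demo and _w are hypothetical, the values are invented for illustration, and the window is shortened to 3 days to keep the listing small.

```sas
/* Hypothetical two-zip sample, already sorted by zipcode and date */
data zips_demo;
  infile datalines truncover;
  input date :date9. zipcode count;
  format date date9.;
  datalines;
01JAN2020 60601 1
02JAN2020 60601 0
03JAN2020 60601 2
04JAN2020 60601 3
01JAN2020 60602 4
02JAN2020 60602 4
03JAN2020 60602 1
04JAN2020 60602 7
;
run;

data roll_demo;
  set zips_demo;
  by zipcode;
  if first.zipcode then obs = 0;   /* restart the counter for each zip code */
  obs + 1;
  array _w {0:2} _temporary_;      /* circular buffer, 3-day window for brevity */
  _w{mod(obs, 3)} = count;         /* overwrite the oldest slot */
  if obs >= 3;                     /* keep only rows with a full window */
  mean_count = mean(of _w{*});
run;

proc print data=roll_demo noobs;
run;
```

The temporary array is never cleared between zip codes. That is safe here only because the subsetting IF discards every row until all slots have been overwritten with values from the current zip code.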

Now, if you really want to maintain a rolling sum to which you add the current COUNT and subtract lag7(count), followed by division-by-7, you could do this:

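data zips2;
  set zips1;
  by zipcode date ;
  obs+1;
  if first.zipcode then obs=1;
  /* add the current count and drop the one from 7 rows back (0 until 7 rows exist) */
  sum_count + count + -coalesce(lag7(count),0);
  if obs>=7;   /* by the 7th row of a zip code the window lies entirely within it */
  mean_count=sum_count/7;
run;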

The problem with the second approach is that, for long time series, you could accumulate some minor computational rounding errors, such that the value at the end of the series might not exactly equal one-seventh of the sum of the last 7 obs.

 

OTOH, the second approach could be a bit faster, especially if you want, say, a 40-day rolling window instead of a 7-day window. 
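
To try a longer window and see how much rounding drift actually builds up, something along these lines could be used. It is only a sketch under stated assumptions: the macro variable window, the data sets long_demo and roll_long, and the simulated PCT values are all hypothetical. It computes the rolling mean both ways on fractional data and reports the difference.

```sas
%let window = 40;   /* hypothetical window length */

/* Hypothetical test series: 2 zip codes x 100 days of fractional values */
data long_demo;
  call streaminit(2020);
  do zipcode = 60601, 60602;
    do date = '01JAN2020'd to '09APR2020'd;
      pct = rand('uniform') * 100;
      output;
    end;
  end;
  format date date9.;
run;

data roll_long;
  set long_demo;
  by zipcode date;
  if first.zipcode then obs = 0;
  obs + 1;
  /* running sum over the last &window rows (second approach) */
  sum_pct + (pct - coalesce(lag&window(pct), 0));
  /* circular buffer of the last &window values (first approach) */
  array _w {0:%eval(&window - 1)} _temporary_;
  _w{mod(obs, &window)} = pct;
  if obs >= &window;                 /* full window within the zip code */
  mean_running = sum_pct / &window;
  mean_buffer  = mean(of _w{*});
  drift = mean_running - mean_buffer;
run;

/* Summarize how far the two versions disagree */
proc means data=roll_long min max;
  var drift;
run;
```

Because the macro variable resolves before the step compiles, lag&window becomes an ordinary LAG40 call. If DRIFT stays negligible for your data, the running sum is fine; if it grows on a much longer series, the buffer version is the safer choice.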

eramirez
Fluorite | Level 6

Thank you so much! The first approach works great.  I may have long time series data, so thank you for the 2nd method.  I will keep them both handy.
