- How do I run a do-loop by groups?


🔒 This topic is **solved** and **locked**.


Posted 08-19-2020 06:59 PM
(1757 views)

Hello

I found some SAS code online that calculates a moving average (I don't have SAS/ETS). It works well, but I have some new data that is structured vertically. It consists of five variables: Date, Zipcode, PCT, COUNT, and TOTAL. I want to run the code through each group of zip codes and calculate a separate moving average for each, so that when it reaches 60602, a new moving average starts. I have a feeling it's something simple, but I can't think of it. Thank you.

```
data zips1;
  /* values are separated by blanks, so DSD (comma-delimited) is not used */
  infile datalines truncover;
  input Date:DATE9. zipcode:32. pct:32. count:32. total:32.;
  format Date DATE9.;
  datalines;
01JAN2020 60601 16.666667 1 6
02JAN2020 60601 0 0 8
03JAN2020 60601 14.285714 1 7
04JAN2020 60601 0 0 5
05JAN2020 60601 0 0 7
06JAN2020 60601 0 0 8
07JAN2020 60601 0 0 6
08JAN2020 60601 0 0 8
09JAN2020 60601 20 1 5
10JAN2020 60601 0 0 6
11JAN2020 60601 0 0 8
12JAN2020 60601 0 0 4
13JAN2020 60601 0 0 8
14JAN2020 60601 0 0 10
15JAN2020 60601 0 0 9
16JAN2020 60601 25 1 4
17JAN2020 60601 0 0 4
18JAN2020 60601 0 0 6
19JAN2020 60601 0 0 4
20JAN2020 60601 0 0 6
;

data zips2;
  keep date zipcode pct count total n meanxi sumxi;
  set zips1;
  if missing(count) then do;
    OBS = 0;
    count = 0.0;
  end;
  else OBS = 1;
  XI7 = lag7(count);
  OBS7 = lag7(obs);
  if missing(xi7) then xi7 = 0.0;
  if missing(obs7) then obs7 = 0;
  LDATE = lag2(date);
  format ldate date9.;
  if _N_ = 1 then do;
    SUMXI = 0.0;
    N = 0;
  end;
  else; /* null statement: the two updates below run on every row */
  sumxi = sumxi + count - xi7;
  n = n + obs - obs7;
  MEANXI = sumxi / n;
  retain sumxi n;
run;
```


Accepted Solutions


Here’s a quick example.

https://gist.github.com/statgeek/27e23c015eae7953eff2

Change the min/max to mean/median or whatever stat you’re calculating and of course the array lengths as needed.
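A minimal sketch of that idea applied to COUNT, assuming ZIPS1 is sorted by ZIPCODE and DATE. This is not the code from the linked gist; the names ZIPS_ROLL, _LAST7, GRP_OBS and N_IN_WINDOW are hypothetical. The temporary array holds the 7 most recent values and is cleared at the start of each ZIPCODE, so 60602 never sees 60601's values. Because MEAN ignores missing slots, the first six rows of each zip code get a partial-window mean.

```sas
/* Sketch: rolling 7-row stats of COUNT, restarted for each ZIPCODE.
   Assumes ZIPS1 is sorted by ZIPCODE and DATE; names are hypothetical. */
data zips_roll;
  set zips1;
  by zipcode;
  array _last7 {0:6} _temporary_;      /* the 7 most recent COUNT values */

  /* Clear the window when a new ZIPCODE begins */
  if first.zipcode then do;
    call missing(of _last7{*});
    grp_obs = 0;
  end;
  grp_obs + 1;                         /* row number within ZIPCODE */

  _last7{mod(grp_obs, 7)} = count;     /* overwrite the oldest slot */
  n_in_window = n(of _last7{*});       /* fewer than 7 early in a group */
  mean_count  = mean(of _last7{*});    /* MEAN skips the missing slots */
  max_count   = max(of _last7{*});     /* other stats work the same way */
  drop grp_obs;
run;
```

Filtering on N_IN_WINDOW = 7 afterwards keeps only full windows, if partial ones are not wanted.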


What do you want the output to look like? Since you are doing 7-day rolling statistics, do you want to start each zip code with the 7th observation, such that it is the first with a completely populated 7-day window?

--------------------------

The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for

Allow PROC SORT to output multiple datasets

--------------------------


Hello

Thanks for the reply. The results can start with the 7th observation if that helps. The variable N indicates where the 7th observation begins, so when I overlay the MEANXI values (line) on COUNT (bars), I can use N>6 to exclude the first six values.

Thanks

Enrique
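For comparison, a minimal sketch of the question's own running-sum code restarted at each ZIPCODE, keeping the N variable described above so that N>6 still marks full windows (N never exceeds 7). It assumes ZIPS1 is sorted by ZIPCODE and DATE; GRP_OBS is a hypothetical within-group row counter. The key points are that LAG7 is still called on every row, and that the lagged values are ignored for the first 7 rows of a group because they belong to the previous zip code.

```sas
/* Sketch: the question's running-sum code, restarted for each ZIPCODE.
   Assumes ZIPS1 is sorted by ZIPCODE, then DATE. GRP_OBS is hypothetical. */
data zips2;
  set zips1;
  by zipcode;
  retain sumxi n grp_obs;
  keep date zipcode pct count total n meanxi sumxi;

  /* Same missing-value handling as the original code */
  if missing(count) then do;
    obs = 0;
    count = 0;
  end;
  else obs = 1;

  /* Call LAG7 on every row so the lag queues stay in step with the data */
  xi7  = lag7(count);
  obs7 = lag7(obs);

  /* Restart the running totals when a new ZIPCODE begins */
  if first.zipcode then do;
    sumxi = 0;
    n = 0;
    grp_obs = 0;
  end;
  grp_obs + 1;

  /* In the first 7 rows of a ZIPCODE the lagged values are missing or come
     from the previous ZIPCODE, so nothing leaves the window yet */
  if grp_obs <= 7 then do;
    xi7 = 0;
    obs7 = 0;
  end;

  sumxi = sumxi + count - xi7;   /* sum of COUNT over the last 7 rows of this ZIPCODE */
  n     = n + obs - obs7;        /* number of non-missing COUNTs in that window */
  if n > 0 then meanxi = sumxi / n;
run;
```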



Thank you, this also works and is useful for other stats!


In cases like this, I would suggest maintaining an array containing the most recent 7 values of the variables in question. Here's an example getting the 7-day rolling mean of COUNT:

```
data zips1;
infile datalines truncover;
input Date:DATE9. zipcode:32. pct:32. count:32. total:32.;
format Date DATE9.;
datalines;
01JAN2020 60601 16.666667 1 6
02JAN2020 60601 0 0 8
03JAN2020 60601 14.285714 1 7
04JAN2020 60601 0 0 5
05JAN2020 60601 0 0 7
06JAN2020 60601 0 0 8
07JAN2020 60601 0 0 6
08JAN2020 60601 0 0 8
09JAN2020 60601 20 1 5
10JAN2020 60601 0 0 6
11JAN2020 60601 0 0 8
12JAN2020 60601 0 0 4
13JAN2020 60601 0 0 8
14JAN2020 60601 0 0 10
15JAN2020 60601 0 0 9
16JAN2020 60601 25 1 4
17JAN2020 60601 0 0 4
18JAN2020 60601 0 0 6
19JAN2020 60601 0 0 4
20JAN2020 60601 0 0 6
;
run;

data zips2;
set zips1;
by zipcode;
obs+1;
if first.zipcode then obs=1;          /* row number within ZIPCODE */
array _cntarray {0:6} _temporary_;    /* the 7 most recent COUNT values */
_cntarray{mod(obs,7)}=count;          /* overwrite the oldest slot */
if obs>=7;                            /* keep only full 7-day windows */
mean_count=mean(of _cntarray{*});
run;
```

Now, if you really want to maintain a rolling sum to which you add the current COUNT and subtract lag7(count), and then divide by 7, you could do this:

```
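data zips2;
set zips1;
by zipcode date ;
obs+1;
if first.zipcode then obs=1;  /* row number within ZIPCODE */
/* running sum over the last 7 rows of the file: add today's COUNT and
   subtract the COUNT from 7 rows back (0 for the first 7 rows) */
sum_count + count + -coalesce(lag7(count),0);
if obs>=7;  /* from the 7th row of a ZIPCODE on, those 7 rows are all in that ZIPCODE */
mean_count=sum_count/7;
run;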
```

The problem with the second approach is that, for long time series, you could accumulate some minor computational rounding errors, so that toward the end of the series MEAN_COUNT might not exactly equal one-seventh of the sum of the last 7 obs.
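One way to see how much drift the running sum picks up is to compute both versions side by side for a non-integer variable such as PCT (with whole-number values like COUNT the running sum should stay exact). A sketch, assuming ZIPS1 from this thread sorted by ZIPCODE and DATE; DRIFT_CHECK, SUM_PCT, MEAN_ARRAY, MEAN_RUNNING and DIFF are hypothetical names. On a 20-row sample any difference should be tiny; the check is more informative on a long series.

```sas
/* Sketch: array mean vs. running-sum mean for PCT.
   Assumes ZIPS1 sorted by ZIPCODE and DATE; all new names are hypothetical. */
data drift_check;
  set zips1;
  by zipcode date;
  obs + 1;
  if first.zipcode then obs = 1;

  /* Array method: keep the last 7 PCT values and average them directly */
  array _pctarray {0:6} _temporary_;
  _pctarray{mod(obs, 7)} = pct;

  /* Running-sum method: add today's PCT, subtract PCT from 7 rows back.
     LAG7 is called on every row, before the subsetting IF */
  sum_pct + pct + -coalesce(lag7(pct), 0);

  if obs >= 7;
  mean_array   = mean(of _pctarray{*});
  mean_running = sum_pct / 7;
  diff         = mean_running - mean_array;
run;

/* Smallest and largest discrepancy between the two methods */
proc means data=drift_check min max;
  var diff;
run;
```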

OTOH, the second approach could be a bit faster, especially if you want, say, a 40-day rolling window instead of a 7-day window.
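To try a longer window, the length can be made a parameter with a macro variable, so going from 7 to 40 only changes one line. A sketch assuming ZIPS1 sorted by ZIPCODE and DATE; WINDOW and ZIPS_W are hypothetical names. The running-sum step does roughly the same work per row whatever the window length, while MEAN(OF array{*}) scans every slot, which is why the running sum can pay off for long windows. The window counts rows, so it equals 40 days only if there is exactly one row per day with no gaps; with just 20 rows per zip code in the sample data, a 40-row window produces no output rows.

```sas
/* Sketch: running-sum rolling mean with the window length as a parameter.
   WINDOW and ZIPS_W are hypothetical; assumes ZIPS1 sorted by ZIPCODE, DATE. */
%let window = 40;   /* rows per window */

data zips_w;
  set zips1;
  by zipcode date;
  obs + 1;
  if first.zipcode then obs = 1;
  /* &window resolves before compilation, so this becomes LAG40(COUNT);
     the n in LAGn is part of the function name, not a DATA step variable */
  sum_count + count + -coalesce(lag&window.(count), 0);
  if obs >= &window;              /* keep only complete windows */
  mean_count = sum_count / &window;
run;
```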
