I would like to calculate a moving average and standard errors across 6 previous time periods (for each period).
Total number of periods = 50; starting the calculations on the 7th period.
I can do this in a non-statistical way, but I was wondering if it's possible to do this with PROC MODEL or PROC MIXED or another procedure.
I have many observations per period (thousands), but I have a field called period, so I'd like to use that somehow.
I would like to avoid taking averages of averages, so I don't want to reduce my data to 1 observation per period.
If I have 2 numeric variables: X and period, how can I generate a 6-period moving average over 50 periods?
I would also like to get moving standard errors in addition to that average.
Thank you
Here's a macro solution with proc means.
proc sql noprint;
select min(period) into :min_period TRIMMED
from have;
select max(period) into :max_period TRIMMED
from have;
quit;
%put &min_period;
%put &max_period;
options nomprint nosymbolgen;
%macro moving_summary(datain=, window=, dataout=);
*remove previous summary table;
proc sql;
drop table &dataout;
quit;
%do i=&min_period. %to %eval(&max_period.-&window.);
proc means data=&datain noprint;
var x;
where period between &i and %eval(&i. + &window. - 1);
output out=_temp mean=mean_x std=std_x stderr=stderr_x;
run;
data _temp;
set _temp;
start = &i;
end = &i + &window - 1;
run;
proc append base=&dataout data=_temp;
run;
proc sql;
drop table _temp;
quit;
%end;
%mend;
%moving_summary(datain=have, window=6, dataout=summary_stats);
You may want to provide a couple cases of data (with a few records per period not 1000's) of what the start and end result of this process may be.
Do you have access to SAS/ETS? That module has a number of timeseries related procedures for working with time structures and may help.
I have SAS/ETS
The data looks like this
period X
1 3
1 7
1 1
1 3
1 9
1 0
2 4
2 5
2 6
3 1
4 8
4 9
. . .
50 0
50 3
50 7
50 5
There are multiple observations of X per period.
I would like to calculate the mean and standard error of X during periods 1-6; then during periods 2-7; then during periods 3-8 … periods 45-50
I can do this with the data step, proc means, proc sql… macro.
If you think it would it be easier to use PROC MODEL or another statistical procedure, can you please suggest a procedure along with statements and parameters
PROC EXPAND is your best option.
https://gist.github.com/statgeek/07a3708dee1225ceb9d4aa75daab2c52
Show us some data to get a code answer 🙂 PROC EXPAND is probably your best option. Also there are numerous articles on the subject
http://www2.sas.com/proceedings/forum2008/093-2008.pdf
http://sasnrd.com/moving-average/
http://blogs.sas.com/content/iml/2016/01/27/moving-average-in-sas.html
PROC EXPAND was the first thing I considered, but it doesn't allow multiple observations per period. I can summarize my data to one average-value per period, but I don't want to take averages of averages. Also, PROC EXPAND doesn't give me standard error. I need standard error in addition to the average.
I'll go with the non-statistical methods for now.
One of draycut's links pointed to a very useful SQL suggestion by the SAS Yoda himself: rolling standard deviation calculation. He uses subqueries in the select statement all the time (I need to do more of that)
Days/periods with multiple observations are going to be weighted higher anyways if you don't summarize to a single metric per time period. Depending on subject matter this may or may not be ok but did want to flag it.
Another option for rolling stats is the array method, SAS notes have an example of how that can work, you'll just need to keep track of days as well.
Reeza, I'm sorry, but I can't access that github site -- my internet security is blocking it.
I see that I can make weights based on the number of observations per period
transformout=( cmovave( .1 .2 .4 .2 .1 )
This should work for a moving average, but I don't think this will work for standard error because I need real record count to divide sdt.err. by sqrt(n)
If I just wanted a moving average, I could:
Although… this would probably be a manual process for every period because my weights would change with every set of 6 periods (out of 50 periods)
I can't help you with this until you have the logic confirmed.
That's a logical decision that you should be making based on business knowledge, requirements and statistical soundness.
If you know the rules please post a fully worked example for at least one period.
You haven't posted sample data so I'm not sure what you're expecting besides general solutions. If you need more assistance you need to post more information, including sample data and expected output.
I attached the full data to this post. It's practically the same as in my 9/2/2017 post above.
It's in an Excel spreadsheet -- to show the calculations on the 2nd sheet (data is on the 1st sheet)
I'm interested in the Mean and Standard Error -- both are moving 1 period at a time in groups of 6 periods.
I looked at Sample 41380, but it assumes a constant quantity of observations per period (12 in that example). I couldn't modify it to work for a variable quantity of observations per period.
I'm not downloading an excel sheet, few people on public forums will. The other common approach is the SQL approach you mentioned already, that doesn't work? It won't do the weights though so you'd have to use another way if you are weighting.
Here's a CSV version of that Excel file (really 2 files: data and summary)
I'm mostly done with a non-statistical solution -- using a data step, proc means, transpose, sql, and a macro.
Hoping for a statistical solution -- using a statistical procedure -- which would be more streamlined than looping with a macro.
My macro is not completely working yet, but as soon as I get it working, I'll post it here.
@agoldma wrote:
I'm mostly done with a non-statistical solution -- using a data step, proc means, transpose, sql, and a macro.
Hoping for a statistical solution -- using a statistical procedure -- which would be more streamlined than looping with a macro.
I'm not sure where you've gotten your terminology from, but it's uncommon and inaccurate IMO. Trying to group procedures into 'statistical' and 'non-statistical' is not a real definition. PROC EXPAND is under SAS/ETS vs SAS/STAT so is it not a statistical procedure? PROC SQL is clearly a proc, but it can do a hell of a lot...you can probably adapt this to handle weights if you clearly define the rules.
If you don't want to summarize to a single record per event, I'm not sure there's a proc that will do what you want. Note how I created your 'sample data have'. In future posts it's helpful if you do that step so we can copy and paste it directly into a browser.
EDIT/NOTE: The code below is incorrect, it does a many to many merge, rather than an inline selection. The mean ends up ok, mostly by chance but the stderr and std will be incorrect because the N is incorrect.
data have;
informat period x 8.;
infile cards dlm='09'x;
input period X;
cards;
1 3
1 7
1 1
1 3
1 9
1 0
2 4
2 5
2 6
3 1
4 8
4 9
5 9
5 6
5 6
5 3
5 3
6 8
6 9
6 8
6 9
6 8
6 3
6 6
6 7
6 3
7 3
7 4
7 9
7 5
7 6
7 7
7 4
7 3
7 0
8 2
8 3
8 3
8 4
8 8
8 7
8 4
8 4
8 7
9 2
9 2
9 8
10 7
10 3
10 8
10 3
10 5
11 5
12 9
12 2
12 2
12 3
12 1
12 4
13 7
13 3
13 6
13 7
14 0
14 1
14 6
14 4
14 3
14 9
14 5
14 5
15 7
15 6
15 5
15 8
15 3
15 7
15 3
15 4
16 0
16 3
16 8
16 8
16 9
16 7
16 4
16 2
17 1
17 6
17 9
17 2
17 5
18 4
19 1
19 2
19 7
19 7
19 6
20 2
20 9
21 0
21 6
21 5
21 8
22 9
22 8
22 1
22 6
22 4
23 0
23 5
23 2
24 4
24 7
24 6
24 4
24 6
24 8
25 6
25 5
25 7
25 5
26 1
26 2
26 8
26 2
26 6
26 5
27 9
27 1
27 8
27 3
27 4
28 4
28 8
28 8
28 0
28 8
28 5
28 9
29 1
29 4
29 3
29 3
29 3
29 8
29 6
30 0
30 7
30 9
30 1
30 3
30 0
30 3
31 5
31 3
31 3
31 8
31 1
31 7
31 6
32 5
32 4
32 0
32 6
32 3
32 7
33 6
33 3
33 3
33 7
33 3
34 8
34 3
35 1
35 6
35 4
35 2
35 8
35 2
36 5
36 5
36 4
36 2
36 1
36 1
37 6
37 8
37 9
37 2
37 6
38 8
38 6
38 1
38 7
38 4
38 2
38 2
38 2
38 2
38 8
38 5
38 3
38 6
39 6
39 4
39 4
39 5
39 0
39 0
39 4
39 7
39 5
40 2
41 8
41 2
41 1
41 7
41 2
42 7
42 4
42 6
43 4
43 4
44 7
44 0
44 6
44 3
44 3
44 5
44 1
44 3
44 9
44 6
44 7
45 0
45 3
45 1
45 5
45 3
45 7
45 8
45 1
45 2
45 7
46 2
46 1
46 3
46 0
46 1
46 7
46 2
46 4
46 6
47 9
47 9
47 7
47 0
48 2
48 0
48 4
48 9
48 1
48 2
48 1
48 0
48 4
48 1
48 8
48 4
48 4
49 6
49 7
49 2
49 5
49 7
49 1
49 0
49 2
49 7
49 5
49 1
49 0
49 2
49 3
49 6
49 5
49 3
49 7
49 7
49 9
49 5
49 4
49 0
49 1
49 6
49 7
49 1
49 3
49 5
49 0
49 8
49 7
49 3
50 0
50 3
50 7
50 5
;
run;
proc sql ;
create table want as
select h1.period as start_period,
h1.period+5 as end_period,
mean(h2.x) as avg_value,
std(h2.x) as std_value
/*, insert other stats as needed, except median/percentiles*/
from have as h1
left join have as h2
on h2.period between h1.period and h1.period+5
group by start_period, end_period;
quit;
Note: Link removed.
Thank you for this very efficient solution.
The mean looks good, but the standard deviation is off a little, and standard error is off a lot.
If you look only at the periods 1-6 (first 26 observations)
Mean = 5.53846154 this is perfect
Standard Deviation (STD) = 2.79950368
Standard Error (STDERR) = 0.22413968
I used these 2 procedures to check the standard deviation and standard error
proc means data=have mean std stderr maxdec=8 ;
where period between 1 and 6 ;
var X ;
run;
proc sql;
select avg(X) as Mean
,std(X) as STD format=10.8
,stderr(X) as STDERR format=10.8
from have
where period between 1 and 6
;quit;
Check the documentation to see what formulas are used in SQL for Standard deviation, the denominator can vary based on your definition.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.
Find more tutorials on the SAS Users YouTube channel.