BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
agoldma
Pyrite | Level 9

I would like to calculate a moving average and standard errors across 6 previous time periods (for each period).

Total number of periods = 50; starting the calculations on the 7th period.

I can do this in a non-statistical way, but I was wondering if it's possible to do this with PROC MODEL or PROC MIXED or another procedure.

I have many observations per period (thousands), but I have a field called period, so I'd like to use that somehow.

I would like to avoid taking averages of averages, so I don't want to reduce my data to 1 observation per period.

If I have 2 numeric variables: X and period, how can I generate a 6-period moving average over 50 periods?

I would also like to get moving standard errors in addition to that average.

Thank you

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

Here's a macro solution with proc means.

 

proc sql noprint;
select min(period) into :min_period TRIMMED
from have;
select max(period) into :max_period TRIMMED
from have;
quit;

%put &min_period;
%put &max_period;

options nomprint nosymbolgen;

%macro moving_summary(datain=, window=, dataout=);

*remove previous summary table;
proc sql;
drop table &dataout;
quit;

%do i=&min_period. %to %eval(&max_period.-&window.);
proc means data=&datain noprint;
var x;
where period between &i and %eval(&i. + &window. - 1);
output out=_temp mean=mean_x std=std_x stderr=stderr_x;
run;

data _temp;
set _temp;
start = &i;
end = &i + &window - 1;
run;

proc append base=&dataout data=_temp;
run;

proc sql;
drop table _temp;
quit;

%end;
%mend;

%moving_summary(datain=have, window=6, dataout=summary_stats);

View solution in original post

20 REPLIES 20
ballardw
Super User

You may want to provide a couple cases of data (with a few records per period not 1000's) of what the start and end result of this process may be.

 

Do you have access to SAS/ETS? That module has a number of timeseries related procedures for working with time structures and may  help.

agoldma
Pyrite | Level 9

I have SAS/ETS

The data looks like this

period    X

1         3

1         7

1         1

1         3

1         9

1         0

2         4

2         5

2         6

3         1

4         8

4         9

 . . .

50         0

50         3

50         7

50         5

There are multiple observations of X per period.

I would like to calculate the mean and standard error of X during periods 1-6; then during periods 2-7; then during periods 3-8 … periods 45-50

I can do this with the data step, proc means, proc sql… macro.

If you think it would it be easier to use PROC MODEL or another statistical procedure, can you please suggest a procedure along with statements and parameters

PeterClemmensen
Tourmaline | Level 20

Show us some data to get a code answer 🙂 PROC EXPAND is probably your best option. Also there are numerous articles on the subject

 

http://www2.sas.com/proceedings/forum2008/093-2008.pdf

 

http://sasnrd.com/moving-average/

 

http://blogs.sas.com/content/iml/2016/01/27/moving-average-in-sas.html

 

agoldma
Pyrite | Level 9

PROC EXPAND was the first thing I considered, but it doesn't allow multiple observations per period. I can summarize my data to one average-value per period, but I don't want to take averages of averages. Also, PROC EXPAND doesn't give me standard error. I need standard error in addition to the average.

I'll go with the non-statistical methods for now.

One of draycut's links pointed to a very useful SQL suggestion by the SAS Yoda himself: rolling standard deviation calculation. He uses subqueries in the select statement all the time (I need to do more of that)

 

Reeza
Super User

Days/periods with multiple observations are going to be weighted higher anyways if you don't summarize to a single metric per time period. Depending on subject matter this may or may not be ok but did want to flag it. 

 

Another option for rolling stats is the array method, SAS notes have an example of how that can work, you'll just need to keep track of days as well. 

agoldma
Pyrite | Level 9

Reeza, I'm sorry, but I can't access that github site -- my internet security is blocking it.

I see that I can make weights based on the number of observations per period

   transformout=( cmovave( .1 .2 .4 .2 .1 )

This should work for a moving average, but I don't think this will work for standard error because I need real record count to divide sdt.err. by sqrt(n)

If I just wanted a moving average, I could:

  1. summarize my data to 1 record per period
  2. count the number of records per period
  3. use these counts of records as weights in the transformout=( cmovave(…) statement

Although… this would probably be a manual process for every period because my weights would change with every set of 6 periods (out of 50 periods)

Reeza
Super User

I can't help you with this until you have the logic confirmed. 
That's a logical decision that you should be making based on business knowledge, requirements and statistical soundness. 

If you know the rules please post a fully worked example for at least one period. 

You haven't posted sample data so I'm not sure what you're expecting besides general solutions. If you need more assistance you need to post more information, including sample data and expected output.

 

 

http://support.sas.com/kb/41/380.html

agoldma
Pyrite | Level 9

I attached the full data to this post. It's practically the same as in my 9/2/2017 post above.

It's in an Excel spreadsheet -- to show the calculations on the 2nd sheet (data is on the 1st sheet)

I'm interested in the Mean and Standard Error -- both are moving 1 period at a time in groups of 6 periods.

 

I looked at Sample 41380, but it assumes a constant quantity of observations per period (12 in that example). I couldn't modify it to work for a variable quantity of observations per period.

Reeza
Super User

I'm not downloading an excel sheet, few people on public forums will. The other common approach is the SQL approach you mentioned already, that doesn't work? It won't do the weights though so you'd have to use another way if you are weighting. 

agoldma
Pyrite | Level 9

Here's a CSV version of that Excel file (really 2 files: data and summary)

I'm mostly done with a non-statistical solution -- using a data step, proc means, transpose, sql, and a macro.

Hoping for a statistical solution -- using a statistical procedure -- which would be more streamlined than looping with a macro.

My macro is not completely working yet, but as soon as I get it working, I'll post it here.

Reeza
Super User

@agoldma wrote:

 

I'm mostly done with a non-statistical solution -- using a data step, proc means, transpose, sql, and a macro.

Hoping for a statistical solution -- using a statistical procedure -- which would be more streamlined than looping with a macro.

 


I'm not sure where you've gotten your terminology from, but it's uncommon and inaccurate IMO. Trying to group procedures into 'statistical' and 'non-statistical' is not a real definition. PROC EXPAND is under SAS/ETS vs SAS/STAT so is it not a statistical procedure? PROC SQL is clearly a proc, but it can do a hell of a lot...you can probably adapt this to handle weights if you clearly define the rules.

 

If you don't want to summarize to a single record per event, I'm not sure there's a proc that will do what you want.  Note how I created your 'sample data have'. In future posts it's helpful if you do that step so we can copy and paste it directly into a browser. 

 

EDIT/NOTE: The code below is incorrect, it does a many to many merge, rather than an inline selection. The mean ends up ok, mostly by chance but the stderr and std will be incorrect because the N is incorrect. 

 

data have;
    informat period x 8.;
    infile cards dlm='09'x;
    input period X;
    cards;
1   3
1   7
1   1
1   3
1   9
1   0
2   4
2   5
2   6
3   1
4   8
4   9
5   9
5   6
5   6
5   3
5   3
6   8
6   9
6   8
6   9
6   8
6   3
6   6
6   7
6   3
7   3
7   4
7   9
7   5
7   6
7   7
7   4
7   3
7   0
8   2
8   3
8   3
8   4
8   8
8   7
8   4
8   4
8   7
9   2
9   2
9   8
10  7
10  3
10  8
10  3
10  5
11  5
12  9
12  2
12  2
12  3
12  1
12  4
13  7
13  3
13  6
13  7
14  0
14  1
14  6
14  4
14  3
14  9
14  5
14  5
15  7
15  6
15  5
15  8
15  3
15  7
15  3
15  4
16  0
16  3
16  8
16  8
16  9
16  7
16  4
16  2
17  1
17  6
17  9
17  2
17  5
18  4
19  1
19  2
19  7
19  7
19  6
20  2
20  9
21  0
21  6
21  5
21  8
22  9
22  8
22  1
22  6
22  4
23  0
23  5
23  2
24  4
24  7
24  6
24  4
24  6
24  8
25  6
25  5
25  7
25  5
26  1
26  2
26  8
26  2
26  6
26  5
27  9
27  1
27  8
27  3
27  4
28  4
28  8
28  8
28  0
28  8
28  5
28  9
29  1
29  4
29  3
29  3
29  3
29  8
29  6
30  0
30  7
30  9
30  1
30  3
30  0
30  3
31  5
31  3
31  3
31  8
31  1
31  7
31  6
32  5
32  4
32  0
32  6
32  3
32  7
33  6
33  3
33  3
33  7
33  3
34  8
34  3
35  1
35  6
35  4
35  2
35  8
35  2
36  5
36  5
36  4
36  2
36  1
36  1
37  6
37  8
37  9
37  2
37  6
38  8
38  6
38  1
38  7
38  4
38  2
38  2
38  2
38  2
38  8
38  5
38  3
38  6
39  6
39  4
39  4
39  5
39  0
39  0
39  4
39  7
39  5
40  2
41  8
41  2
41  1
41  7
41  2
42  7
42  4
42  6
43  4
43  4
44  7
44  0
44  6
44  3
44  3
44  5
44  1
44  3
44  9
44  6
44  7
45  0
45  3
45  1
45  5
45  3
45  7
45  8
45  1
45  2
45  7
46  2
46  1
46  3
46  0
46  1
46  7
46  2
46  4
46  6
47  9
47  9
47  7
47  0
48  2
48  0
48  4
48  9
48  1
48  2
48  1
48  0
48  4
48  1
48  8
48  4
48  4
49  6
49  7
49  2
49  5
49  7
49  1
49  0
49  2
49  7
49  5
49  1
49  0
49  2
49  3
49  6
49  5
49  3
49  7
49  7
49  9
49  5
49  4
49  0
49  1
49  6
49  7
49  1
49  3
49  5
49  0
49  8
49  7
49  3
50  0
50  3
50  7
50  5
;
run;

proc sql ;
    create table want as 
    select h1.period as start_period, 
            h1.period+5 as end_period, 
            mean(h2.x) as avg_value, 
            std(h2.x) as std_value
        /*, insert other stats as needed, except median/percentiles*/
        from have as h1 
        left join have as h2 
        on h2.period between h1.period and h1.period+5 
        group by start_period, end_period;
quit;

 

Note: Link removed.

agoldma
Pyrite | Level 9

Thank you for this very efficient solution.

The mean looks good, but the standard deviation is off a little, and standard error is off a lot.

If you look only at the periods 1-6 (first 26 observations)

Mean = 5.53846154    this is perfect

Standard Deviation (STD) = 2.79950368

  • I expected 2.845779601
  • PROC MEANS agrees with PROC SQL and with Excel on 2.845779

Standard Error (STDERR) = 0.22413968

  • this is less than half of what what I expected
  • PROC MEANS agrees with PROC SQL and with Excel on 0.5581033

I used these 2 procedures to check the standard deviation and standard error 

proc means data=have  mean std stderr maxdec=8 ;

   where period between 1 and 6 ;

   var X ;

run;

proc sql;

   select avg(X)    as Mean

         ,std(X)    as STD     format=10.8

         ,stderr(X) as STDERR  format=10.8

   from   have

   where  period between 1 and 6

;quit;

 

Reeza
Super User

Check the documentation to see what formulas are used in SQL for Standard deviation, the denominator can vary based on your definition.

 

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 20 replies
  • 4165 views
  • 7 likes
  • 4 in conversation