I used Proc Expand and averaged averages.
My colleague used a data step and divided moving numerator sums and denominator sums.
We got different answers due to different weighting.
Which is the correct way to calculate a moving average?
Both are valid approaches and you should get the same answers.
But if you're not using the same methodology then they won't be equal, which may be the actual issue.
If your data has weights and you want to factor those in I don't believe PROC EXPAND supports weights the same way.
You would need to specify the weights in the CONVERT statement. Numbers in purpler are the weights.
convert x=y / transformout=( cmovave( .1 .2 .4 .2 .1 ) );
We don't have enough information at this point to say one's correct over another without seeing any data or code.
@proctice wrote:
I used Proc Expand and averaged averages.
My colleague used a data step and divided moving numerator sums and denominator sums.
We got different answers due to different weighting.
Which is the correct way to calculate a moving average?
We don't use weights, but we wouldn't get the same answer if one time period had many more cases than another time period. The sum method would naturally give more weight to the time period with more cases and the average of averages method would give them all equal weight.
@proctice wrote:
We don't use weights, but we wouldn't get the same answer if one time period had many more cases than another time period. The sum method would naturally give more weight to the time period with more cases and the average of averages method would give them all equal weight.
We don't have enough information at this point to say one's correct over another without seeing any data or code.
Mean of time period | |||||||
time 1 | 30 | 60 | 90 | 30 | 60 | 50 | 53.33333 |
time 2 | 20 | 10 | 15 | ||||
time 3 | 5 | 5 | 3 | 2 | 1 | 3.2 | |
time 4 | 12 | 3 | 50 | 21.66667 | |||
|
mean of means | 23.3 | |||||
mean | 25.4 |
But imagine more time periods and rolling means of the time periods. mean of time 1-4, mean of time 2-5, mean of time 3-6, etc.
I calculate rolling mean using the mean of the means, colleague calculates using the mean of the values in the time periods.
Why would you have multiple measurements per time? Are they per person or some other thing? Some type of repeated measurement? Context does decide how it should be handled. For example, depending on what that 'across' variable is, maybe you should be averaging the columns first (or totaling) and then averaging afterwards.
As posted, no, the mean of means here is not what I would consider a good solution because you have multiple measurements per time.
So depending on how that's defined and the context you may want to summarize each time and then possibly add a weight.
If this is a moving average, you should have more than one value though, ie moving average implies last two values and carried forward or average of the three most recent times moving forward etc.
@proctice wrote:
Mean of time period time 1 30 60 90 30 60 50 53.33333 time 2 20 10 15 time 3 5 5 3 2 1 3.2 time 4 12 3 50 21.66667
mean of means 23.3 mean 25.4
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Early bird rate extended! Save $200 when you sign up by March 31.