09-04-2013 01:29 PM
I am adding confidence intervals to forecasts created with PROC ESM under its various model options. I am trying to understand how SAS computes the standard deviations and confidence intervals so I can decide whether to use them as they come out of SAS, calculate my own, or adjust them.
How are the CIs created? I assume they are based on an assumption of normality and the standard deviation. I tested this on the attached data (columns I and J, compared with columns E and F) and found it to hold approximately.
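If the intervals are built the way assumed above (normally distributed forecast errors), each limit is the prediction plus or minus a normal quantile times the forecast SD. The following is a minimal Python sketch of that check, assuming a two-sided 95% interval (alpha = 0.05) and a hypothetical predicted value; `normal_ci` is an illustrative name, not part of SAS. The quantile is about 1.96, which is why this is close to the Predict +/- 2*SD rule of thumb.

```python
from statistics import NormalDist

def normal_ci(predict, std, alpha=0.05):
    """Return (lower, upper) for a two-sided (1 - alpha) interval.

    Assumes normally distributed forecast errors with standard deviation std;
    alpha=0.05 (a 95% interval) is an assumed default, not read from SAS.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    return predict - z * std, predict + z * std

# Hypothetical predicted value; 66,557 is the period-1 SD quoted for Series 1.
lower, upper = normal_ci(predict=500_000.0, std=66_557.0)
print(round(lower), round(upper))
```

Comparing these values with the LOWER and UPPER columns row by row is one way to confirm the normal-quantile construction.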
How are the SDs created? They have me very confused, for three reasons.
1) They increase over time (the further into the future the forecast period). I had never thought about it, but I suppose this makes sense: the forecast for 3 periods ahead uses less information than the forecast for 2 periods ahead. But on what basis do they grow? I've included two series, and the % change in the SD over time is markedly different between them (Col L).
2) These are weekday seasonal forecasts, so I assume the SD differs by day. My second calculation (Col M) compares the week-over-week change. The SD % changes differ vastly by day and between series. (FYI: The two series use different models available in PROC ESM, one that gives the 2nd week the same predicted values and one that gives it different values. I am not sure whether this matters for this question.)
3) The SD for period 1 is not at all similar to the SD of the actual data. Calculating the SD in SAS, I get 105,901 for Series 1 (compared with 66,557 for period 1) and 10,363 for Series 2 (compared with 6,671). So the forecast SD is smaller than the SD of the actuals (for period 1, not necessarily for all periods). That could be nice if it gives me a tighter CI that reflects lower recent volatility, which is the case in this data.
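Points 1 and 3 can both be illustrated with simple (non-seasonal) exponential smoothing, where the textbook result is that the h-step-ahead forecast-error SD is sigma * sqrt(1 + (h - 1) * alpha^2), with sigma the SD of the one-step-ahead errors. The sketch below is only an illustration under stated assumptions: it uses SES, not the weekday seasonal models in the question (whose weights include extra seasonal terms, so their SD path differs, possibly with jumps at multiples of the season length); the smoothing weight and the series are hypothetical; and the function names are made up. It shows that the lead-1 SD reflects one-step forecast errors rather than the spread of the actuals around their mean, and that the growth with the lead depends on the smoothing weight.

```python
import math
import statistics

def ses_one_step_residuals(y, alpha):
    """One-step-ahead errors (actual minus forecast) from simple exponential
    smoothing with a fixed weight alpha, starting the level at y[0]."""
    level = y[0]
    residuals = []
    for obs in y[1:]:
        residuals.append(obs - level)          # the current level is the forecast of obs
        level = level + alpha * (obs - level)  # update the smoothed level
    return residuals

def ses_forecast_std(sigma, alpha, horizons):
    """h-step-ahead forecast-error SD for simple exponential smoothing:
    sigma * sqrt(1 + (h - 1) * alpha**2). Not valid for seasonal models."""
    return [sigma * math.sqrt(1 + (h - 1) * alpha ** 2) for h in horizons]

# Hypothetical steadily rising series, not the attached data.
y = [10.0 * t for t in range(10)]
alpha = 0.8  # hypothetical smoothing weight; SAS estimates its own
res = ses_one_step_residuals(y, alpha)
# Assumption: sigma is the root mean square of the one-step errors;
# SAS may use a different divisor (e.g. adjusting for estimated parameters).
sigma = math.sqrt(sum(r * r for r in res) / len(res))
print("SD of the series:", statistics.stdev(y))
print("one-step forecast SD:", sigma)
print("forecast SD by lead:", ses_forecast_std(sigma, alpha, range(1, 15)))
```

With alpha = 0.8, the SD at lead 14 is sqrt(1 + 13 * 0.64), or about 3.05 times the lead-1 SD; a smaller weight gives much slower growth. Because each series gets its own estimated weights, differing % growth between series (Col L) is to be expected.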
Why it's important:
I'm not necessarily looking for the most accurate CI, statistically speaking. I'd like to improve the CI algorithm over time, but initially that is not very important. My audience is not accustomed to thinking probabilistically, and by adding these CIs I want to take baby steps in that direction. My hope is that the variation in CIs, both between series and over time, will give them some insight into the non-homogeneity of our work in terms of volatility and predictability. I do not believe CIs that grow dramatically from the beginning of the 2-week period to the end will help with that; rather, they will be confusing and will produce many more zeros at the lower end. We cannot have negative work, so my audience already implicitly understands that zero is a floor; showing zero too often as the lower limit may, ironically, lead them to lose confidence.
The sample output is the OUTFOR= data set from the PROC ESM statement, with a few irrelevant columns removed and my calculations added on the right.
NOTE: This question may apply only to PROC ESM, not to all SAS forecasting procedures. We are transitioning to ESM from PROC FORECAST, which produces CIs very differently: within one series, the lower and upper limits are the same percentage below and above the predicted value for every forecast period. For Series 1 of the attached example, the CI extends 63.5% in each direction from the predicted value for all 14 days. I may be amenable to a CI that changes over time (to reflect growing uncertainty, though with less variation than I currently get) and one whose SD varies by weekday, though I do not know for sure whether ESM does that. The PROC FORECAST intervals are definitely not Predict +/- 2*SD using the actual SDs I mentioned above. It somehow picks a percentage and then forces the SD/CI to adhere to it regardless of the predicted value or the number of periods ahead.
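To compare the two procedures on the same footing, each limit can be expressed as a fraction of the predicted value for every lead. A constant fraction across all 14 days reproduces the PROC FORECAST behavior described above, while normal-SD intervals from PROC ESM give fractions that change with the lead and the weekday. This is a minimal sketch with made-up numbers; the three input lists are assumed to hold the predicted, lower and upper values from the output data set, and `relative_half_widths` is a hypothetical helper name.

```python
import math

def relative_half_widths(predict, lower, upper):
    """Return (down, up) per period: how far each limit sits below/above
    the prediction, as a fraction of the prediction."""
    out = []
    for p, lo, hi in zip(predict, lower, upper):
        if p == 0:
            out.append((math.nan, math.nan))  # fraction undefined at zero
        else:
            out.append(((p - lo) / p, (hi - p) / p))
    return out

# Made-up rows with limits 63.5% either side of the prediction.
rows = relative_half_widths([100.0, 200.0], [36.5, 73.0], [163.5, 327.0])
for down, up in rows:
    print(f"down {down:.1%}  up {up:.1%}")
```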
09-06-2013 10:58 AM
I think the ESM documentation states: "The techniques used in the ESM procedure are identical to those used for exponential smoothing models in the Time Series Forecasting". You will find a lot of information about how confidence intervals are computed for forecasts created with PROC ESM here:
I would also recommend having a look at the white paper "Large-Scale Automatic Forecasting Using Inputs and Calendar Events", which gives some nice explanations of CIs that could benefit your audience as well.