06-25-2015 05:13 PM
I'd appreciate it if anyone could explain how SAS's PROC ARIMA calculates the Variance Estimate of the residual series. I have not been able to replicate it.
From the SAS documentation:
The "Variance Estimate" is the variance of the residual series
The results of my ARIMA fit produce the following:
| Statistic | Value |
|---|---|
| Std Error Estimate | 0.011849 |
| Number of Residuals | 40 |
I would like to replicate the 0.00014 Variance Estimate. To do so, I took the 40 residuals from my ARIMA model and ran a variance calculation on them in Query Builder, and it does not match. I also dumped the data into Excel, and neither the population nor the sample variance produces the same result. Can someone please explain what is happening with SAS's variance estimate?
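For reference, these are the two variances tried above, with e_1, ..., e_n the residuals (n = 40) and ē their mean. The last line is a quick arithmetic check, assuming (as the SAS documentation describes) that the Std Error Estimate is the square root of the Variance Estimate: the reported 0.011849 squares to the reported 0.00014.

```latex
% e_1, ..., e_n are the n = 40 residuals and \bar e is their mean
\hat\sigma^2_{\text{pop}} = \frac{1}{n}\sum_{t=1}^{n}(e_t-\bar e)^2,
\qquad
s^2 = \frac{1}{n-1}\sum_{t=1}^{n}(e_t-\bar e)^2

% Std Error Estimate squared vs. the reported Variance Estimate
0.011849^2 \approx 0.000140399 \approx 0.00014
```

The two formulas differ only by the factor n/(n-1) = 40/39, so if neither matches, the divisor or the set of residuals used by PROC ARIMA is likely different from both.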
Thank you. (The residuals are listed below.)
06-26-2015 01:57 PM
My guess is that these residuals were copied from some ODS output. The mean of these 40 numbers is -0.000331, whereas true residuals would sum to zero (or, numerically, to a tiny number such as 1e-16).
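A worked check of how much a nonzero mean can matter here. Dividing the raw (uncentered) sum of squares by n equals the population variance plus the squared mean:

```latex
% Uncentered mean square = population variance + squared mean
\frac{1}{n}\sum_{t=1}^{n} e_t^2
  = \frac{1}{n}\sum_{t=1}^{n}(e_t-\bar e)^2 + \bar e^{\,2},
\qquad
\bar e^{\,2} = (-0.000331)^2 \approx 1.10\times 10^{-7}
```

Against a variance of about 1.40 × 10^-4, that term is under 0.1%, so whether or not the mean is subtracted barely changes the result; a larger mismatch would more likely come from the divisor or from which residuals are included.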
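Create an output data set that contains the residuals (for example, with the OUT= option of the FORECAST statement, which stores them in the variable RESIDUAL), and then run
PROC MEANS N VAR; VAR RESIDUAL; RUN;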
I suspect (hope) that you will get a variance that agrees with PROC ARIMA.
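A minimal end-to-end sketch of that suggestion. The data set SERIES, the variable Y, and the AR(1) model are hypothetical placeholders for your own data and model; the FORECAST statement's OUT= data set holds the residuals in the variable RESIDUAL.

```sas
/* Hypothetical: data set SERIES, variable Y, and an AR(1) model stand */
/* in for the actual data and model being fit.                         */
proc arima data=series;
   identify var=y noprint;
   estimate p=1 method=cls;            /* CLS is the ESTIMATE default     */
   forecast lead=1 out=resid;          /* OUT= includes variable RESIDUAL */
run;
quit;

/* The single forecast row has a missing RESIDUAL, which PROC MEANS  */
/* excludes, so N should equal the Number of Residuals in the output. */
proc means data=resid n mean var uss;
   var residual;
run;
```

USS (the uncorrected sum of squares) is included so the raw sum of squared residuals can be divided by other candidate divisors by hand.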
06-26-2015 02:07 PM
Thank you for your reply. I used Query Builder to attempt the VAR and MEAN calculations. Working directly from the actual ARIMA output, the mean is -0.000330992, and the variance still does not equal the value PROC ARIMA reports.
I've read up a bit more on the variance of ARIMA residuals, and I believe the issue is that a simple variance of the residuals is not appropriate for ARIMA models when the original time series is autocorrelated. Apparently what we have here is a conditional variance. If possible, can someone please explain how to properly estimate the conditional variance of these residuals, assuming that is what PROC ARIMA is doing?
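One way to narrow down the mismatch is to divide the residual sum of squares by several candidate divisors and compare each with the reported Variance Estimate. PROC ARIMA's default estimation method is conditional least squares (METHOD=CLS), which may be where the word "conditional" comes in; which divisor it actually applies is not confirmed here, so treat n - k below as an assumption to test. The data set RESID (a FORECAST OUT= data set with variable RESIDUAL) and the parameter count K are hypothetical.

```sas
/* Assumes RESID is a FORECAST OUT= data set (variable RESIDUAL) and */
/* K is the number of estimated parameters; both are hypothetical.  */
%let k = 2;   /* e.g. one AR term plus the mean */

proc means data=resid noprint;
   var residual;
   output out=stats n=n uss=sse;   /* N and uncorrected sum of squares */
run;

data divisors;
   set stats;
   v_n  = sse / n;          /* divide by n     */
   v_n1 = sse / (n - 1);    /* divide by n - 1 */
   v_nk = sse / (n - &k);   /* divide by n - k */
run;

proc print data=divisors noobs;
   var n sse v_n v_n1 v_nk;
run;
```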
06-29-2015 09:27 AM
Please don't consider this a smart-aleck answer, but the best way to calculate the variance of the residuals for an ARIMA process is to use PROC ARIMA.
I suppose that you could feed the raw data into PROC MIXED and, with appropriate coding of the fixed effects and much work on the covariance structure, get an estimate out.
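A minimal sketch of the PROC MIXED route for the simplest case, assuming (hypothetically) a data set SERIES with the raw series in Y, sorted in time order, and an AR(1) model with a mean. With TYPE=AR(1), the covariance is σ²ρ^|i-j|, so the Residual parameter is the marginal variance of the series and the AR(1) innovation variance is σ²(1 - ρ²). Because this is a likelihood fit, it is better compared with PROC ARIMA under METHOD=ML than under the default METHOD=CLS.

```sas
/* Hypothetical: data set SERIES, variable Y in time order, AR(1) with a mean */
proc mixed data=series method=ml;
   model y = / solution;                     /* intercept (mean) only      */
   repeated / type=ar(1) subject=intercept;  /* whole series = one subject */
   ods output CovParms=cp;                   /* rows AR(1) and Residual    */
run;

/* Convert the marginal variance to the AR(1) innovation variance */
data innov;
   set cp end=last;
   retain rho sigma2;
   if CovParm = 'AR(1)'    then rho    = Estimate;
   if CovParm = 'Residual' then sigma2 = Estimate;
   if last then do;
      innov_var = sigma2 * (1 - rho**2);
      output;
   end;
   keep rho sigma2 innov_var;
run;

proc print data=innov noobs;
run;
```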
I would not do it by hand, as there are too many matrix operations involved to get reasonable answers.
06-29-2015 10:42 AM
Thanks for the reply, Steve. Regardless of the complexity, I would still like to understand how this variance estimate is computed so that I can thoroughly explain the methodology behind the forecasts I am producing. Could someone point me to a couple of blogs or online papers that provide an overview of the approach?
06-29-2015 01:07 PM
Have you searched through the NIST statistics handbook? It references Brockwell and Davis, 1991, Introduction to Time Series and Forecasting, 2nd ed., as a source for the likelihood function used. I would say to go there, and be prepared for the matrix algebra, if you really want to understand how those parameters are estimated. I'll be honest: it would take me a week to begin to understand it, and I feel pretty comfortable with the matrix algebra involved in generalized linear mixed models.
Short answer: the value you are looking for is the variance component corresponding to the residual error, estimated by nonlinear optimization of the likelihood function for a Box-Jenkins process; the Std Error Estimate is its square root.
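For orientation, this is the innovations form of the Gaussian ARMA(p, q) likelihood as presented in Brockwell and Davis, where X̂_j is the one-step predictor of the (mean-corrected) X_j and r_{j-1} scales its prediction error. Whether PROC ARIMA uses exactly these weights and divisors is an assumption not confirmed here; the point is that the variance comes from weighted one-step prediction errors, not from a plain variance of the residuals.

```latex
% \hat X_j: one-step predictor of X_j;  r_{j-1} = E(X_j - \hat X_j)^2 / \sigma^2
L(\phi,\theta,\sigma^2) = (2\pi\sigma^2)^{-n/2}\,(r_0 r_1\cdots r_{n-1})^{-1/2}
  \exp\!\Big[-\frac{1}{2\sigma^2}\sum_{j=1}^{n}\frac{(X_j-\hat X_j)^2}{r_{j-1}}\Big]

S(\phi,\theta) = \sum_{j=1}^{n}\frac{(X_j-\hat X_j)^2}{r_{j-1}},
\qquad
\hat\sigma^2_{\text{ML}} = \frac{S(\hat\phi,\hat\theta)}{n},
\qquad
\tilde\sigma^2_{\text{LS}} = \frac{S(\tilde\phi,\tilde\theta)}{n-p-q}
```

For a causal, invertible model the weights r_{j-1} approach 1 as j grows, so the weighted and plain residuals differ mostly near the start of the series, and the maximum likelihood and least squares versions differ mainly in the divisor.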
For a well-illustrated longer answer, you might check out: https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=19&cad=rja&uact=8&ved=0CFQQFjAIOAo&url...