jaweriahh
Obsidian | Level 7

I am using the Unobserved Components Model (UCM) on two variables with 55 observations each. I have two objectives: the first is to decompose the data and analyze the individual components, and the second is to forecast. As my data is annual, I use the trend-cycle model with dummy variables for structural breaks and outliers.

I am facing two problems.

  1. What diagnostics should I use to check the fit of the model? The R-square of my model is negative and the adjusted R-square is missing.
  2. I know that I can plot the one-step-ahead residuals, the residual histogram, the Q-Q plot, the autocorrelation function and the partial autocorrelation function. But how can I calculate the following?
  • The prediction error variance
  • The one-step-ahead prediction errors
  • The normality statistic based on the third and fourth moments
  • The heteroscedasticity statistic based on the ratio of the sums of squared prediction errors in the last and first thirds of the sample
  • The Box-Ljung serial correlation statistic
  • The Durbin-Watson statistic
  • Skewness and kurtosis

I am attaching my data and listing the code that I am using. Kindly help me solve this issue.

Aluminium 

proc ucm data = metals;
id year interval = year;
model al=break1973 outlier1979 outlier1994 outlier2008;
irregular;
level ;
slope;
cycle;
cycle;
estimate plot=all;
run;

proc ucm data = metals;
id year interval = year;
model al=break1973 outlier1979 outlier1994 outlier2008;
irregular;
level variance=0 noest;
cycle;
cycle;
estimate plot=all;
run;

ZI

proc ucm data = metals;
id year interval = year;
model zi=break1973 outlier1988 outlier1995 outlier2006 outlier2008;
irregular;
level;
slope;
cycle;
cycle;
estimate plot=all;
run;

proc ucm data = metals;
id year interval = year;
model zi=break1973 outlier1988 outlier1995 outlier2006 outlier2008;
irregular;
level variance=0 noest;
slope;
cycle;
cycle;
estimate plot=all;
run;

 

rselukar
SAS Employee

In this post I will try to address your UCM questions so far.  First some general comments:

Modeling and forecasting a time series is not easy without some understanding of the series being modeled. Very often several models can be proposed that appear to fit the historical data reasonably well (this is true of both ARIMA models and UCMs). Model diagnostics (such as residual analysis) are useful but still require context to decide which of the discovered features of the model are real and which might not be. Cross-validation type methods, which are very effective in addressing overfitting in ordinary regression modeling, are not as effective in the time series setting. The policy for handling the outliers discovered during the exploratory stage is also not clear-cut and (again) requires context information. In light of these, my personal preference is to try simple models that fit the data reasonably well and not to overfit the historical region. Outliers are left unhandled unless they distort the main features (such as trend) of the series. Without additional context, the model given at the end of this post seems adequate to me. Of course, whether the discovered cycle (of period 13 years) is "real" or not cannot be answered without domain information. Now answers to your specific questions:

 

1. Negative R-square: The R-square in ordinary regression is based on "regression residuals" (Y - X beta-hat). The UCM R-square is based on "one-step-ahead" residuals. The one-step-ahead residual at a particular time is based only on data prior to that time point. Therefore, the UCM R-square is not guaranteed to be non-negative (this is mentioned in the UCM doc). Moreover, when the UCM contains dummy regressors, very often only a few non-missing one-step-ahead residuals are available for residual analysis. This is because non-missing residuals are available only after an adequate number of observations are processed to initialize the diffuse components (which include regressors) in the model. All of your models suffer from this condition of an inadequate number of non-missing residuals for residual analysis.

2. You can use the OUTFOR= option in the FORECAST statement to output the series forecasts, the residuals, their standard errors, and many other things. UCM provides rich graphical support for residual analysis (as you have noticed). If you want to compute some of the statistics you mention by hand, you can use the OUTFOR= data set with PROC IML or PROC UNIVARIATE.
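One way the statistics in the question could be computed by hand is sketched below. It assumes that the OUTFOR= data set carries the RESIDUAL and STD columns described in the UCM documentation. The data set names (zi_for, zi_res, stats, diag), the use of standardized residuals, and the choice of 12 lags are assumptions made for illustration. The normality statistic is the usual moment-based form n*(skew^2/6 + (kurt-3)^2/24), and the heteroscedasticity statistic is the ratio of the sums of squares of the last and first thirds of the residuals.

```sas
/* Save forecasts and one-step-ahead residuals (zi_for is a made-up name). */
proc ucm data=metals;
   id year interval=year;
   model zi;
   irregular;
   level variance=0 noest;
   slope;
   cycle;
   forecast outfor=zi_for;
run;

/* Keep rows with a residual; STD**2 is the prediction error variance.     */
data zi_res;
   set zi_for;
   where residual is not missing;
   predvar = std**2;
   stdres  = residual / std;        /* standardized one-step-ahead error   */
run;

proc means data=zi_res noprint;
   var stdres;
   output out=stats n=n mean=mu;
run;

/* Moments, normality, Durbin-Watson and heteroscedasticity statistics.    */
data diag (keep=n skew kurt normality p_normal dw h_stat);
   if _n_ = 1 then set stats (keep=n mu);
   set zi_res end=last;
   h    = floor(n / 3);             /* size of the first and last thirds   */
   prev = lag(stdres);
   m2 + (stdres - mu)**2;
   m3 + (stdres - mu)**3;
   m4 + (stdres - mu)**4;
   dw_den + stdres**2;
   if _n_ > 1 then dw_num + (stdres - prev)**2;
   if _n_ <= h then ss_first + stdres**2;
   if _n_ > n - h then ss_last + stdres**2;
   if last then do;
      skew      = (m3 / n) / (m2 / n)**1.5;
      kurt      = (m4 / n) / (m2 / n)**2;      /* equals 3 under normality */
      normality = n * (skew**2 / 6 + (kurt - 3)**2 / 24);
      p_normal  = 1 - probchi(normality, 2);
      dw        = dw_num / dw_den;
      h_stat    = ss_last / ss_first;
      output;
   end;
run;

proc print data=diag;
run;

/* The "Autocorrelation Check for White Noise" table reports Ljung-Box     */
/* chi-square statistics; its degrees of freedom are not adjusted for the  */
/* number of estimated UCM parameters.                                     */
proc arima data=zi_res;
   identify var=stdres nlag=12;
run;
quit;
```

With only a handful of non-missing residuals (as happens when several dummy regressors are in the model), these statistics are computed from very few points and should be read with caution.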

 

My suggested program:

 

proc ucm data=metals;
model ZI;
irregular;
level variance=0 noest checkbreak;
slope;
cycle plot=smooth;
estimate plot=panel;
forecast plot=decomp;
run;

jaweriahh
Obsidian | Level 7

Thank you, sir, for your reply and for pointing out the above important points. It, and the references (especially Harvey 1989, chapter 5, page 268, and 1992), helped me clarify my doubts. As you pointed out, I will not try to overfit the model and will drop the outliers.

However, sir, I still have one issue. I want to incorporate the 1973 structural break to show its impact on the data (real metal prices, which were distorted by the oil price shock). As you mentioned, the theory also supports this inclusion. Could you suggest how I can include and show the structural break in the model? Thank you once again.

rselukar
SAS Employee
Just include a level-shift variable that is zero before the event and 1 at and after the event in the input data set. Use this variable as a regressor. See the Nile level break detection example in the UCM doc.
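As a minimal sketch of that suggestion, the level-shift variable could be built in a DATA step and then listed as a regressor. The names metals_brk and shift1973 are hypothetical, and the code assumes that YEAR holds SAS date values (as INTERVAL=YEAR expects); if YEAR is stored as a plain integer such as 1973, the comparison would be year >= 1973 instead.

```sas
/* Hypothetical names: metals_brk, shift1973.                       */
/* Assumes YEAR is a SAS date value, so YEAR() extracts the year.   */
data metals_brk;
   set metals;
   shift1973 = (year(year) >= 1973);   /* 0 before 1973, 1 from 1973 on */
run;

proc ucm data=metals_brk;
   id year interval=year;
   model al = shift1973;
   irregular;
   level variance=0 noest;
   slope;
   cycle;
   estimate;
run;
```

The estimated coefficient of shift1973 then measures the size of the level shift, and its significance can be read from the parameter estimates table.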
jaweriahh
Obsidian | Level 7

I tried the program (below) and then added the intervention (below). However, my R-square decreases (instead of improving), while the model selection criteria (AIC and BIC) improve and the break is significant.

So is the model without the intervention better than the one with the intervention? Is the R-square deteriorating because my data set is small (55 points)?

 

proc ucm data=metals;
model al;
irregular;
level variance=0 noest checkbreak;
slope;
cycle plot=smooth;
estimate plot=panel;
forecast plot=decomp;
run;

proc ucm data=metals;
model al=break1973;
irregular;
level variance=0 noest checkbreak;
slope;
cycle plot=smooth;
estimate plot=panel;
forecast plot=decomp;
run;

jaweriahh
Obsidian | Level 7

When using the Unobserved Components Model, is it possible to include an independent variable and generate out-of-sample forecasts?

I am using the UCM, but whenever I include an independent variable (along with a fixed level, stochastic slope and cycle) I get an error and the results do not contain out-of-sample forecasts.

MTM1
Fluorite | Level 6

Hi!

Did you solve the problem? I had the same problem: when I introduce intervention variables, the R-square drops considerably.

Regards,

rselukar
SAS Employee

In UCM, regression coefficients are part of the state vector. They make up the section of the model state vector that has a diffuse prior. The diffuse Kalman filter recursively computes one-step-ahead forecasts of the model state and response values. This recursive process also produces the one-step-ahead residuals. The one-step-ahead forecasts and residuals are set to missing until enough observations are processed so that all the diffuse state elements can be estimated. This scenario is similar to recursive estimation of the regression vector in the ordinary regression setting, where the observations are processed one at a time in a sequential fashion. In this setting, one must first process a sufficient number of observations so that the resulting design matrix is invertible before one can produce a valid estimate of the regression vector. When you specify an intervention variable in such a setting, say the intervention is at the 10th observation (that is, the variable is zero for the first nine observations and 1 thereafter), you must process at least 10 observations for the design matrix to become invertible. Because of this recursive nature of diffuse state estimation, the number of residuals available to compute residual-based fit statistics (such as RSquare) can become quite small when intervention variables are introduced as regressors in a UCM. UCM one-step-ahead residuals are not the same as regression residuals (that is, PROC REG residuals). For UCMs, the RSquare statistic need not increase because one adds a regressor to the model (in fact, in many time series model settings, including ARIMA and UCM, RSquare can even be negative).
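The invertibility argument can be checked with a small PROC IML sketch. This is only the ordinary-regression analogy from the paragraph above (an intercept plus a step regressor), not the diffuse Kalman filter itself; the sample size of 15 and the variable names are made up for illustration.

```sas
proc iml;
   n    = 15;                        /* illustrative sample size          */
   step = j(n, 1, 0);
   step[10:n] = 1;                   /* intervention at observation 10    */
   X    = j(n, 1, 1) || step;        /* intercept and step regressor      */
   do k = 2 to n;
      Xk = X[1:k, ];                 /* first k observations only         */
      d  = det(t(Xk) * Xk);          /* zero while X'X is singular        */
      print k d;
   end;
quit;
```

The determinant is zero for k = 2 through 9 and first becomes positive at k = 10, which is why no coefficient estimate (and hence no one-step-ahead residual) is available before the intervention date has been processed.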

 

Whether adding an intervention improves the model can be judged from a variety of other considerations: for example, whether the regression coefficient is significant, whether information criteria such as AIC and BIC have improved, and so on.

MTM1
Fluorite | Level 6
Really great.



Thank you very much




