Solved: forecasting equation in PROC ARIMA

RyDan · Posted 09-16-2015 02:30 PM

Hello,

I am using the NOEST option in the ESTIMATE statement of the ARIMA procedure, and it has led me to an unexpected result. I fit a model to predict net charge-off (Y) data with unemployment rate (X) as the input, with no ARMA terms and no intercept to isolate the issue, and extracted the parameter estimates, say Num1 (Shift=0) = 0.5 and Num1,1 (Shift=0) = -0.6.

The strange thing is that the trend of Y dictates the forecasts, when I expected the right-hand side of the forecast equation to be solely a function of X.

For example I have two data sets:

data test_y_increase;
input Annl_NCO_rate Unemployment_Rate_FB monthend_date;
cards;
1 14 1
1 13.1 2
2 12 3
2 11.5 4
4 10 5
4 9.9 6
4 8 7
5 7 8
6 6.2 9
. 5 10
. 4 11
. 3.3 12
. 2 13
. 1 14
;
run;

data test_y_decrease;
input Annl_NCO_rate Unemployment_Rate_FB monthend_date;
cards;
6 14 1
6 13.1 2
5 12 3
4 11.5 4
3 10 5
2 9.9 6
2 8 7
1 7 8
1 6.2 9
. 5 10
. 4 11
. 3.3 12
. 2 13
. 1 14
;
run;

In both data sets, the values of X are the same (decreasing). Given my parameters for the concurrent value and lag 1 of X, when I apply the published model to the series Y, the trend of Y seems to influence the forecasts. This is unexpected to me because the model I have applied is only a function of X.

Here is my ARIMA code:

%let NumFactor1 = .5;
%let NumFactor2 = -.6;

proc arima data=test_y_increase;
title "Increasing Y";
identify var=Annl_NCO_rate(1) crosscorr=( Unemployment_Rate_FB(1) ) CLEAR CENTER;
estimate input =( (1)Unemployment_Rate_FB )
initval =( &NumFactor1.$(&NumFactor2.)Unemployment_Rate_FB )
noest
NOINT;
forecast id=monthend_date BACK=0 lead=5 out=out_test_increase;
run;
quit;

proc arima data=test_y_decrease;
title "Decreasing Y";
identify var=Annl_NCO_rate(1) crosscorr=( Unemployment_Rate_FB(1) ) CLEAR CENTER;
estimate input =( (1)Unemployment_Rate_FB )
initval =( &NumFactor1.$(&NumFactor2.)Unemployment_Rate_FB )
noest
NOINT;
forecast id=monthend_date BACK=0 lead=5 out=out_test_decrease;
run;
quit;

title;

Do you know why these forecasts are not only a function of X, or what the forecast equation is?

Thanks,

Ryan

RyDan · Posted 10-08-2015 03:12 PM

Thank you to Kenneth Sanford for getting Donna Woodward's response:

Because the response variable for the two models has a first difference associated with it, the forecasts will be a function of the previous actual value of the response variable (when available) or of the previous forecast when lagged actuals are no longer available. If differencing had not been specified for the response variable in these two models, then the forecasts would, indeed, have been a function of only the input variable, X.

To illustrate how the lagged dependent variable is incorporated into the forecast equation when the response variable is differenced, let’s look at a simple random walk model:

Proc arima;
Identify var=y(1);
Estimate noint;
Run;

In backshift notation, this model is written as: (1 - B) y_t = a_t

Performing the backshift operation, we get: y_t - y_{t-1} = a_t

The forecast model for y_t is therefore: y_t = y_{t-1} + a_t.
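
The same steps can be applied to the transfer function model from the question. This is only a sketch: it assumes the CENTER option is dropped, and it uses \omega_0 and \omega_1 as labels (not taken from the original posts) for the Num1 and Num1,1 parameters, with the numerator written as \omega_0 - \omega_1 B, which matches the sign used in the tfInput expression elsewhere in this thread.

```latex
\begin{align*}
(1-B)\,y_t &= (\omega_0 - \omega_1 B)(1-B)\,x_t + a_t \\
y_t - y_{t-1} &= \omega_0\,(x_t - x_{t-1}) - \omega_1\,(x_{t-1} - x_{t-2}) + a_t \\
y_t &= y_{t-1} + \omega_0\,(x_t - x_{t-1}) - \omega_1\,(x_{t-1} - x_{t-2}) + a_t
\end{align*}
```

With a_t replaced by its expected value of zero, the forecast is the previous value of y (or its forecast, once actuals run out) plus a term that depends only on X.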


rselukar · Posted 09-16-2015 05:52 PM

%let NumFactor1 = .5;
%let NumFactor2 = -.6;

data test_i;
input y x date;
dx = dif(x);
ldx = lag(dx);
lx = lag(x);
ly = lag(y);
cards;
1 14 1
1 13.1 2
2 12 3
2 11.5 4
4 10 5
4 9.9 6
4 8 7
5 7 8
6 6.2 9
. 5 10
. 4 11
. 3.3 12
. 2 13
. 1 14
;
run;

data test_d;
input y x date;
dx = dif(x);
ldx = lag(dx);
lx = lag(x);
ly = lag(y);
cards;
6 14 1
6 13.1 2
5 12 3
4 11.5 4
3 10 5
2 9.9 6
2 8 7
1 7 8
1 6.2 9
. 5 10
. 4 11
. 3.3 12
. 2 13
. 1 14
;
run;

proc arima data=test_i plots=none;
title "Increasing Y";
identify var=y(1) crosscorr=( x(1) ) noprint CLEAR;* CENTER;
estimate input =( (1)x )
initval =( &NumFactor1.$(&NumFactor2.)x )
noest
NOINT;
forecast id=date BACK=0 lead=5 out=out_test_increase printall;
run;
quit;

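data test_i;
set test_i;
retain tmp 0;
/* Differences of x are missing at the start of the series; fill them with the first observed difference (13.1 - 14 = -0.9) */
if _n_ <= 2 then ldx = -0.9;
if _n_ <= 1 then dx = -0.9;
/* Transfer function contribution of the differenced input */
tfInput = &NumFactor1.*dx - &NumFactor2.*ldx;
/* Use the lagged actual y when it is available; otherwise chain from the previous forecast */
if ly ^= . then forecast = ly + tfInput;
else forecast = tmp + tfInput;
tmp = forecast;
run;

proc print data=test_i;
var y ly tfInput forecast;
run;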

proc arima data=test_d plots=none;
title "Decreasing Y";
identify var=y(1) crosscorr=( x(1) ) CLEAR;* CENTER;
estimate input =( (1)x )
initval =( &NumFactor1.$(&NumFactor2.)x )
noest
NOINT;
forecast id=date BACK=0 lead=5 out=out_test_decrease printall;
run;
quit;

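data test_d;
set test_d;
retain tmp 0;
/* Differences of x are missing at the start of the series; fill them with the first observed difference (13.1 - 14 = -0.9) */
if _n_ <= 2 then ldx = -0.9;
if _n_ <= 1 then dx = -0.9;
/* Transfer function contribution of the differenced input */
tfInput = &NumFactor1.*dx - &NumFactor2.*ldx;
/* Use the lagged actual y when it is available; otherwise chain from the previous forecast */
if ly ^= . then forecast = ly + tfInput;
else forecast = tmp + tfInput;
tmp = forecast;
run;

proc print data=test_d;
var y ly tfInput forecast;
run;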

I am not quite sure I understand your question, but here is what I make of it:

I am ignoring the CENTER option in your ARIMA code for simplicity. Your model spec is:

identify var=y(1) crosscorr=( x(1) );

estimate input =( (1)x) initval =( &NumFactor1.$(&NumFactor2.)x) noest NOINT;

The forecast function for this is:

tfInput = NumFactor1*dif(x) - NumFactor2*lag(dif(x))

forecast = lag(y) + tfInput when lag(y) is available

forecast = lag(forecast) + tfInput otherwise

This does depend on y (and not just on x).

*************Verification code attached***************;
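
As a quick hand check of the forecast function above, here is the arithmetic for the first two forecast periods (date = 10 and 11), using the data from the question with NumFactor1 = 0.5 and NumFactor2 = -0.6 and, as above, ignoring CENTER. The symbols \Delta x_t and \hat{y}_t are shorthand introduced here for dif(x) and forecast; the numbers follow from the formula, not from PROC ARIMA output.

```latex
\begin{align*}
\Delta x_{9} &= 6.2 - 7 = -0.8, \quad \Delta x_{10} = 5 - 6.2 = -1.2, \quad \Delta x_{11} = 4 - 5 = -1.0 \\
\text{tfInput}_{10} &= 0.5(-1.2) - (-0.6)(-0.8) = -1.08 \\
\text{tfInput}_{11} &= 0.5(-1.0) - (-0.6)(-1.2) = -1.22 \\
\text{Increasing Y } (y_9 = 6):\quad \hat{y}_{10} &= 6 - 1.08 = 4.92, \quad \hat{y}_{11} = 4.92 - 1.22 = 3.70 \\
\text{Decreasing Y } (y_9 = 1):\quad \hat{y}_{10} &= 1 - 1.08 = -0.08, \quad \hat{y}_{11} = -0.08 - 1.22 = -1.30
\end{align*}
```

The X-driven increments are identical in both cases; only the starting level, the last observed y, differs.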

RyDan · Posted 10-08-2015 04:03 PM

Thanks! A quick question: what if I had an AR term, say p=1 with AR = -.3? My ESTIMATE statement now looks like this:
proc arima data=test_i plots=none;
title "Increasing Y - ar";
identify var=y(1) crosscorr=( x(1) ) noprint CLEAR;* CENTER;
estimate p=1 input =( (1)x ) ar=&ar.
initval =( &NumFactor1.$(&NumFactor2.)x)
noest
NOINT;
forecast id=date BACK=0 lead=5 out=out_test_increase_ar printall;
run;
quit;

How would you code this using the logic in test_i?

