RyDan
Fluorite | Level 6

Hello,

 

I am using the NOEST option in the ESTIMATE statement of the ARIMA procedure, and it has led me to an unexpected result.  I have fit a model to predict net charge-off data (Y) with unemployment rate (X) as the input, with no ARMA terms and no intercept, just to isolate the issue. The parameter estimates I extracted are, say, Num1 (Shift=0) = 0.5 and Num1,1 (Shift=0) = -0.6.

 

The strange thing is that the trend of Y dictates the forecasts, when I expected the right-hand side of the forecast equation to be solely a function of X.

 

For example I have two data sets:

 

data test_y_increase;

input Annl_NCO_rate Unemployment_Rate_FB monthend_date;

cards;

1 14 1

1 13.1 2

2 12 3

2 11.5 4

4 10 5

4 9.9 6

4 8 7

5 7 8

6 6.2 9

. 5 10

. 4 11

. 3.3 12

. 2 13

. 1 14

;

run;

data test_y_decrease;

input Annl_NCO_rate Unemployment_Rate_FB monthend_date;

cards;

6 14 1

6 13.1 2

5 12 3

4 11.5 4

3 10 5

2 9.9 6

2 8 7

1 7 8

1 6.2 9

. 5 10

. 4 11

. 3.3 12

. 2 13

. 1 14

;

run;

 

In both data sets, the values of X are the same (decreasing).  Given my parameters for the concurrent value and lag 1 of X, when I apply the published model to the series Y, the trend of Y seems to influence the forecasts.  This is unexpected to me because the model I have applied is only a function of X.

 

Here is my ARIMA code:

 

%let NumFactor1 = .5;

%let NumFactor2 = -.6;

 

 

proc arima data=test_y_increase;

title "Increasing Y";

identify var=Annl_NCO_rate(1) crosscorr=( Unemployment_Rate_FB(1) ) CLEAR CENTER;

estimate input =( (1)Unemployment_Rate_FB )

initval =( &NumFactor1.$(&NumFactor2.)Unemployment_Rate_FB

)

noest

NOINT;

forecast id=monthend_date BACK=0 lead=5 out=out_test_increase;

run;

quit;

 

proc arima data=test_y_decrease;

title "Decreasing Y";

identify var=Annl_NCO_rate(1) crosscorr=( Unemployment_Rate_FB(1) ) CLEAR CENTER;

estimate input =( (1)Unemployment_Rate_FB )

initval =( &NumFactor1.$(&NumFactor2.)Unemployment_Rate_FB

)

noest

NOINT;

forecast id=monthend_date BACK=0 lead=5 out=out_test_decrease;

run;

quit;

title;

 

Do you know why these forecasts are not only a function of X, or what the forecast equation is?

 

Thanks,

Ryan

Accepted Solutions
RyDan
Fluorite | Level 6

Thank you to Kenneth Sanford for getting Donna Woodward's response:

 

Because the response variable for the two models has a first difference associated with it, the forecasts will be a function of the previous actual value of the response variable (when available) or of the previous forecast when lagged actuals are no longer available.  If differencing had not been specified for the response variable in these two models, then the forecasts would, indeed, have been a function of the input variable, X, only.

 

To illustrate how the lagged dependent variable is incorporated into the forecast equation when the response variable is differenced, let’s look at a simple random walk model:

 

   Proc arima;

       Identify var=y(1);

       Estimate noint;

   Run;

 

In backshift notation, this model is written as:  (1-B)y_t = a_t

 

Performing the backshift operation, we get:  y_t - y_{t-1} = a_t

 

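The forecast model for y_t is therefore:  y_t = y_{t-1} + a_t. Since the future shock a_t has an expected value of zero, the one-step forecast is simply the previous value, y_{t-1}.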
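
The same steps carry over to the transfer-function model in the question. As a sketch only: write the numerator as ω0 - ω1 B (the symbols ω0 and ω1 are notation introduced here, standing for Num1 = 0.5 and Num1,1 = -0.6), assume the lag-1 numerator coefficient enters with a minus sign, and ignore the CENTER option. The differenced model and its one-step forecast then become:

```latex
\begin{aligned}
(1-B)\,y_t &= (\omega_0 - \omega_1 B)(1-B)\,x_t + a_t \\
y_t &= y_{t-1} + \omega_0\,(x_t - x_{t-1}) - \omega_1\,(x_{t-1} - x_{t-2}) + a_t \\
\hat{y}_t &= y_{t-1} + 0.5\,(x_t - x_{t-1}) + 0.6\,(x_{t-1} - x_{t-2})
\end{aligned}
```

The y_{t-1} term comes entirely from differencing the response. When y_{t-1} is not observed, the previous forecast takes its place.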

 

rselukar
SAS Employee
%let NumFactor1 = .5;

%let NumFactor2 = -.6;

data test_i;

input y x date;

dx = dif(x);

ldx = lag(dx);

lx = lag(x);

ly = lag(y);

cards;

1 14 1

1 13.1 2

2 12 3

2 11.5 4

4 10 5

4 9.9 6

4 8 7

5 7 8

6 6.2 9

. 5 10

. 4 11

. 3.3 12

. 2 13

. 1 14

;

run;

data test_d;

input y x date;

dx = dif(x);

ldx = lag(dx);

lx = lag(x);

ly = lag(y);

cards;

6 14 1

6 13.1 2

5 12 3

4 11.5 4

3 10 5

2 9.9 6

2 8 7

1 7 8

1 6.2 9

. 5 10

. 4 11

. 3.3 12

. 2 13

. 1 14

;

run;

 

 

proc arima data=test_i plots=none;

title "Increasing Y";

identify var=y(1) crosscorr=( x(1) ) noprint CLEAR;* CENTER;

estimate input =( (1)x ) 

initval =( &NumFactor1.$(&NumFactor2.)x

)

noest

NOINT;

forecast id=date BACK=0 lead=5 out=out_test_increase printall;

run;

quit;

data test_i;

set test_i;

retain tmp 0;

if _n_ <= 2 then ldx = -0.9;

if _n_ <= 1 then dx = -0.9;

 tfInput = &NumFactor1.*dx - &NumFactor2.*ldx;

if ly ^= . then forecast = ly + tfInput;

else forecast = tmp + tfInput;

tmp = forecast;

run;

proc print data=test_i; 

var y ly tfInput forecast;

run;

 

proc arima data=test_d plots=none;

title "Decreasing Y";

identify var=y(1) crosscorr=( x(1) ) CLEAR;* CENTER;

estimate input =( (1)x ) 

initval =( &NumFactor1.$(&NumFactor2.)x

)

noest

NOINT;

forecast id=date BACK=0 lead=5 out=out_test_decrease printall;

run;

quit;

 

data test_d;

set test_d;

retain tmp 0;

if _n_ <= 2 then ldx = -0.9;

if _n_ <= 1 then dx = -0.9;

 tfInput = &NumFactor1.*dx - &NumFactor2.*ldx;

if ly ^= . then forecast = ly + tfInput;

else forecast = tmp + tfInput;

tmp = forecast;

run;

proc print data=test_d; 

var y ly tfInput forecast;

run;

I am not quite sure I understand your question but here is what I make of it:

 

I am ignoring the CENTER option in your ARIMA code for simplicity.  Your model spec is:

 

identify var=y(1) crosscorr=( x(1) );

estimate input =( (1)x) initval =( &NumFactor1.$(&NumFactor2.)x) noest NOINT;

The forecast function for this is:

tfInput = NumFactor1*dif(x) - NumFactor2*lag(dif(x)).

forecast = lag(y) + tfInput          when lag(y) is available

         = lag(forecast) + tfInput   otherwise (lagged actuals are no longer available).

This does depend on y (and not just on x).

*************Verification code attached***************;
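
To see how much of the forecast comes from y, the recursion above can be run for both data sets from their last observed values. The sketch below is a hypothetical illustration and is not part of the attached verification code: the data set name fc_compare and the variables f_inc, f_dec and gap are made up here, CENTER is ignored as in the reply, and the starting levels 6 and 1 are the date-9 values of y in the two test data sets.

```sas
%let NumFactor1 = .5;
%let NumFactor2 = -.6;

/* Hypothetical sketch: x and date for dates 8 to 14 are copied from the test data. */
data fc_compare;
   input x date;
   /* Start each path at the last observed y (date 9): 6 for test_i, 1 for test_d. */
   retain f_inc 6 f_dec 1;
   dx  = dif(x);      /* x(t) - x(t-1) */
   ldx = lag(dx);     /* previous difference of x */
   tfInput = &NumFactor1.*dx - &NumFactor2.*ldx;
   if date >= 10 then do;
      f_inc = f_inc + tfInput;   /* forecast path following the increasing y */
      f_dec = f_dec + tfInput;   /* forecast path following the decreasing y */
      gap   = f_inc - f_dec;     /* difference between the two paths */
      output;
   end;
   datalines;
7 8
6.2 9
5 10
4 11
3.3 12
2 13
1 14
;
run;

proc print data=fc_compare;
   var date tfInput f_inc f_dec gap;
run;
```

Under these assumptions both paths move by the same tfInput at every step (for example, -1.08 at date 10, giving 4.92 and -0.08), so the gap stays at 5. The changes in the forecast depend only on x, while the level is carried over from the last observed y.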

 

RyDan
Fluorite | Level 6
Thanks! A quick question: what if I had an AR term, say p=1 with AR = -.3? My ESTIMATE statement now looks like this:
%let ar = -.3;
proc arima data=test_i plots=none;
title "Increasing Y - ar";
identify var=y(1) crosscorr=( x(1) ) noprint CLEAR;* CENTER;
estimate p=1 input =( (1)x ) ar=&ar.
initval =( &NumFactor1.$(&NumFactor2.)x)
noest
NOINT;
forecast id=date BACK=0 lead=5 out=out_test_increase_ar printall;
run;
quit;

How would you code this using the logic in test_i?
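
One possible way to think about this, offered as a sketch rather than a confirmed answer: assume the AR factor is written as (1 - φB) with φ = -0.3, let u_t denote tfInput, and let n_t be the noise series (u_t, n_t and φ are notation introduced here). The AR term then adds a correction based on the previous noise value:

```latex
\begin{aligned}
(1-B)\,y_t &= u_t + n_t, \qquad (1-\phi B)\,n_t = a_t \\
n_{t-1} &= (y_{t-1} - y_{t-2}) - u_{t-1} \\
\hat{y}_t &= y_{t-1} + u_t + \phi\,n_{t-1}
\end{aligned}
```

In the test_i data step this would mean retaining the previous value of dif(y) - tfInput and adding φ times that value to the forecast. Once actual y values run out, the previous forecasted noise value replaces n_{t-1}, so the correction shrinks by a factor of φ at each further step. How PROC ARIMA initializes the noise before the first observation is not covered by this sketch, so the earliest values may differ.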