BookmarkSubscribeRSS Feed
kybowma
Calcite | Level 5

My objective is to implement a model which was fit using the ARIMA procedure in SAS.  Via SAS technical support I was able to have them translate the models functional form (not the best at math) as follows:

 

yt = yt-1 + C +  w1x1t  + w2x2t  + ϕ ((yt-1 - yt-2) - w1x1t-1 - w2x2t-1 ) + at

 

The random error term at time t, at, takes on its expected value of 0 and effectively drops out of the equation.  In the forecast horizon, when actual values of the lags of y on the right-hand-side of the equation are no longer available, the corresponding predicted value is used in place of the lagged y values.

 

To test the above I create fake data and harvested the coefficients from the ARIMA procedures:

 

Capture.PNG

Where C=-15.59089, ϕ=-0.57206, w1=-0.02439 and w2=0.05201, the error term a(t) drops out and is essentially 0 as stated in the above.

 

Programmatically speaking, if I implement the above equation (if I implemented correctly) I should get the exact output that the ARIMA procedure (with forecast statement) produces.  Unfortunately, I am off by more than rounding errors.  The example data and manual scoring is in the following code, can someone please correct my mistakes so that I can match the ARIMA forecast output (FORECAST variable to imp_forecast variable).  My assumption that the model fit (Actuals data denoted by the "type" variable) is close, as the "arima_forecast_diff" is static during this time at -8.91.

*** UPDATE 1 ***

Given I was off by a constant for the fitted model (i.e. type='Actuals') I was able to determine the root cause of the difference and isolate to the following: If I add back -C*AR(1) (-1(-15.59089-1*-0.57206))=8.9189245334 which was the exact difference.  Still working on the forecasted difference.  So, the modeling equation becomes:

 

yt = yt-1 + C +  w1x1t  + w2x2t  + ϕ ((yt-1 - yt-2) - w1x1t-1 - w2x2t-1 ) + at  - C*ϕ

 

Which yields (not regarding rounding errors) the exact output from the FORECAST statement in the ARIMA procedure.

 

 

data arima_data;
    format date date9.;
    length type $ 8;
    do date='01JAN2000'd to intnx('month',today(),-12,'B');
        date=intnx('month',date,0,'B');
        type='Actual';
        y=ranuni(28269)*1000;
        x1=ranuni(28123)*1000;
        x2=ranuni(28722)*1000;
        output;
        date=intnx('month',date,0,'E');
    end;
    /* Out of time. */
    do date=date to today();
        date=intnx('month',date,0,'B');
        type='Forecast';
        y=.;
        x1=ranuni(28123)*1000;
        x2=ranuni(28722)*1000;
        output;
        date=intnx('month',date,0,'E');
    end;
run;

proc arima data=arima_data;
    /* Difference model with 2 x terms */;
    identify var=y(1) crosscorr=(x1 x2) noprint;
    /* Including 1 lag. */
    estimate input=(x1 x2) p=1 noest noprint
        ar=-0.57206
        mu=-15.59089
        initval=(-0.02439 x1 0.05201 x2)
    ;
    forecast interval=month id=date out=arima_forecast lead=12 noprint;
run;quit;

data merge_arima_forecast;
    merge arima_data (in=mas)
          arima_forecast (in=fcst keep=date forecast residual)
    ;
    by date;
run;

data implementation;
    set merge_arima_forecast;
    by date;

    /* From SAS Tech Support:
    y(t)=y(t-1) + C + beta1*x1(t) + beta2*x2(t) + AR(1)((y(t-1) - y(t-2)) - beta1*x1(t-1) - beta2*x2(t-1)) + a(t)
    Assumed, C=Mu (-15.59089), AR(1)=Auto Regressive Parameter (-0.57206), beta1 (-0.02439) and beta2 (0.5201)
    a(t) is the error term which is essentially zero and drops out of the model.
    */

    y_lag=lag(y);
    y_lag2=lag2(y);
    x1_lag=lag(x1);
    x2_lag=lag(x2);

    c=-15.59089;
    ar=-0.57206;
    beta1=-0.02439;
    beta2=0.05201;

    /* Actuals */
    *if type='Actual' then imp_forecast=sum(y_lag,c,beta1*x1,beta2*x2,ar*((y_lag-y_lag2)-beta1*x1_lag-beta2*x2_lag));
    /* Updated with -c*ar, matches the ARIMA fit. */
    if type='Actual' then imp_forecast=sum(y_lag,c,beta1*x1,beta2*x2,ar*((y_lag-y_lag2)-beta1*x1_lag-beta2*x2_lag),-c*ar);
    /* Forecast */
    lag_imp_forecast=lag(imp_forecast);
    lag2_imp_forecast=lag2(imp_forecast);
    if type='Forecast' then imp_forecast=sum(lag_imp_forecast,c,beta1*x1,beta2*x2,ar*((lag_imp_forecast-coalesce(y_lag2,lag2_imp_forecast))-beta1*x1_lag-beta2*x2_lag),-c*ar);

    arima_forecast_diff=forecast-imp_forecast;

run;

 

 

 

1 REPLY 1
SAS_Vpt
Calcite | Level 5

Thanks for the updates, this really helped me out.

 

Do you know the formula for a ARIMA p=1 q=1 model? 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Multiple Linear Regression in SAS

Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 1 reply
  • 1414 views
  • 0 likes
  • 2 in conversation