My objective is to implement a model which was fit using the ARIMA procedure in SAS. Via SAS technical support I was able to have them translate the models functional form (not the best at math) as follows:
y t = y t-1 + C + w1x1 t + w2x2 t + ϕ ((y t-1 - y t-2 ) - w1x1 t-1 - w2x2 t-1 ) + a t
The random error term at time t, a t , takes on its expected value of 0 and effectively drops out of the equation. In the forecast horizon, when actual values of the lags of y on the right-hand-side of the equation are no longer available, the corresponding predicted value is used in place of the lagged y values.
To test the above I create fake data and harvested the coefficients from the ARIMA procedures:
Where C=-15.59089, ϕ=-0.57206, w1=-0.02439 and w2=0.05201, the error term a(t) drops out and is essentially 0 as stated in the above.
Programmatically speaking, if I implement the above equation (if I implemented correctly) I should get the exact output that the ARIMA procedure (with forecast statement) produces. Unfortunately, I am off by more than rounding errors. The example data and manual scoring is in the following code, can someone please correct my mistakes so that I can match the ARIMA forecast output (FORECAST variable to imp_forecast variable). My assumption that the model fit (Actuals data denoted by the "type" variable) is close, as the "arima_forecast_diff" is static during this time at -8.91.
*** UPDATE 1 ***
Given I was off by a constant for the fitted model (i.e. type='Actuals') I was able to determine the root cause of the difference and isolate to the following: If I add back -C*AR(1) (-1(-15.59089-1*-0.57206))=8.9189245334 which was the exact difference. Still working on the forecasted difference. So, the modeling equation becomes:
y t = y t-1 + C + w1x1 t + w2x2 t + ϕ ((y t-1 - y t-2 ) - w1x1 t-1 - w2x2 t-1 ) + a t - C*ϕ
Which yields (not regarding rounding errors) the exact output from the FORECAST statement in the ARIMA procedure.
data arima_data;
format date date9.;
length type $ 8;
do date='01JAN2000'd to intnx('month',today(),-12,'B');
date=intnx('month',date,0,'B');
type='Actual';
y=ranuni(28269)*1000;
x1=ranuni(28123)*1000;
x2=ranuni(28722)*1000;
output;
date=intnx('month',date,0,'E');
end;
/* Out of time. */
do date=date to today();
date=intnx('month',date,0,'B');
type='Forecast';
y=.;
x1=ranuni(28123)*1000;
x2=ranuni(28722)*1000;
output;
date=intnx('month',date,0,'E');
end;
run;
proc arima data=arima_data;
/* Difference model with 2 x terms */;
identify var=y(1) crosscorr=(x1 x2) noprint;
/* Including 1 lag. */
estimate input=(x1 x2) p=1 noest noprint
ar=-0.57206
mu=-15.59089
initval=(-0.02439 x1 0.05201 x2)
;
forecast interval=month id=date out=arima_forecast lead=12 noprint;
run;quit;
data merge_arima_forecast;
merge arima_data (in=mas)
arima_forecast (in=fcst keep=date forecast residual)
;
by date;
run;
data implementation;
set merge_arima_forecast;
by date;
/* From SAS Tech Support:
y(t)=y(t-1) + C + beta1*x1(t) + beta2*x2(t) + AR(1)((y(t-1) - y(t-2)) - beta1*x1(t-1) - beta2*x2(t-1)) + a(t)
Assumed, C=Mu (-15.59089), AR(1)=Auto Regressive Parameter (-0.57206), beta1 (-0.02439) and beta2 (0.5201)
a(t) is the error term which is essentially zero and drops out of the model.
*/
y_lag=lag(y);
y_lag2=lag2(y);
x1_lag=lag(x1);
x2_lag=lag(x2);
c=-15.59089;
ar=-0.57206;
beta1=-0.02439;
beta2=0.05201;
/* Actuals */
*if type='Actual' then imp_forecast=sum(y_lag,c,beta1*x1,beta2*x2,ar*((y_lag-y_lag2)-beta1*x1_lag-beta2*x2_lag));
/* Updated with -c*ar, matches the ARIMA fit. */
if type='Actual' then imp_forecast=sum(y_lag,c,beta1*x1,beta2*x2,ar*((y_lag-y_lag2)-beta1*x1_lag-beta2*x2_lag),-c*ar);
/* Forecast */
lag_imp_forecast=lag(imp_forecast);
lag2_imp_forecast=lag2(imp_forecast);
if type='Forecast' then imp_forecast=sum(lag_imp_forecast,c,beta1*x1,beta2*x2,ar*((lag_imp_forecast-coalesce(y_lag2,lag2_imp_forecast))-beta1*x1_lag-beta2*x2_lag),-c*ar);
arima_forecast_diff=forecast-imp_forecast;
run;
... View more