Hi, I have a very simple time series (attached) that I need to model and forecast with SARIMA. The model is already defined: (p,d,q) = (1,1,1), the seasonal period is 52, and the seasonal (P,D,Q) = (0,1,1).
I fit the model with the Python statsmodels library (SARIMAX):
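To make the specification concrete, here is a minimal sketch of what these orders imply, assuming the usual multiplicative SARIMA form. The helper names and the toy series are hypothetical, and the plus sign on the MA terms follows the statsmodels convention (other packages may report the opposite sign). It shows that d=1 and D=1 with period 52 consume 53 observations, and that the product of the non-seasonal and seasonal MA factors has terms at lags 1, 52 and 53.

```python
import numpy as np

def seasonal_difference(y, s=52):
    """Apply (1 - B)(1 - B^s) to a 1-D series (d = 1, D = 1)."""
    y = np.asarray(y, dtype=float)
    d1 = y[1:] - y[:-1]          # regular difference
    return d1[s:] - d1[:-s]      # seasonal difference of the result

def multiplicative_ma(theta, Theta, s=52):
    """Coefficients of (1 + theta*B)(1 + Theta*B^s); array index = lag."""
    nonseasonal = np.array([1.0, theta])
    seasonal = np.zeros(s + 1)
    seasonal[0], seasonal[s] = 1.0, Theta
    return np.convolve(nonseasonal, seasonal)  # polynomial product

# Hypothetical toy data, only to illustrate the bookkeeping.
y = np.arange(200, dtype=float) ** 2
w = seasonal_difference(y)
print(len(y), len(w))            # the differenced series is 53 points shorter

# Hypothetical coefficient values, not estimates from the attached series.
ma = multiplicative_ma(0.5, 0.4)
print(np.nonzero(ma)[0])         # lags with a non-zero MA coefficient
```

The lag-53 cross term exists only in the multiplicative form, so it is one thing to look for when checking that two tools are fitting the same polynomial.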
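import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX
# semicolon-delimited file; the value column is named v
df = pd.read_csv('series.csv', sep=";")
# SARIMA(1,1,1)(0,1,1) with seasonal period 52
model_training = SARIMAX(df.v, order=(1, 1, 1), seasonal_order=(0, 1, 1, 52)).fit(disp=False)
# one-step-ahead forecast with standard error and confidence limits
model_training.get_forecast(1).summary_frame()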
and I get, for example, this first forecast value:
mean         mean_se     mean_ci_lower   mean_ci_upper
55.683232    3.071487    49.663228       61.703237
Then I fit the very same model to the very same data in CAS using uniTimeSeries.arima:
data series;
    infile "series.csv" dsd truncover firstobs=2 delimiter=";";
    input d $ v;
    date = input(d,ddmmyy10.);
    format date yymmdd10.;
    drop d;
    rename date=d;
run;
cas session;
libname CASUSER cas caslib="CASUSER";
data CASUSER.series; set series; run;
proc cas;
    uniTimeSeries.arima /
        table={name="series", caslib="CASUSER"}
        timeId={name="d"}
        interval="day"
        outFor={name="for", replace=True}
        outEst={name="est", replace=True}
        series={{name='v',
            model={{estimate={p={{factor={1}}} q={{factor={1,52}}},diff={1, 52},noint=True},
                forecast={{lead=1}}}}
        }}
    ;
run;
quit;
and I get, for example, this first forecast value:
Forecast    Std Error    95% Confidence Limits
56.4786     3.5467       49.5272    63.4301
As you can see, the forecast value is different, and the SAS model is generally worse: the true value is closer to 55, and the confidence limits are wider. It gets worse as I request more forecast leads, while the Python model stays more accurate.
My question is: why? How is it possible that the SAS model gives a different result? I tried changing the noint parameter and the convergence criteria, in case the two tools have different defaults, but no matter what I change, the Python model is always better.
What am I missing?
For context: in the project I'm following, the team has already experimented successfully with Python statsmodels on some time series. Now they want to apply the model to thousands of time series, exploiting CAS parallelism, but they are very disappointed because the models they knew to be good are no longer good in SAS, and I really don't understand why.
Note: my final model must run in CAS, so please do not provide Base SAS code.
Thanks a lot!