kisumsam
Quartz | Level 8

Hello, 

 

I'm having trouble identifying the right ARIMA model for my data and would like some advice.

 

Below is my data set:

 

data ts;
input time x;
datalines;
1 100.06
2 103.07
3 102.02
4 103.6
5 105.16
6 104.31
7 105.63
8 106.18
9 108.18
10 112.83
11 111.18
12 111.1
13 114.06
14 112.92
15 113
16 113.92
17 114.44
18 117.26
19 122.72
20 117.93
21 121.71
22 124.04
23 122.56
24 120.95
25 124.36
26 122.98
27 125.88
28 120.94
29 120.78
30 119.63
31 117.87
32 122.25
33 124.6
34 126.32
35 122.21
36 123.17
37 122.03
38 121.04
39 121.37
40 118.05
41 117.16
42 122.27
43 125.91
44 123.5
45 123.95
46 120.66
47 125.94
48 131.16
49 131.24
50 135.48
51 130.47
52 128.85
53 125.1
54 123.99
55 128.14
56 125
57 118.73
58 124.35
59 120.2
60 118.87
61 114.34
62 114.37
63 116.97
64 114.25
65 108.21
66 110.48
67 108.29
68 111.06
69 111.46
70 112.65
71 107.04
72 115.57
73 112.91
74 113.78
75 117.17
76 115.65
77 111.62
78 111.16
79 110.51
80 113.97
81 106.6
82 105.08
83 106.39
84 110.55
85 112.19
86 106.68
87 112.57
88 109.27
89 113.19
90 107.41
91 109.63
92 104.12
93 107.31
94 111.46
95 106.53
96 110.75
97 106.25
98 109.24
99 106.27
100 112.58
101 109.99
102 109.74
103 106.91
104 111.15
105 107.17
106 105.55
107 103.58
108 103.72
109 101.82
110 105.61
111 102.81
112 101.43
113 108.76
114 105.08
115 103.23
116 105.4
117 101.67
118 99.36
119 98.58
120 98.7
121 99.19
122 103.58
123 98.8
124 100.07
125 99.38
126 103.04
127 102.2
128 99.43
129 97.73
130 99.93
131 104.9
132 101.25
133 96.99
134 98.63
135 100.85
136 98.82
137 107.32
138 98.88
139 102.45
140 94.76
141 99.78
142 98.61
143 99.37
144 101.18
145 101.14
146 100.08
147 98.43
148 99.5
149 103.72
150 104.07
151 106.86
152 101.67
153 110.08
154 108.92
155 106.51
156 108.73
157 111.99
158 113.41
159 113.08
160 122.24
161 121.92
162 122.71
163 122.11
164 120.82
165 118.34
166 118.72
167 120.24
168 119.24
169 118.71
170 120.71
171 123.43
172 121.86
173 122.87
174 121.78
175 119.77
176 123.73
177 127.03
178 122.6
179 122.65
180 120.34
181 118.63
182 115.47
183 114.24
184 114.86
185 111.09
186 115.95
187 114.88
188 116.52
189 114.31
190 116.97
191 114.85
192 113.83
193 118.1
194 115.49
195 117.8
196 120.77
197 115.12
198 113.04
;
run;

I have 198 observations. I plotted the original series, and it doesn't seem to be stationary:

 

proc timeseries data=ts plots=(corr);
var x;
run;

i0001.png

 

So I did one order of differencing and below is the plot:

 

i0002.png

 

After differencing, most of the autocorrelation in the ACF and PACF plots is gone.
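The code behind the differenced plot is not shown in the post. A minimal sketch of one way to reproduce it, assuming the ts data set defined above; the names ts_diff and dx are made up for illustration:

```sas
/* Hypothetical names: ts_diff and dx are not from the original post */
data ts_diff;
    set ts;
    dx = dif(x);           /* first difference: x(t) - x(t-1) */
    if not missing(dx);    /* drop the first row, which has no previous value */
run;

/* Series, ACF and PACF plots of the differenced values */
proc timeseries data=ts_diff plots=(series corr);
    var dx;
run;
```

The IDENTIFY statement with var=x(1) in PROC ARIMA should show the same correlation pattern without a separate data step.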

 

I believe based on the ACF and PACF plots, the ARIMA (0, 1, 1) model should be the right model. However, I also fit the ARIMA (2, 0, 0) and ARIMA (1, 1, 0) for comparison purposes.

 

In addition, I use the last 30 observations as my testing data and below is the Proc ARIMA that I used:

 

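proc arima data=ts
    plots(only)=(series(corr crosscorr) residual(corr normal)
                 forecast(forecastonly))
    out=out_es2;
    /* ARIMA(0,1,1): first difference of x with one MA term */
    identify var=x(1);
    estimate q=(1);
    /* start the forecasts 30 observations before the end of the series */
    forecast lead=30 back=30 alpha=0.05;
    outlier;
run;
quit;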
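The RMSE on the 30 held-back observations can be recomputed from the OUT= data set. The sketch below assumes that out_es2 keeps the actual x next to a FORECAST column for observations 169 to 198, which is how PROC ARIMA normally lays out that data set; holdout, sq_err and rmse_es2 are illustrative names. As far as I know, BACK= only moves the forecast origin and does not exclude those observations from estimation, so this may not be a fully out-of-sample error.

```sas
/* Keep only the 30 held-back rows (198 - 30 = 168 rows come before them) */
data holdout;
    set out_es2;
    if _n_ > 168 and not missing(x) and not missing(forecast);
    sq_err = (x - forecast)**2;    /* squared forecast error */
run;

/* Mean squared error over the holdout rows */
proc means data=holdout noprint;
    var sq_err;
    output out=rmse_es2 n=n_holdout mean=mse;
run;

/* Root mean squared error */
data rmse_es2;
    set rmse_es2;
    rmse = sqrt(mse);
run;

proc print data=rmse_es2 noobs;
    var n_holdout mse rmse;
run;
```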

I have the following summary:

 

Statistic         ARIMA (1, 1, 0)   ARIMA (0, 1, 1)   ARIMA (2, 0, 0)
AIC               1009.837          1009.94           1085.057
SBC               1016.404          1016.506          1091.633
RMSE              4.7993            4.8193            4.1525
WN test (lag 6)   0.705             0.6315            <0.0001
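Only the ARIMA (0, 1, 1) code is shown in the post. A sketch of how the three candidates could be fitted in a single PROC ARIMA call, assuming the default estimation method and the default constant term as in the posted code:

```sas
proc arima data=ts plots=none;
    /* Models on the first difference of x */
    identify var=x(1) noprint;
    estimate p=1;    /* ARIMA(1,1,0) */
    estimate q=1;    /* ARIMA(0,1,1) */

    /* Model on the levels of x */
    identify var=x noprint;
    estimate p=2;    /* ARIMA(2,0,0): AR terms at lags 1 and 2 */
run;
quit;
```

One detail worth checking: p=2 requests lags 1 and 2, whereas p=(2) requests lag 2 only. The SBC minus AIC gap reported for ARIMA (2, 0, 0) is 6.576, which is close to 2(ln 198 - 2) ≈ 6.577, the value two estimated parameters would give; three parameters would give about 9.865. That may be a sign that the lag 1 term was not in the fitted model, but the post does not show that code, so this is only a guess.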

 

So below are my questions:

 

1. Based on the ACF and PACF plots, the ARIMA (0, 1, 1) model should be the right model for the data. However, the ARIMA (1, 1, 0) model also gives similar results in terms of AIC, SBC and RMSE. Is this normal?

 

2. Based on the AIC and SBC, the right model is ARIMA (1, 1, 0). However, the ARIMA (2, 0, 0) gives the lowest RMSE on the testing data set (4.1525 vs. 4.8193 or 4.7993).

 

I'm having problems identifying which is the best model for this time series data. If I were to choose a model that has the greatest predictive power, the best model would be ARIMA (2, 0, 0), correct? 

 

Could anyone guide me on how to select the model based on the results that I have?

 

8 REPLIES
stat_sas
Ammonite | Level 13

Hi,

 

AIC and SBC are used to compare models; the model with the smaller values of these statistics is preferred. The lowest RMSE alone cannot be used as the criterion for selecting a forecasting model. ARIMA (2, 0, 0) leaves correlated errors (its white-noise test is significant), which is a problem for generalization. Please try ARIMA(1,0,1), which seems a good fit for your data.
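For reference, these are the usual definitions of the two criteria, and as far as I know the ones PROC ARIMA reports, with L the likelihood, k the number of estimated parameters and n the number of residuals:

```latex
\mathrm{AIC} = -2\ln L + 2k, \qquad
\mathrm{SBC} = -2\ln L + k\ln n, \qquad
\mathrm{SBC} - \mathrm{AIC} = k(\ln n - 2)
```

As a check against the table in the question: for ARIMA (1, 1, 0), SBC - AIC = 1016.404 - 1009.837 = 6.567, which matches k = 2 (the mean and one AR term) and n = 197 differences, since 2(ln 197 - 2) ≈ 6.566. Because the differenced models are scored on 197 differences and the undifferenced ones on 198 levels, AIC and SBC are probably easiest to interpret when the models being compared use the same order of differencing.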

kisumsam
Quartz | Level 8

Thanks @stat_sas.

 

ARIMA (1 0 1) does seem like the best model. Now I have two other questions:

 

1. ARIMA (1 0 1) does not have any differencing. However, when I look at the original time series plot (without differencing), the plot does not look quite stationary:

 

i0003.png

 

 

Does that mean differencing is not always necessary when you have a non-stationary time series? From what I learned, you always do differencing when the mean is not stabilized. I'm new to time series. Just want to make sure I understand the concept correctly.

 

2. When I look at the ACF and PACF plots, I would think ARIMA (0 1 1) would be the right model. Is it standard practice to try a number of similar models such as ARIMA (1 0 1) or ARIMA (1 1 0) to find out which one works best?
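Besides judging stationarity from the plot, a unit root test can be requested on the IDENTIFY statement. A minimal sketch, assuming the ts data set from the question; the choice of 0, 1 and 2 augmenting lags is arbitrary:

```sas
proc arima data=ts plots=none;
    /* Augmented Dickey-Fuller tests with 0, 1 and 2 augmenting lags */
    identify var=x stationarity=(adf=(0,1,2));
run;
quit;
```

The null hypothesis of these tests is a unit root, so large p-values would be consistent with a series that needs differencing, while small p-values would favor modeling the levels.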

 

 

stat_sas
Ammonite | Level 13

Hi,

 

Sorry, I meant ARIMA(1,1,0). Differencing is required in your problem. ARIMA models always require the series to be stationary after any differencing. A few things regarding these models:

1. The ACF and PACF are helpful in model identification. After differencing the provided series, only the first autocorrelation is significant.

2. The ACF of the errors should not be significant, which is not the case for ARIMA(2,0,0).
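The "WN test" p-values in the question's table come, as far as I know, from the Ljung-Box chi-square statistic on the residual autocorrelations. With n residuals and residual autocorrelation r_k at lag k, the statistic up to lag m is:

```latex
\chi^2_m = n(n+2)\sum_{k=1}^{m}\frac{r_k^2}{n-k}
```

A small p-value, such as the <0.0001 reported at lag 6 for ARIMA (2, 0, 0), means the residual autocorrelations up to that lag are jointly too large to be white noise.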

 

 

 

kisumsam
Quartz | Level 8

Thanks! Sorry, now that you mentioned ARIMA (1 0 1), I did fit the model and compared it to the ARIMA (1 1 0) model. Below is the comparison I put together:

 

Statistic         ARIMA (1, 1, 0)   ARIMA (1, 0, 1)
AIC               1009.837          1015.583
SBC               1016.404          1025.484
RMSE              4.7993            3.3511
WN test (lag 6)   0.705             0.4385

 

The residual plots for the ARIMA (1 0 1) model don't seem to show any issues:

 

i0004.png

 

The model has a slightly higher AIC and SBC than the ARIMA (1 1 0) model. However, the RMSE is much lower.

 

Is there a reason why differencing is necessary when the ARIMA (1 0 1) model residual doesn't show any problem and the model itself gets a better result in terms of prediction errors?
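One way to see why the two kinds of model can behave so similarly is to write the ARIMA (1 0 1) model with backshift operator B, mean \mu and white noise a_t, and let the AR coefficient approach 1. This is only a sketch of the limiting case; the symbols \phi_1 and \theta_1 are generic and not taken from the fitted output:

```latex
(1 - \phi_1 B)(x_t - \mu) = (1 - \theta_1 B)\,a_t
\quad\xrightarrow{\ \phi_1 \to 1\ }\quad
(1 - B)\,x_t = (1 - \theta_1 B)\,a_t
```

The right-hand form is an ARIMA (0 1 1) model without a constant, so a stationary model with an AR coefficient close to 1 and a differenced model can fit the same data almost equally well.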

 

Thanks so much! This is some great help 🙂

 

stat_sas
Ammonite | Level 13

Hi,

 

How about parameter estimates for ARIMA(1,0,1)? Are they significant?
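The estimates, standard errors and t tests appear in the parameter estimates table printed by the ESTIMATE statement. A sketch for capturing that table as a data set, assuming the ODS table name ParameterEstimates; pe_101 is an illustrative name:

```sas
/* Capture the parameter estimates table of the next procedure */
ods output ParameterEstimates=pe_101;

proc arima data=ts plots=none;
    identify var=x noprint;
    estimate p=1 q=1;    /* ARIMA(1,0,1) with a constant */
run;
quit;

/* One row per parameter: mean, MA1,1 and AR1,1 */
proc print data=pe_101;
run;
```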

kisumsam
Quartz | Level 8

I think they are significant:

 

i0005.png

 

(or am I looking at the wrong table?)

 

Below are additional tables from the output:

 

i0006.png

 

Residual plots all look good:

 

i0007.png

 

i0008.png

 

Below is the code to generate these tables:

 

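proc arima data=ts
    plots(only)=(series(corr crosscorr) residual(corr normal)
                 forecast(forecastonly))
    out=out_es3;
    /* ARIMA(1,0,1): no differencing, one AR and one MA term */
    identify var=x;
    estimate p=(1) q=(1);
    /* start the forecasts 30 observations before the end of the series */
    forecast lead=30 back=30 alpha=0.05;
    outlier;
run;
quit;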

 

stat_sas
Ammonite | Level 13

This seems correct. Your time series does not deviate much from stationarity, so differencing will not have much impact, and ARIMA(1,0,1) is also a good choice. Please look at the documentation for the SCAN and ESACF options, which help identify p, d and q. Again, several factors need to be considered in the Box-Jenkins approach to arrive at a good forecasting model.

 

proc arima data=ts plots;
    /* SCAN prints tentative ARMA order suggestions for x */
    identify var=x scan;
run;
quit;
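The reply also mentions ESACF, and PROC ARIMA offers MINIC as a third tentative order selection method. A sketch that requests all three, on the levels and on the first difference, so the suggested orders can be compared side by side:

```sas
proc arima data=ts plots=none;
    /* Tentative order selection on the levels of x */
    identify var=x scan esacf minic;

    /* Same diagnostics on the first difference, for comparison */
    identify var=x(1) scan esacf minic;
run;
quit;
```

These tables only suggest candidate orders; the candidates still need the residual and parameter checks discussed earlier in the thread.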

kisumsam
Quartz | Level 8
Thanks so much!
