Identifying AR and MA terms

Posted 01-01-2018 11:03 AM

Hello,

I'm having trouble identifying the right orders for my ARIMA model and would appreciate some advice.

Below is my data set:

```
data ts;
input time x;
datalines;
1 100.06
2 103.07
3 102.02
4 103.6
5 105.16
6 104.31
7 105.63
8 106.18
9 108.18
10 112.83
11 111.18
12 111.1
13 114.06
14 112.92
15 113
16 113.92
17 114.44
18 117.26
19 122.72
20 117.93
21 121.71
22 124.04
23 122.56
24 120.95
25 124.36
26 122.98
27 125.88
28 120.94
29 120.78
30 119.63
31 117.87
32 122.25
33 124.6
34 126.32
35 122.21
36 123.17
37 122.03
38 121.04
39 121.37
40 118.05
41 117.16
42 122.27
43 125.91
44 123.5
45 123.95
46 120.66
47 125.94
48 131.16
49 131.24
50 135.48
51 130.47
52 128.85
53 125.1
54 123.99
55 128.14
56 125
57 118.73
58 124.35
59 120.2
60 118.87
61 114.34
62 114.37
63 116.97
64 114.25
65 108.21
66 110.48
67 108.29
68 111.06
69 111.46
70 112.65
71 107.04
72 115.57
73 112.91
74 113.78
75 117.17
76 115.65
77 111.62
78 111.16
79 110.51
80 113.97
81 106.6
82 105.08
83 106.39
84 110.55
85 112.19
86 106.68
87 112.57
88 109.27
89 113.19
90 107.41
91 109.63
92 104.12
93 107.31
94 111.46
95 106.53
96 110.75
97 106.25
98 109.24
99 106.27
100 112.58
101 109.99
102 109.74
103 106.91
104 111.15
105 107.17
106 105.55
107 103.58
108 103.72
109 101.82
110 105.61
111 102.81
112 101.43
113 108.76
114 105.08
115 103.23
116 105.4
117 101.67
118 99.36
119 98.58
120 98.7
121 99.19
122 103.58
123 98.8
124 100.07
125 99.38
126 103.04
127 102.2
128 99.43
129 97.73
130 99.93
131 104.9
132 101.25
133 96.99
134 98.63
135 100.85
136 98.82
137 107.32
138 98.88
139 102.45
140 94.76
141 99.78
142 98.61
143 99.37
144 101.18
145 101.14
146 100.08
147 98.43
148 99.5
149 103.72
150 104.07
151 106.86
152 101.67
153 110.08
154 108.92
155 106.51
156 108.73
157 111.99
158 113.41
159 113.08
160 122.24
161 121.92
162 122.71
163 122.11
164 120.82
165 118.34
166 118.72
167 120.24
168 119.24
169 118.71
170 120.71
171 123.43
172 121.86
173 122.87
174 121.78
175 119.77
176 123.73
177 127.03
178 122.6
179 122.65
180 120.34
181 118.63
182 115.47
183 114.24
184 114.86
185 111.09
186 115.95
187 114.88
188 116.52
189 114.31
190 116.97
191 114.85
192 113.83
193 118.1
194 115.49
195 117.8
196 120.77
197 115.12
198 113.04
;
run;
```

I have 198 observations here. I plotted the original series, and it doesn't appear to be stationary:

```
proc timeseries data=ts plots=(corr);
var x;
run;
```

So I applied one order of differencing and plotted the result.

After differencing, most of the autocorrelation in the ACF and PACF plots is gone.
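The thread does not show how the differenced plots were produced. A minimal sketch, assuming ODS Graphics is available and that the same `ts` data set is used, would be to let PROC ARIMA difference the series in the IDENTIFY statement:

```sas
/* Sketch only: the original post does not show this call. */
ods graphics on;

proc arima data=ts plots(only)=series(corr);
   /* x(1) requests one order of simple differencing;      */
   /* the panel shows the differenced series with its ACF  */
   /* and PACF                                             */
   identify var=x(1) nlag=24;
run;
quit;
```

`nlag=24` is an arbitrary choice for illustration; any lag count that covers the region of interest would do.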

I believe based on the ACF and PACF plots, the ARIMA (0, 1, 1) model should be the right model. However, I also fit the ARIMA (2, 0, 0) and ARIMA (1, 1, 0) for comparison purposes.

In addition, I used the last 30 observations as test data. Below is the PROC ARIMA code that I used:

```
proc arima data=ts plots
(only)=(series(corr crosscorr) residual(corr normal)
forecast(forecastonly)) out=out_es2;
identify var=x(1);
estimate q=(1);
forecast lead=30 back=30 alpha=0.05;
outlier;
run;
quit;
```
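A caveat for readers reproducing the test-set RMSE: as far as I can tell from the SAS/ETS documentation, `BACK=30` only moves the start of the multistep forecasts back by 30 observations, while the parameters are still estimated from all 198 observations. The sketch below shows one way to get a stricter holdout, assuming a split at `time = 168`. The data set names `train`, `fc` and `holdout` are made up for illustration.

```sas
/* Training set: first 168 of the 198 observations */
data train;
   set ts;
   where time <= 168;
run;

proc arima data=train;
   identify var=x(1) noprint;
   estimate q=1 noprint;                     /* ARIMA(0,1,1), training data only */
   /* With ID= and no INTERVAL=, the ID values are expected */
   /* to continue as 169, 170, ..., 198                     */
   forecast lead=30 id=time out=fc noprint;
run;
quit;

/* Line the forecasts up with the held-out actual values */
data holdout;
   merge fc(keep=time forecast) ts(keep=time x);
   by time;
   if time > 168;
   sq_err = (x - forecast)**2;
run;

/* Out-of-sample RMSE over the 30 held-out points */
proc sql;
   select sqrt(mean(sq_err)) as holdout_rmse
   from holdout;
quit;
```

Swapping the ESTIMATE statement (for example `estimate p=1;`) gives the same comparison for the other candidate models.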

I have the following summary:

| Statistic | ARIMA(1,1,0) | ARIMA(0,1,1) | ARIMA(2,0,0) |
|---|---|---|---|
| AIC | 1009.837 | 1009.94 | 1085.057 |
| SBC | 1016.404 | 1016.506 | 1091.633 |
| RMSE | 4.7993 | 4.8193 | 4.1525 |
| White-noise test p-value (lag 6) | 0.705 | 0.6315 | <0.0001 |

So below are my questions:

1. Based on the ACF and PACF plots, the ARIMA (0, 1, 1) model should be the right model for the data. However, the ARIMA (1, 1, 0) model also gives similar results in terms of AIC, SBC and RMSE. Is this normal?

2. Based on the AIC and SBC, the right model is ARIMA (1, 1, 0). However, the ARIMA (2, 0, 0) gives the lowest RMSE on the testing data set (4.1525 vs. 4.8193 or 4.7993).

I'm having problems identifying which is the best model for this time series data. If I were to choose a model that has the greatest predictive power, the best model would be ARIMA (2, 0, 0), correct?

Could anyone guide me on how to select the model based on the results that I have?


8 Replies


Hi,

AIC and SBC are used to compare models; the model with the smaller values of these statistics is preferred. The lowest RMSE alone cannot be the criterion for selecting a forecasting model. ARIMA(2,0,0) leaves correlated residuals (its white-noise test is significant), which is a problem for generalization. Please try ARIMA(1,0,1), which seems a good fit for your data.
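As a rough sketch of how the candidates discussed here could be fitted side by side, several ESTIMATE statements can follow one IDENTIFY statement. Each ESTIMATE prints its own AIC, SBC and "Autocorrelation Check of Residuals" table. One caution, offered as general background and not as something stated in this thread: AIC and SBC are usually only compared between models fitted to the same working series, so comparisons across different orders of differencing should be read with care.

```sas
proc arima data=ts;
   /* Models on the first-differenced series */
   identify var=x(1) nlag=24;
   estimate p=1;          /* ARIMA(1,1,0) */
   estimate q=1;          /* ARIMA(0,1,1) */

   /* Models on the original (undifferenced) series */
   identify var=x nlag=24;
   estimate p=2;          /* ARIMA(2,0,0) */
   estimate p=1 q=1;      /* ARIMA(1,0,1) */
run;
quit;
```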


Thanks @stat_sas.

ARIMA(1,0,1) does seem like the best model. Now I have two more questions:

1. ARIMA (1 0 1) does not have any differencing. However, when I look at the original time series plot (without differencing), the plot does not look quite stationary:

Does that mean differencing is not always necessary when you have a non-stationary time series? From what I learned, you always do differencing when the mean is not stabilized. I'm new to time series. Just want to make sure I understand the concept correctly.

2. When I look at the ACF and PACF plots, I would think ARIMA(0,1,1) is the right model. Is it standard practice to try a number of similar models, such as ARIMA(1,0,1) or ARIMA(1,1,0), to find out which one works best?
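One way to put the "is differencing needed?" question on firmer ground than eyeballing the plot is a unit-root test. A minimal sketch, assuming the augmented Dickey-Fuller test available through the `STATIONARITY=` option of the IDENTIFY statement (the lag orders 0, 1 and 2 are an arbitrary choice for illustration):

```sas
proc arima data=ts;
   /* ADF null hypothesis: the series has a unit root.       */
   /* Large p-values are consistent with needing a           */
   /* difference; small p-values favor the stationary model. */
   identify var=x stationarity=(adf=(0,1,2)) nlag=24;
run;
quit;
```

With a series this close to the boundary, the test may well be inconclusive, which would fit the similar results reported for the differenced and undifferenced models.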


Hi,

Sorry, I meant ARIMA(1,1,0). Differencing is required for your problem; a stationary time series is always required for ARIMA models. A few things regarding these models:

1. The ACF and PACF help with model identification. After differencing your series, only the first autocorrelation is significant.

2. The ACF of the residuals should not be significant, which is not the case for ARIMA(2,0,0).


Thanks! Since you mentioned ARIMA(1,0,1), I fit that model and compared it to the ARIMA(1,1,0) model. Below is the comparison I put together:

| Statistic | ARIMA(1,1,0) | ARIMA(1,0,1) |
|---|---|---|
| AIC | 1009.837 | 1015.583 |
| SBC | 1016.404 | 1025.484 |
| RMSE | 4.7993 | 3.3511 |
| White-noise test p-value (lag 6) | 0.705 | 0.4385 |

The residual plots for the ARIMA(1,0,1) model don't seem to have any issues:

The model has a slightly higher AIC and SBC than the ARIMA (1 1 0) model. However, the RMSE is much lower.

Is there a reason why differencing is necessary when the ARIMA(1,0,1) residuals don't show any problems and the model itself gives better results in terms of prediction error?

Thanks so much! This is some great help 🙂


Hi,

How about parameter estimates for ARIMA(1,0,1)? Are they significant?


I think they are significant:

(or am I looking at the wrong table?)

Below are additional tables from the output:

Residual plots all look good:

Below is the code to generate these tables:

```
proc arima data=ts plots
(only)=(series(corr crosscorr) residual(corr normal)
forecast(forecastonly)) out=out_es3;
identify var=x;
estimate p=(1) q=(1);
forecast lead=30 back=30 alpha=0.05;
outlier;
run;
quit;
```


(Accepted solution) This seems correct. Your time series does not deviate much from stationarity, so differencing will not have a large impact, and ARIMA(1,0,1) is also a good choice. Please look at the documentation for the SCAN and ESACF options of the IDENTIFY statement; they help with identifying p, d and q. Again, the Box-Jenkins approach requires weighing multiple factors to arrive at a good forecasting model.

```
proc arima data=ts plots;
identify var=x scan;
run;
```
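The reply mentions ESACF as well as SCAN. A slightly fuller sketch that requests both, plus the MINIC table, could look like the following. The `p=(0:3)` and `q=(0:3)` ranges are assumptions chosen to keep the tables small; they are not values from the thread.

```sas
proc arima data=ts;
   /* SCAN and ESACF print tentative (p+d, q) order tables;  */
   /* MINIC ranks ARMA(p,q) candidates by an information     */
   /* criterion. P= and Q= limit the orders that are tried.  */
   identify var=x scan esacf minic p=(0:3) q=(0:3);
run;
quit;
```

Running the same IDENTIFY statement with `var=x(1)` would give the corresponding tables for the differenced series.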


Thanks so much!
