Hi,
I have a weekly time series in the following format.
category  Product  date       Sales  Is_Discounted
a123      1224     12JUN2016  20     1
a123      1224     19JUN2016  10     0
a123      1224     26JUN2016  25     1
a123      1224     03JUL2016  19     0
a123      1224     10JUL2016  18     0
I need to predict weekly Sales for each unique combination of category and Product.
I want to remove the seasonality from this time series, create forecasts using PROC REG, and then add the seasonality back onto the forecasts. I have been able to build a regression model (using month number, week number and Is_Discounted as independent variables) without removing seasonality, and it's working fine, but I believe the seasonality in the data is hurting the accuracy of my regression. I looked at PROC X12, but I do not understand how to use it to add the seasonality back after the regression. Thanks for the help!
First, I would like to make sure I fully understand your current approach. You were trying to capture trend and seasonality using month number and week number, right? So I guess the week number you used is actually a running week id rather than 1 to 52, right? Otherwise, it would be nested within the month number, and no variable would capture the trend. If this is the case, you are using the week id to capture the trend and the month number to capture the seasonality, which assumes a seasonal period of 12.
My understanding is that you want to do a 3-step process:
1. Estimate the trend and seasonality, and forecast trend and seasonality
2. Estimate the de-trended and de-seasonalized series
3. Combine the forecast from the previous 2 steps.
By "adding", there might be an additive model or a multiplicative model. I'm just assuming an additive model for simplicity; you can apply the same logic to a multiplicative model too.
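To make the two model forms concrete, here is one common way to write them. The symbols T_t (trend), S_t (seasonal component) and R_t (remainder) are labels chosen for this illustration and do not come from the thread.

```latex
\begin{align*}
\text{additive:} \quad & y_t = T_t + S_t + R_t \\
\text{multiplicative:} \quad & y_t = T_t \times S_t \times R_t
  \quad\Longleftrightarrow\quad
  \log y_t = \log T_t + \log S_t + \log R_t
\end{align*}
```

Assuming sales are strictly positive, taking logs turns the multiplicative form into an additive one, so the same steps can be applied to log(sales) and the final forecast exponentiated back.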
1. There are many ways to estimate the trend and seasonality. For example, you can run a regression model on just the trend and seasonality terms, i.e. run the regression y=t+s or y=t+t^2+s or whatever model you like. Here, t is your time id, and s is the set of seasonal indices you want to use.
2. To compute the de-trended and de-seasonalized series, you can just take the estimates y^ from your last model and compute the residual. For an additive model, this just means computing z=sales-y^. Then you can run your regression model on this residual.
3. With the models from step 1 and step 2, you can easily score the forecast for the historical period and the future, and "add" them together.
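The three steps above can be sketched in a few lines. This is only a minimal illustration in plain Python, not SAS and not a tested forecasting workflow: the sales and discount values are made up, the seasonal period of 4 is chosen only to keep the example small, and `ols_fit`, `predict` and `trend_season_row` are hypothetical helper names introduced here.

```python
def ols_fit(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(r[i] * r[j] for r in X) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict(X, beta):
    """Fitted values for each row of X."""
    return [sum(b * x for b, x in zip(beta, row)) for row in X]


PERIOD = 4  # hypothetical seasonal period (assumption for this sketch)


def trend_season_row(t):
    # Intercept, linear trend, and seasonal dummies (season 0 is the baseline)
    return [1.0, float(t)] + [1.0 if t % PERIOD == s else 0.0 for s in range(1, PERIOD)]


# Made-up history: 16 weeks of sales and a discount flag
sales = [20, 10, 25, 19, 18, 27, 16, 30, 21, 15, 29, 22, 24, 31, 20, 33]
disc = [1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1]
n = len(sales)

# Step 1: regress sales on trend and seasonality only (y = t + s)
X1 = [trend_season_row(t) for t in range(n)]
b1 = ols_fit(X1, sales)
y_hat = predict(X1, b1)

# Step 2: residual z = sales - y_hat, then regress z on the remaining predictor
z = [y - f for y, f in zip(sales, y_hat)]
X2 = [[1.0, float(d)] for d in disc]
b2 = ols_fit(X2, z)

# Step 3: score both models on future weeks and add the pieces together
future_disc = [0, 1, 0, 0]  # planned discounts for the next 4 weeks (assumed known)
future_t = range(n, n + len(future_disc))
forecast = [
    p1 + p2
    for p1, p2 in zip(
        predict([trend_season_row(t) for t in future_t], b1),
        predict([[1.0, float(d)] for d in future_disc], b2),
    )
]
print([round(f, 2) for f in forecast])
```

One design point to keep in mind: if Is_Discounted is correlated with the trend or seasonal terms, fitting in two stages will generally give different coefficients than a single regression on all of the variables at once.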
While I'm no expert on time series forecasting, I'm pretty sure that your linear regression is indeed missing the seasonality of your data. I believe that you want to use something like PROC ARIMA and fit a time series model that contains 52-week seasonality or 12-month seasonality (I wouldn't use both week and month), and that also contains your "Is_Discounted" variable.
Here is an example of forecasting in the presence of seasonality using PROC ARIMA.
There is also an example in the PROC ARIMA documentation for an "intervention model", which I believe would handle your "Is_Discounted" variable.
Thanks Paige! The reason I do not want to go for ARIMA or ESM is that these models are very sensitive to level shifts. So, if last month's sales came in higher, these models start predicting from that higher level rather than coming back to the normal level. Moreover, even without the seasonality treatment, my accuracy with regression is slightly better than with ARIMA.
Level shifts? Are these due to your variable "Is_Discounted"? Or due to some other variable in your data? Or are they essentially random?
If they are essentially random with respect to your predictor variables and with respect to seasonality and past history, then I am not aware of any model that will predict these types of level shifts. However, if you just want to remove the effects of these shifts from the forecast, you could first remove them from the raw data series.
I think I was not clear in my reply. If the sales from the last month came higher then the forecast starts from the higher sales level which is something I don't want. A regression model has no autoregressive component, so it serves my purpose for that reason.
If the sales from the last month came higher then the forecast starts from the higher sales level which is something I don't want.
I am not sure that's how things really work with an ARIMA time series.
But anyway, you have changed the problem description several times from your original problem statement, to the point where my understanding of time series analysis is weak, so I don't think I can help you further.