- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi All,
I have four years of sales data for which I performed the forecast for the next 10 months.
The point that SAS VA begins to forecast, it gives me a well flat line without any deviations.
Where as when I look the past trend, I could see there are several ups and downs.
Why does the forecast line look flat when the historical data has numerous downtrend and uptrend that has occured in the past.
I am plotting with two variables (Date in Months by Sales Amt).
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello -
You are running into a common scenario when dealing with aggregated data: your last year (2014) is not fully observed yet - hence you see a drop in amount. Note that for 2011-2013 you have information for all 12 month (I would think), while for 2014 you have not (yet).
As such you may want to exclude the last data point if you want to stick to yearly frequency. Alternatively you may want to switch to monthly frequency - again you will need to make sure that your last month is fully observed (features all days), otherwise you will introduce an "artificial" drop again.
Thanks,
Udo
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello -
As a forecaster you should look at your data patterns as: data = historic pattern + random variation.
Unfortunately random variation cannot be predicted - so you will need to focus on historic patterns for your forecasting exercise.
Some typical patterns are:
- Level (long-term average) – data fluctuates around a constant mean
- Trend – data exhibits an increasing or decreasing pattern
- Seasonality – any pattern that regularly repeats itself and is of a constant length (typically daily, weekly, monthly)
- Cycle – patterns created by economic fluctuations for example (typically more than one year)
Any good statistical forecasting model will try to detect these patterns for you automatically. SAS Visual Analytics picks the most appropriate model based on the data at hand for you - in your case a simple exponential smoothing model was selected, which only models level. This seems to suggest that your data does not feature any other pattern, but only random variation.
A very nice book on explaining the concept of variation in business context is: Understanding Variation: The Key to Managing Chaos: Donald J. Wheeler.
Hope this is useful.
Thanks,
Udo
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi Udo, that was an useful information btw. Thanks.
This was a question raised by one of my customer during demo. They expected that the forecasting line would have ups and downs exactly how the business would be in future.
I am forecasting with the sales made in the last 4 years. Now what I see here is the amount shown in negative. I am getting all the forecast values in negative which cannot be the case.
Pls correct if I am doing it wrong. All I have is the sales data which is the measure to forecast and I dont have any other measures to affect it.
I understand as per your suggestion that it's a smoothing model selected automatically by VA which could be the best according to tool. But the forecast points in negative which concerns me.
How will I correct this and get an exact forecast values?
Also some suggestions on lower and upper confidence limit..
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi Udo, just a quick update.
I just removed one data point from the year'14 which pertains to Jan and I am getting a smooth line now with correct forecast trend. All my upper and lower limits seems to be ok.
As I mentioned early I have 4 years of data 2011, '12, '13, '14. For 2014 I have got only till Jan and the business is of less when compared to other years (coz only 1 month). So I filtered Jan to see whether it gives me correct forecast and seems to be fine. But could you suggest me whether this approach is correct.
The data point (Jan 2014) has 10 million USD which got omitted after which I am getting a smooth increasing trend. But is it good to do this way by omitting such a huge value data point by considering that as an anomaly?
regards,
Satlr
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Satlr -
Thanks for following up - glad you deemed my earlier response useful.
You raised a couple of good questions - let me try to tackle them one by one:
- "This was a question raised by one of my customer during demo. They expected that the forecasting line would have ups and downs exactly how the business would be in future."
Yes, I understand that desire - unfortunately (or fortunately depending how you look at it) the future remains uncertain. All a statistical model can do is the suggest the most likely behavior based on the pattern in your data - the ups and downs are caused by random variation most likely. Forecasting is not about curve fitting - in fact it has been shown many times that models which fit the past extremely well, are typically poor in predictive the future. Some people refer to this as overfitting the past. Data = historic pattern + random variation - don't try to model randomness. "Understanding variation" is crucial - I'd like to refer you to this blog: Mike Gilliland - The Business Forecasting Deal where you will find lots of useful information about what can be accomplished by statistical forecasting.
- "But the forecast points in negative which concerns me."
Remember that the statistical model does not know anything about the business context - it "only" looks at the data and tells you what is most likely going to happen. The fact that sales cannot go negative is something you know as a user. Statistical forecasting does not answer the question of "what should happen" - instead it focuses on the question of "what will happen most likely - based on past behavior").
I agree that it might be useful to have an option in VA which forces the forecast to be non-negative. In SAS Forecast Server you can specify such setting - but it is a post-processing activity. First you model in a unconstraint fashion - then your forecasts will be constrained by rules like: non-negative numbers only.
- "So I filtered Jan to see whether it gives me correct forecast and seems to be fine. But could you suggest me whether this approach is correct. The data point (Jan 2014) has 10 million USD which got omitted after which I am getting a smooth increasing trend. But is it good to do this way by omitting such a huge value data point by considering that as an anomaly?"
This is a tricky question to answer without seeing the data at hand. Some general thoughts: modifying past values is usually not a good practice, after all this is what actually happened, right? However, you may want to clean data from extreme events - in particular if you know what caused them. Maybe some one-time sales event caused an extraordinary peek. Forecasting tries to predict "ordinary" behavior.
Actually, what you would like to do is to flag these extreme values either as outliers or special events and model them using more complex techniques such as ARIMAX or UCM. Note that these are not trivial tasks. This is beyond the purpose of Visual Analytics in my honest opinion - for such advanced modeling questions you may want to have a look at specialized environments, such as SAS Forecast Server for example.
Thanks,
Udo
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi Udo, extremely helpful answer. much appreciate your effort!
All I could observe as an outlier point is the one month of business (Jan'14). becz it has significantly less amount when compared to other years.
For example: It looks like this
Year Amount (in Mn USD)
2011 150
2012 220
2013 275
2014 10
Regards
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello -
You are running into a common scenario when dealing with aggregated data: your last year (2014) is not fully observed yet - hence you see a drop in amount. Note that for 2011-2013 you have information for all 12 month (I would think), while for 2014 you have not (yet).
As such you may want to exclude the last data point if you want to stick to yearly frequency. Alternatively you may want to switch to monthly frequency - again you will need to make sure that your last month is fully observed (features all days), otherwise you will introduce an "artificial" drop again.
Thanks,
Udo
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi, thanks for reply.
Yes the last point is considerable low. I get it right when I omit the last point (Jan'14), however customer doesn't like to see his forecast by not omitting 10 Mn USD of business for Jan'14.
Thats my concern. any suggestion?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello -
How does the data look like on monthly frequency? If you like to incorporate Jan 14 into your calculations, then switching to month seems to way to go.
If you stick to yearly, then your forecasts are also yearly numbers - as 2014 is not fully observed yet, it shouldn't be used for a yearly calculations.
When switching to a specialized forecasting environment, such as SAS Forecast Server for example, you will be able to let years end at Jan (and start in Feb) - so you be able to use Jan 14, but cut off Jan 11. Of course this is dependent on your business question if such approach would make sense.
Thanks,
Udo
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi Udo..thanks for your early reply. That seems to be an interesting point.
But I tried categorizing the data points by quarter and month as well with the above mentioned values, I did not find any difference.
Even by that way, if I leave Jan14, I am getting the proper forecast line.
I am not aware of specialized forecast function, let me know if I can try that.
I do not want to cutoff any data point as it represents business for the customer. At present the ERP system used by them serves the forecast functionalities of what they are looking for.
- Satlr
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Could you provide the data (the monthly sales, 2011 - Jan 2014)?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
unfortunately I am in a position not to share the customer's data. Probably I could give you the exact scenario of how the data points are. will that help?
- Satlr
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If you can't provide the original monthly data, then can you disguise the data by multiplying everything by some number (e.g. multiply all monthly values by 10). Without seeing what's going on in the data, it is difficult to make a proper assessment. Thanks,
--Mike
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
I have attached the sample data below.
Quarter | Sales Amount $Mn |
Q1 2011 | 26 |
Q2 2011 | 220 |
Q3 2011 | 240 |
Q4 2011 | 249 |
Q1 2012 | 264 |
Q2 2012 | 262 |
Q3 2012 | 264 |
Q4 2012 | 260 |
Q1 2013 | 291 |
Q2 2013 | 282 |
Q3 2013 | 292 |
Q4 2013 | 293 |
Q1 2014 | 53 |
- satlr
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
At some point I played with this functionality as well. As I observed things similar to you (flat line at the past average, and an uncertainty band encompassing pretty much al previous variation), I did a simple experiment: I took (complete) data for 1 year an replicated that 5 times, creating 5 identical years, hoping that the forecasting tools would pick up certain trends in the past (like the dip that occured in August every year). Note that I did not use one numbe rper month, but a very large set of points of sales at different dates and that what I tried to predict is the total sales per month.
Even with perfectly replicated data for many years in a row, obvious seasonal variation is not well fitted. Maybe overfitting can be an issue in some cases, the VA forecaster seems to underfit seasonality. If the same (>30%!) dip occurs every year in the same month, in my humble opinion a forecasting tool should at least show some of this in the forecast for next year. I think a simple forecast would be a trend (linear over time, for example), plus seasonality, plus random 'noise'. The latter cannot be predicted, of course, but the first two can be, to some extent. Given that even with a series of identical years VA refuses to do so it seems to me that only a trend is predicted and that the uncertainty range is just obtained from all variation around the trend in the time span that is in the data.
Would Forecast Studio give a more complete package to do forecasting in some more detail? For now, my self-written base sas code does better than VA. Far better.