Solved: Re: Forecasting in SAS VA

sat_lr · Posted 04-02-2014 01:54 AM

Hi All,

I have four years of sales data for which I performed the forecast for the next 10 months.

The point that SAS VA begins to forecast, it gives me a well flat line without any deviations.

Where as when I look the past trend, I could see there are several ups and downs.

Why does the forecast line look flat when the historical data has numerous downtrend and uptrend that has occured in the past.

I am plotting with two variables (Date in Months by Sales Amt).

udo_sas · Posted 04-08-2014 08:41 AM

Hello -

You are running into a common scenario when dealing with aggregated data: your last year (2014) is not fully observed yet - hence you see a drop in amount. Note that for 2011-2013 you have information for all 12 month (I would think), while for 2014 you have not (yet).

As such you may want to exclude the last data point if you want to stick to yearly frequency. Alternatively you may want to switch to monthly frequency - again you will need to make sure that your last month is fully observed (features all days), otherwise you will introduce an "artificial" drop again.

Thanks,

Udo

View solution in original post

udo_sas · Posted 04-02-2014 08:58 AM

Hello -

As a forecaster you should look at your data patterns as: data = historic pattern + random variation.

Unfortunately random variation cannot be predicted - so you will need to focus on historic patterns for your forecasting exercise.

Some typical patterns are:

Level (long-term average) – data fluctuates around a constant mean
Trend – data exhibits an increasing or decreasing pattern
Seasonality – any pattern that regularly repeats itself and is of a constant length (typically daily, weekly, monthly)
Cycle – patterns created by economic fluctuations for example (typically more than one year)

Any good statistical forecasting model will try to detect these patterns for you automatically. SAS Visual Analytics picks the most appropriate model based on the data at hand for you - in your case a simple exponential smoothing model was selected, which only models level. This seems to suggest that your data does not feature any other pattern, but only random variation.

A very nice book on explaining the concept of variation in business context is: Understanding Variation: The Key to Managing Chaos: Donald J. Wheeler.

Hope this is useful.

Thanks,

Udo

sat_lr · Posted 04-03-2014 03:04 AM

Hi Udo, that was an useful information btw. Thanks.

This was a question raised by one of my customer during demo. They expected that the forecasting line would have ups and downs exactly how the business would be in future.

I am forecasting with the sales made in the last 4 years. Now what I see here is the amount shown in negative. I am getting all the forecast values in negative which cannot be the case.

Pls correct if I am doing it wrong. All I have is the sales data which is the measure to forecast and I dont have any other measures to affect it.

I understand as per your suggestion that it's a smoothing model selected automatically by VA which could be the best according to tool. But the forecast points in negative which concerns me.

How will I correct this and get an exact forecast values?

Also some suggestions on lower and upper confidence limit..

sat_lr · Posted 04-03-2014 05:05 AM

Hi Udo, just a quick update.

I just removed one data point from the year'14 which pertains to Jan and I am getting a smooth line now with correct forecast trend. All my upper and lower limits seems to be ok.

As I mentioned early I have 4 years of data 2011, '12, '13, '14. For 2014 I have got only till Jan and the business is of less when compared to other years (coz only 1 month). So I filtered Jan to see whether it gives me correct forecast and seems to be fine. But could you suggest me whether this approach is correct.

The data point (Jan 2014) has 10 million USD which got omitted after which I am getting a smooth increasing trend. But is it good to do this way by omitting such a huge value data point by considering that as an anomaly?

regards,

Satlr

udo_sas · Posted 04-03-2014 09:58 AM

Satlr -

Thanks for following up - glad you deemed my earlier response useful.

You raised a couple of good questions - let me try to tackle them one by one:

"This was a question raised by one of my customer during demo. They expected that the forecasting line would have ups and downs exactly how the business would be in future."

Yes, I understand that desire - unfortunately (or fortunately depending how you look at it) the future remains uncertain. All a statistical model can do is the suggest the most likely behavior based on the pattern in your data - the ups and downs are caused by random variation most likely. Forecasting is not about curve fitting - in fact it has been shown many times that models which fit the past extremely well, are typically poor in predictive the future. Some people refer to this as overfitting the past. Data = historic pattern + random variation - don't try to model randomness. "Understanding variation" is crucial - I'd like to refer you to this blog: Mike Gilliland - The Business Forecasting Deal where you will find lots of useful information about what can be accomplished by statistical forecasting.

"But the forecast points in negative which concerns me."

Remember that the statistical model does not know anything about the business context - it "only" looks at the data and tells you what is most likely going to happen. The fact that sales cannot go negative is something you know as a user. Statistical forecasting does not answer the question of "what should happen" - instead it focuses on the question of "what will happen most likely - based on past behavior").

I agree that it might be useful to have an option in VA which forces the forecast to be non-negative. In SAS Forecast Server you can specify such setting - but it is a post-processing activity. First you model in a unconstraint fashion - then your forecasts will be constrained by rules like: non-negative numbers only.

"So I filtered Jan to see whether it gives me correct forecast and seems to be fine. But could you suggest me whether this approach is correct. The data point (Jan 2014) has 10 million USD which got omitted after which I am getting a smooth increasing trend. But is it good to do this way by omitting such a huge value data point by considering that as an anomaly?"

This is a tricky question to answer without seeing the data at hand. Some general thoughts: modifying past values is usually not a good practice, after all this is what actually happened, right? However, you may want to clean data from extreme events - in particular if you know what caused them. Maybe some one-time sales event caused an extraordinary peek. Forecasting tries to predict "ordinary" behavior.

Actually, what you would like to do is to flag these extreme values either as outliers or special events and model them using more complex techniques such as ARIMAX or UCM. Note that these are not trivial tasks. This is beyond the purpose of Visual Analytics in my honest opinion - for such advanced modeling questions you may want to have a look at specialized environments, such as SAS Forecast Server for example.

Thanks,

Udo

sat_lr · Posted 04-08-2014 03:15 AM

Hi Udo, extremely helpful answer. much appreciate your effort!

All I could observe as an outlier point is the one month of business (Jan'14). becz it has significantly less amount when compared to other years.

For example: It looks like this

Year Amount (in Mn USD)

2011 150

2012 220

2013 275

2014 10

Regards

udo_sas · Posted 04-08-2014 08:41 AM

Hello -

You are running into a common scenario when dealing with aggregated data: your last year (2014) is not fully observed yet - hence you see a drop in amount. Note that for 2011-2013 you have information for all 12 month (I would think), while for 2014 you have not (yet).

As such you may want to exclude the last data point if you want to stick to yearly frequency. Alternatively you may want to switch to monthly frequency - again you will need to make sure that your last month is fully observed (features all days), otherwise you will introduce an "artificial" drop again.

Thanks,

Udo

sat_lr · Posted 04-09-2014 06:05 AM

Hi, thanks for reply.

Yes the last point is considerable low. I get it right when I omit the last point (Jan'14), however customer doesn't like to see his forecast by not omitting 10 Mn USD of business for Jan'14.

Thats my concern. any suggestion?

udo_sas · Posted 04-09-2014 08:19 AM

Hello -

How does the data look like on monthly frequency? If you like to incorporate Jan 14 into your calculations, then switching to month seems to way to go.

If you stick to yearly, then your forecasts are also yearly numbers - as 2014 is not fully observed yet, it shouldn't be used for a yearly calculations.

When switching to a specialized forecasting environment, such as SAS Forecast Server for example, you will be able to let years end at Jan (and start in Feb) - so you be able to use Jan 14, but cut off Jan 11. Of course this is dependent on your business question if such approach would make sense.

Thanks,

Udo

sat_lr · Posted 04-10-2014 01:55 AM

Hi Udo..thanks for your early reply. That seems to be an interesting point.

But I tried categorizing the data points by quarter and month as well with the above mentioned values, I did not find any difference.

Even by that way, if I leave Jan14, I am getting the proper forecast line.

I am not aware of specialized forecast function, let me know if I can try that.

I do not want to cutoff any data point as it represents business for the customer. At present the ERP system used by them serves the forecast functionalities of what they are looking for.

- Satlr

mvgilliland · Posted 04-03-2014 01:58 PM

Could you provide the data (the monthly sales, 2011 - Jan 2014)?

sat_lr · Posted 04-08-2014 03:07 AM

unfortunately I am in a position not to share the customer's data. Probably I could give you the exact scenario of how the data points are. will that help?

- Satlr

mvgilliland · Posted 04-10-2014 09:29 AM

If you can't provide the original monthly data, then can you disguise the data by multiplying everything by some number (e.g. multiply all monthly values by 10). Without seeing what's going on in the data, it is difficult to make a proper assessment. Thanks,

--Mike

sat_lr · Posted 04-28-2014 02:36 AM

Hi,

I have attached the sample data below.

Quarter	Sales Amount $Mn
Q1 2011	26
Q2 2011	220
Q3 2011	240
Q4 2011	249
Q1 2012	264
Q2 2012	262
Q3 2012	264
Q4 2012	260
Q1 2013	291
Q2 2013	282
Q3 2013	292
Q4 2013	293
Q1 2014	53

- satlr

MarcelHaas · Posted 11-19-2014 10:33 AM

At some point I played with this functionality as well. As I observed things similar to you (flat line at the past average, and an uncertainty band encompassing pretty much al previous variation), I did a simple experiment: I took (complete) data for 1 year an replicated that 5 times, creating 5 identical years, hoping that the forecasting tools would pick up certain trends in the past (like the dip that occured in August every year). Note that I did not use one numbe rper month, but a very large set of points of sales at different dates and that what I tried to predict is the total sales per month.

Even with perfectly replicated data for many years in a row, obvious seasonal variation is not well fitted. Maybe overfitting can be an issue in some cases, the VA forecaster seems to underfit seasonality. If the same (>30%!) dip occurs every year in the same month, in my humble opinion a forecasting tool should at least show some of this in the forecast for next year. I think a simple forecast would be a trend (linear over time, for example), plus seasonality, plus random 'noise'. The latter cannot be predicted, of course, but the first two can be, to some extent. Given that even with a series of identical years VA refuses to do so it seems to me that only a trend is predicted and that the uncertainty range is just obtained from all variation around the trend in the time span that is in the data.

Would Forecast Studio give a more complete package to do forecasting in some more detail? For now, my self-written base sas code does better than VA. Far better.

Registration is open