topic Size of holdout sample in forecasting in SAS Forecasting and Econometrics

Size of holdout sample in forecasting

JMC — Fri, 16 Aug 2013 18:21:53 GMT

Hello,

I am forecasting time series data with hpfdiagnose and I'm running into a problem with the size of my holdout sample. The code runs quickly (<20 sec) when my holdout sample is 5-9% of the data, but the forecasting starts to take far longer when I increase the holdout sample to larger values (e.g., 10-20%).

What do you think explains this non-linear relationship between holdout sample size and running time? Is there something that I can do about it?

Best,

JMC

Re: Size of holdout sample in forecasting

udo_sas — Mon, 19 Aug 2013 14:54:31 GMT

Hello -

In a way I find your findings counterintuitive, as I would expect faster run times when increasing the holdout sample values.

Would you mind sharing your code and some test data so that I can replicate your findings?

Thanks,

Udo

Re: Size of holdout sample in forecasting

JMC — Thu, 22 Aug 2013 15:40:12 GMT

Hello Udo,

Thank you for your response. Unfortunately, my data is proprietary so I cannot share it. However, I have found the following.

My time series is daily. When seasonality=365, the size of the holdout sample has a large impact on how long the process takes. When seasonality=7, this no longer occurs. I had originally set seasonality=365 because my data has an interesting pattern: every year there is an almost clockwork-like increase of values from the previous years. In other words, the pattern within a year is almost perfectly replicated the following year, but the absolute values are all higher than in the previous year. I found that by setting seasonality=365, I was able to forecast this yearly step-wise function. With seasonality=7, it doesn't work.

Do you have any suggestions for how I could model seasonality=7 and still get the yearly jumps? I've tried adding a year regressor, but this doesn't seem to be doing the trick.

Best,

Juan Manuel

Re: Size of holdout sample in forecasting

udo_sas — Fri, 23 Aug 2013 17:50:19 GMT

Hello Juan Manuel -

Since your data is on daily frequency, seasonality=7 seems to be the right choice.

Of course this does not address your question about modeling the second seasonality you have discovered in your data.

I certainly understand your concern about proprietary data, but without seeing the data my advice has to stick to conceptual ideas only.

When you say that "there is an almost clockwork-like increase of values from the previous years.", would you describe this pattern as a monthly cycle or a weekly cycle - or do you see level shifts across several years?

What I'm getting at is that you should be able to define discrete events, such as calendar events for Jan-Dec or Week1-Week52, to model this effect. Alternatively, you may want to introduce an adjustment variable which mimics the level shifts. Yet another approach might be to model your data in a two-step manner: model on daily frequency, model on monthly frequency, and then reconcile both forecasts using the HPFTEMPRECONCILE procedure.
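As a rough illustration of the adjustment-variable and calendar-event ideas, here is a minimal Python sketch (not SAS code, and not based on the original poster's data) that builds a yearly step variable and month indicators for a daily series. The function name build_regressors, the column names, and the start date are all hypothetical; the resulting columns would still have to be exported and supplied to the forecasting procedure as input variables.

```python
from datetime import date, timedelta


def build_regressors(start, n_days):
    """Return one row per day with a yearly step variable and month indicators.

    All names here are illustrative assumptions, not part of any SAS API.
    """
    rows = []
    for i in range(n_days):
        d = start + timedelta(days=i)
        # Step variable: 0 in the first calendar year, 1 in the next, and so on.
        # It mimics a level shift that happens at every year boundary.
        row = {"date": d.isoformat(), "year_step": d.year - start.year}
        # Calendar indicators Jan-Dec: exactly one of them is 1 on any given day.
        for m in range(1, 13):
            row[f"month_{m:02d}"] = 1 if d.month == m else 0
        rows.append(row)
    return rows


# Hypothetical three-year daily history starting on 1 January 2011.
rows = build_regressors(date(2011, 1, 1), 3 * 365)
```

If the model also includes an intercept, one of the twelve month indicators would normally be dropped, since together they always sum to 1. Whether a single step variable is enough depends on whether the yearly increase is roughly constant, which cannot be judged without the data.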

Hope this is useful.

Thanks,

Udo