08-16-2013 02:21 PM
I am forecasting time series data with hpfdiagnose and I'm running into a problem with the size of my holdout sample. The code runs quickly (<20 sec) when my holdout sample is 5-9% of data, but the forecasting starts to take incredibly long when I increase the size of my holdout sample to higher values (e.g., 10-20%).
What do you think explains this non-linear relationship between holdout sample size and running time? Is there something that I can do about it?
08-19-2013 10:54 AM
In a way I find your findings counterintuitive, as I would expect faster run times when increasing the holdout sample values.
Would you mind sharing your code and some test data so that I can replicate your findings?
08-22-2013 11:40 AM
Thank you for your response. Unfortunately, my data is proprietary so I cannot share it. However, I have found the following.
My time series is daily. When seasonality=365, the size of the holdout sample has a large impact on how long the process takes. When seasonality=7, this no longer occurs. I had originally set seasonality=365 because my data has an interesting pattern: every year there is an almost clockwork-like increase of values from the previous years. In other words, the pattern of data within a year is almost perfectly replicated the following year, but the absolute values are all higher than in the previous year. I found that by setting seasonality=365, I was able to forecast this yearly step-wise function. With seasonality=7, it doesn't work.
Do you have any suggestions for how I could model seasonality=7 and still get the yearly jumps? I've tried adding a year regressor, but this doesn't seem to be doing the trick.
08-23-2013 01:50 PM
Hello Juan Manuel -
Since your data is on daily frequency, seasonality=7 seems to be the right choice.
Of course this does not address your question about modeling the second seasonality you have discovered in your data.
I certainly understand your concern about proprietary data, but without seeing the data my advice has to stick to conceptual ideas only.
When you say that "there is an almost clockwork-like increase of values from the previous years," would you describe this pattern as a monthly cycle or a weekly cycle, or do you see level shifts across several years?
What I'm getting at is that you should be able to define discrete events, such as calendar events Jan-Dec or Week1-Week52, to model this effect. Alternatively, you may want to introduce an adjustment variable that mimics the level shifts. Yet another approach might be to model your data in two steps: model on daily frequency, model on monthly frequency, and then reconcile both forecasts using the HPFTEMPRECONCILE procedure.
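To make the adjustment-variable idea a bit more concrete, here is a rough sketch in plain Python (not SAS). Everything in it is an assumption for illustration: the series is synthetic, and the names `WEEKLY`, `STEP`, `make_series` and `level_shift_variable` are made up. It builds one value per day equal to the mean level of that day's year, which is one simple way to mimic yearly level shifts; what remains after removing it is only the weekly pattern that a seasonality=7 model would have to capture.

```python
from datetime import date, timedelta
from statistics import mean

# Hypothetical data: Mon..Sun effects (summing to zero) and a yearly jump.
WEEKLY = [0.0, 1.0, 2.0, 3.0, 2.0, -3.0, -5.0]
STEP = 10.0


def make_series(start, n_days):
    """Build a synthetic daily series: base level + yearly step + weekly effect."""
    days = [start + timedelta(days=i) for i in range(n_days)]
    values = [
        100.0 + STEP * (d.year - start.year) + WEEKLY[d.weekday()]
        for d in days
    ]
    return days, values


def level_shift_variable(days, values):
    """Adjustment variable: for each day, the mean level of the year it falls in."""
    by_year = {}
    for d, v in zip(days, values):
        by_year.setdefault(d.year, []).append(v)
    year_level = {year: mean(vs) for year, vs in by_year.items()}
    return [year_level[d.year] for d in days]


days, y = make_series(date(2010, 1, 1), 3 * 365)
adj = level_shift_variable(days, y)

# What is left once the yearly level is removed: only the weekly pattern.
residual = [v - a for v, a in zip(y, adj)]
```

In practice a column like `adj` would also need values over the forecast horizon (for example, the expected level of the coming year), since the yearly jump is not known in advance. Whether it is supplied as an input variable or expressed as calendar events is a modeling choice that depends on your data.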
Hope this is useful.