Hello,
I am forecasting time series data with HPFDIAGNOSE and I'm running into a problem with the size of my holdout sample. The code runs quickly (under 20 seconds) when the holdout sample is 5-9% of the data, but forecasting takes dramatically longer when I increase the holdout sample to larger values (e.g., 10-20%).
What do you think explains this non-linear relationship between holdout size and running time? Is there something I can do about it?
Best,
JMC
Hello -
I find your results somewhat counterintuitive, as I would expect run times to get faster, not slower, as the holdout sample grows.
Would you mind sharing your code and some test data so I can replicate your findings?
Thanks,
Udo
Hello Udo,
Thank you for your response. Unfortunately, my data is proprietary so I cannot share it. However, I have found the following.
My time series is daily. When seasonality=365, the size of the holdout sample has a large impact on how long the process takes. When seasonality=7, this no longer happens. I originally set seasonality=365 because my data has an interesting pattern: every year there is an almost clockwork-like increase in values over the previous year. In other words, the pattern within a year is almost perfectly replicated the following year, but all the values are higher than in the previous year. I found that setting seasonality=365 let me forecast this yearly step-wise pattern. With seasonality=7, it doesn't work.
Do you have any suggestions for how I could model seasonality=7 and still get the yearly jumps? I've tried adding a year regressor, but this doesn't seem to be doing the trick.
Best,
Juan Manuel
Hello Juan Manuel -
Since your data is on daily frequency, seasonality=7 seems to be the right choice.
Of course this does not address your question about modeling the second seasonality you have discovered in your data.
I certainly understand your concern about proprietary data, but without seeing the data my advice has to stay at the conceptual level.
When you say that "there is an almost clockwork-like increase of values from the previous years.", would you describe this pattern as a monthly cycle or a weekly cycle - or do you see level shifts across several years?
What I'm getting at is that you should be able to define discrete events, such as calendar events for Jan-Dec or Week1-Week52, to model this effect. Alternatively, you may want to introduce an adjustment variable that mimics the level shifts. Yet another approach might be to model your data in two steps: model at daily frequency, model at monthly frequency, and then reconcile both forecasts using the HPFTEMPRECONCILE procedure.
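As a rough illustration of these ideas, here is a minimal Python sketch. It is not SAS code and does not reproduce what HPFDIAGNOSE, HPFEVENTS or HPFTEMPRECONCILE do internally. The function names are hypothetical, and it assumes the yearly jumps happen on January 1, which may not match your data. Month indicators capture a recurring within-year pattern. One step dummy per shift date lets each yearly jump get its own estimated size, whereas a single linear year regressor forces every jump to be the same size. The proportional scaling is a simplified stand-in for temporal reconciliation.

```python
from datetime import date, timedelta


def month_dummies(days):
    # One 0/1 column per calendar month (Jan-Dec) for each daily observation.
    return [[1 if d.month == m else 0 for m in range(1, 13)] for d in days]


def level_shift_dummies(days, shift_dates):
    # One step column per assumed shift date: 0 before the shift, 1 on or after it.
    # Separate columns let each yearly jump have its own size.
    return [[1 if d >= s else 0 for s in shift_dates] for d in days]


def reconcile_to_monthly(daily_forecasts, monthly_forecast):
    # Simplified proportional reconciliation (an assumption, not HPFTEMPRECONCILE's
    # method): rescale the daily forecasts so they sum to the monthly forecast.
    total = sum(daily_forecasts)
    if total == 0:
        raise ValueError("daily forecasts sum to zero; cannot scale proportionally")
    return [x * monthly_forecast / total for x in daily_forecasts]


if __name__ == "__main__":
    start = date(2021, 1, 1)
    days = [start + timedelta(days=i) for i in range(3 * 365)]
    # Hypothetical shift dates: January 1 of each year after the first.
    shifts = [date(2022, 1, 1), date(2023, 1, 1)]
    x_month = month_dummies(days)
    x_shift = level_shift_dummies(days, shifts)
    print(len(x_month[0]), x_shift[0], x_shift[365], x_shift[-1])
    print(reconcile_to_monthly([10.0, 10.0, 20.0], 80.0))
```

In a real model the dummy columns would enter as event or input variables alongside the weekly seasonality, so the daily model keeps seasonality=7 while the step dummies carry the yearly level changes.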
Hope this is useful.
Thanks,
Udo