I am forecasting time series data with hpfdiagnose and I'm running into a problem with the size of my holdout sample. The code runs quickly (<20 sec) when my holdout sample is 5-9% of data, but the forecasting starts to take incredibly long when I increase the size of my holdout sample to higher values (e.g., 10-20%).
What do you think explains this non-linear relationship between holdout sample size and running time? Is there anything I can do about it?
Thank you for your response. Unfortunately, my data is proprietary so I cannot share it. However, I have found the following.
My time series is daily. When seasonality=365, the size of the holdout sample has a large impact on how long the process takes. When seasonality=7, this no longer occurs. I originally set seasonality=365 because my data has an interesting pattern: every year there is an almost clockwork-like increase in values over the previous year. In other words, the pattern within a year is almost perfectly replicated the following year, but the absolute values are all higher than in the previous year. With seasonality=365, I was able to forecast this yearly step-wise function. With seasonality=7, it doesn't work.
Do you have any suggestions for how I could model seasonality=7 and still get the yearly jumps? I've tried adding a year regressor, but this doesn't seem to be doing the trick.
Hello Juan Manuel -
Since your data is on daily frequency, seasonality=7 seems to be the right choice.
Of course this does not address your question about modeling the second seasonality you have discovered in your data.
I certainly understand your concern about proprietary data, but without seeing the data my advice has to stick to conceptual ideas only.
When you say that "there is an almost clockwork-like increase of values from the previous years.", would you describe this pattern as a monthly cycle or a weekly cycle - or do you see level shifts across several years?
What I'm getting at is that you should be able to define discrete events, such as calendar events Jan-Dec or Week1-Week52, to model this effect. Alternatively, you may want to introduce an adjustment variable that mimics the level shifts. Yet another approach might be to model your data in two steps: model on daily frequency, model on monthly frequency, and then reconcile both forecasts using the HPFTEMPRECONCILE procedure.
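To make the adjustment-variable idea concrete, here is a minimal sketch of what such variables could look like. It is written in Python purely for illustration and is not SAS syntax; in practice you would build the same columns in your own data preparation step and supply them as input variables. The function name `build_regressors` and the column layout are hypothetical. The assumption is that each new calendar year brings its own level shift, so there is one 0/1 step per later year (allowing each yearly jump to have a different size, unlike a single numeric year regressor), alongside day-of-week dummies for the weekly cycle.

```python
from datetime import date, timedelta

def build_regressors(start, n_days):
    """Hypothetical helper: return one (day, year_steps, weekday_dummies) row per day."""
    days = [start + timedelta(days=i) for i in range(n_days)]
    # Every calendar year after the first gets its own level-shift step.
    later_years = sorted({d.year for d in days} - {start.year})
    rows = []
    for d in days:
        # A step is 0 before its year begins and 1 from then on.
        year_steps = [1 if d.year >= y else 0 for y in later_years]
        # Monday..Saturday dummies; Sunday is the baseline (all zeros).
        weekday_dummies = [1 if d.weekday() == k else 0 for k in range(6)]
        rows.append((d, year_steps, weekday_dummies))
    return rows

# Four days spanning a year boundary.
rows = build_regressors(date(2020, 12, 30), 4)
for day, year_steps, weekday_dummies in rows:
    print(day.isoformat(), year_steps, weekday_dummies)
```

The step columns switch on at each year boundary and stay on, which is what lets a model with seasonality=7 absorb the yearly jumps through regressors instead of a 365-period seasonal component. Whether this fits your data depends on whether the jumps really occur at the calendar-year boundary.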
Hope this is useful.