New Contributor
Posts: 2

# Size of holdout sample in forecasting

Hello,

I am forecasting time series data with hpfdiagnose and I'm running into a problem with the size of my holdout sample. The code runs quickly (<20 sec) when my holdout sample is 5-9% of data, but the forecasting starts to take incredibly long when I increase the size of my holdout sample to higher values (e.g., 10-20%).

What do you think explains this non-linear relationship between holdout sample size and running time? Is there anything I can do about it?

Best,

JMC

SAS Employee
Posts: 416

## Re: Size of holdout sample in forecasting

Hello -

Your findings seem counterintuitive to me, as I would expect faster run times when the holdout sample gets larger.

Would you mind sharing your code and some test data so that I can replicate your findings?

Thanks,

Udo

New Contributor
Posts: 2

## Re: Size of holdout sample in forecasting

Hello Udo,

Thank you for your response. Unfortunately, my data is proprietary so I cannot share it. However, I have found the following.

My time series is daily. When seasonality=365, the size of the holdout sample has a large impact on how long the process takes. When seasonality=7, this no longer occurs. I originally set seasonality=365 because my data has an interesting pattern where every year there is an almost clockwork-like increase of values from the previous years. In other words, the pattern of data within a year is almost perfectly replicated the following year, but the absolute values are all higher than in the previous year. I found that by setting seasonality=365, I was able to forecast this yearly step-wise increase. With seasonality=7, it doesn't work.

Do you have any suggestions for how I could model seasonality=7 and still get the yearly jumps? I've tried adding a year regressor, but this doesn't seem to be doing the trick.

Best,

Juan Manuel

SAS Employee
Posts: 416

## Re: Size of holdout sample in forecasting

Hello Juan Manuel -

Since your data is on daily frequency, seasonality=7 seems to be the right choice.

I certainly understand your concern about proprietary data, but without seeing the data my advice has to stick to conceptual ideas only.

When you say that "there is an almost clockwork-like increase of values from the previous years", would you describe this pattern as a monthly cycle or a weekly cycle, or do you see level shifts across several years?

What I'm getting at is that you should be able to define discrete events, such as calendar events Jan-Dec or Week1-Week52, to model this effect. Alternatively, you may want to introduce an adjustment variable that mimics the level shifts. Yet another approach might be to model your data in a two-step manner: model on daily frequency, model on monthly frequency, and then reconcile both forecasts using the HPFTEMPRECONCILE procedure.
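To make the adjustment-variable and calendar-event ideas concrete, here is a minimal sketch in plain Python (not SAS syntax). It only shows what such regressors could look like for a daily series. The date range and the function names are made up for illustration, and it assumes the level shift happens at each calendar-year boundary.

```python
from datetime import date, timedelta

def daily_dates(start, end):
    """Return every calendar date from start to end, inclusive."""
    n_days = (end - start).days + 1
    return [start + timedelta(days=i) for i in range(n_days)]

def level_shift_step(dates):
    """Step variable: 0 in the first calendar year, 1 in the next, and so on.

    Assumes the jump occurs at each calendar-year boundary.
    """
    first_year = dates[0].year
    return [d.year - first_year for d in dates]

def month_dummies(dates):
    """One 0/1 indicator column per calendar month (Jan..Dec) for each date."""
    return [[1 if d.month == m else 0 for m in range(1, 13)] for d in dates]

# Hypothetical three-year daily range, for illustration only.
dates = daily_dates(date(2012, 1, 1), date(2014, 12, 31))
step = level_shift_step(dates)   # candidate adjustment variable
dummies = month_dummies(dates)   # candidate Jan-Dec calendar events
```

If the model includes an intercept, one of the twelve month columns would normally be dropped to avoid perfect collinearity. Columns like these could then be supplied as input variables alongside seasonality=7.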

Hope this is useful.

Thanks,

Udo
