10-14-2014 06:05 PM
Updated with a real example:
I have a peculiar problem: I have ~1000 series to forecast, and each series has different start and end dates. I need to forecast 12 months from each series' own end date. What surprises me is that when I run SAS Forecast Server, every series starts at the oldest date across all of the series and ends at the newest date across all of the series.
As an example, let's suppose we have 3 series (a, b and c) with different start and end dates, as shown below:
When I input this to SAS Forecast Server and set the forecast horizon to 12 months, it does not forecast each series individually (I deselected forecast hierarchy).
For example, my series b produces 36 months of forecast! See below. The forecast should have started in Jan 1959 and ended in Dec 1959, but as shown below it ends in Dec 1961, which is the forecast end date of series c. How do I make SAS forecast only 12 months, starting from the end period of each individual series?
Below are my questions:
I figured out that the missing values can be deleted by setting the following options. I'm not sure if there is a better way to do this. Also attached are the CSV files for replication.
Thanks so much
10-16-2014 04:00 PM
SAS Forecast Studio works under the assumption that you want to create future forecasts and as such it assumes similar end dates for all of your series.
To accomplish this, the code generated by SAS Forecast Studio uses the "horizonstart" option to determine the maximum end date for all series. This is why your series b gets "extrapolated" first to match the end date of a and c.
If you are comfortable with switching to SAS Forecast Server procedure code, you can circumvent this behavior:
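data WORK.have ;
infile 'yourpath\have.csv' delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;
informat DATE mmddyy10. ;
informat series $1. ;
informat value best32. ;
format DATE mmddyy10. ;
format series $1. ;
format value best12. ;
input DATE series $ value ; /* assumes the CSV columns are in this order */
run;
/* BY-group processing requires the data to be sorted by series and date */
proc sort data=have; by series date; run;
/* no HORIZONSTART= on the ID statement, so each series is forecast from its own last observation */
proc hpfengine data=have outfor=work.want out=_null_ lead=12 plot=forecasts;
by series;
id date interval=month;
forecast value;
run;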
This will result in the following forecasts:
10-24-2014 10:02 AM
We have the scenario of different end dates, and simply cater to the longest horizon, then discard the unwanted periods. Differing start dates are trickier to handle in Forecast Studio, which wants everything in the same project to have the same start date. As Udo suggested, going straight to the forecast procedures is an option. If you have a strong desire to use Forecast Studio, then the only method I can think of is to use fake time periods.
This would require some pre-processing of the data to make all the history end in the same period, call it period 0. You would need to store a reference dataset outside of Forecast Studio that detailed what period 0 was for each time series. As long as the leading values are missing (before history is available), then those periods should be safely ignored by the model. You would still need to forecast to the longest horizon as I described above.
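For the "forecast to the longest horizon, then discard the unwanted periods" step, a minimal sketch could look like the following. It assumes the history is in work.have with the columns series, date and value (as in the import code earlier in this thread), and that the forecasts have been exported to a table called work.fs_forecasts with series and date columns. That table name is hypothetical, so adjust it to whatever your project actually writes out.

```sas
proc sql;
  /* Last month with an actual (non-missing) value, per series */
  create table work.series_end as
  select series,
         max(date) as last_date format=mmddyy10.
  from work.have
  where value is not missing
  group by series;

  /* Keep only the 12 months that follow each series' own end date.
     work.fs_forecasts is a hypothetical name for the exported forecasts. */
  create table work.forecast_12m as
  select f.*
  from work.fs_forecasts as f,
       work.series_end as e
  where f.series = e.series
    and intck('month', e.last_date, f.date) between 1 and 12
  order by f.series, f.date;
quit;
```

Using intck to count month boundaries, rather than comparing raw dates, keeps the filter independent of which day of the month the date values fall on.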
This approach could prove confusing, but it is workable as long as you are not reconciling to the hierarchy and do not have different time intervals.
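The period 0 pre-processing described above might be sketched roughly as follows. This is only an illustration under assumptions: it reuses work.have (series, date, value) from earlier in the thread, picks the latest end date across all series as the common fake period 0, and introduces the names work.period0_ref, shift_months and fake_date purely for this example.

```sas
proc sql;
  /* Reference table to keep outside Forecast Studio:
     the real period 0 of each series and the shift applied to it */
  create table work.period0_ref as
  select e.series,
         e.period0_date format=mmddyy10.,
         intck('month', e.period0_date, c.common_end) as shift_months
  from (select series, max(date) as period0_date
        from work.have
        where value is not missing
        group by series) as e,
       (select max(date) as common_end
        from work.have
        where value is not missing) as c;

  /* Shift every series so that its history ends in the same (fake) month */
  create table work.aligned as
  select h.series,
         intnx('month', h.date, r.shift_months) as fake_date format=mmddyy10.,
         h.value
  from work.have as h,
       work.period0_ref as r
  where h.series = r.series
  order by h.series, fake_date;
quit;
```

To map a forecast back to real time, apply the negative shift from the reference table, for example intnx('month', fake_date, -shift_months).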
Sounds like a good challenge.