06-11-2014 08:47 PM
I could use some guidance in selecting the most appropriate forecasting or prediction procedure given the following data, goals, and constraints:
Very large number of observations
Monthly data per person (Feb thru Dec in year 1; Jan thru Mar in year 2)
One continuous dependent variable
Use Year 1 data to create a model
Apply early Year 2 data to predict December of Year 2
The monthly DV observations are not linear over time (a simple downward slope, possibly quadratic?)
My early attempts to predict have not been fruitful.
I have tried PROC LOESS, FORECAST, ARIMA, X12, and TRANSREG.
One of the above may be right, but the constraints of such large data mean long processing times and (commonly) insufficient memory.
I'd appreciate any guidance regarding the most suitable method so I can subset the data and try again.
06-12-2014 05:14 PM
No, I'll give PROC ESM a try as well.
At first glance, I don't see a SCORE option to predict on the Year 2 Jan-Mar values.
Am I missing something?
06-16-2014 11:21 AM
The ESM procedure uses a LEAD= option rather than a SCORE statement. You can find the syntax in the PROC ESM documentation.
Let us know if you need any help. -Ken
06-17-2014 04:13 PM
If you are considering time series techniques such as exponential smoothing, then your idea of "Use Year 1 data to create a model; apply early Year 2 data to predict December of Year 2" will not work.
Time series models are closely tied to the data used to estimate their parameters. This is very different from techniques like OLS regression.
For example: you can "train" a predictive model such as a logistic regression on training data, create a score file, and then apply this score file to new data.
This concept does not apply to statistical forecasting models. Here you should use all available history to estimate the parameters of the model, and usually the most recent data is the most relevant. Also, once you have estimated the parameters, these models remain tied to the history that was used for estimation. An exponential smoothing model, for example, can be thought of as a moving average with infinite memory but exponentially decaying weights.
In my opinion, the choice between a predictive model and a statistical forecasting model depends on the business question you have in mind. Note that very scalable algorithms are available for both.
If you can share some example data and specify what you would like the results to look like, we might be able to come up with a code snippet for you.
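To make the weighting idea above concrete, here is a minimal Python sketch of simple exponential smoothing. It is an illustration only, not what PROC ESM does internally; the function names, the smoothing weight `alpha`, and the sample numbers are all made up for this example. It assumes the level is initialized with the first observation.

```python
def ses_level(y, alpha):
    """Final smoothed level, computed with the usual recursion."""
    level = y[0]  # assumption: initialize the level with the first observation
    for value in y[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


def ses_weights(n, alpha):
    """Weight each of n observations receives in the final level (oldest first)."""
    # Newest observation gets alpha, the one before alpha*(1-alpha), and so on.
    newest_first = [alpha * (1 - alpha) ** k for k in range(n - 1)]
    # The first observation absorbs whatever weight is left over.
    newest_first.append((1 - alpha) ** (n - 1))
    return newest_first[::-1]


# Illustrative, made-up series and smoothing weight
y = [10.0, 9.0, 8.5, 8.2]
alpha = 0.3
weights = ses_weights(len(y), alpha)
weighted_sum = sum(w * v for w, v in zip(weights, y))
```

The recursion and the weighted sum give the same level, which is why the fitted state cannot be separated from the history it was estimated on the way a regression score file can.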
07-02-2014 06:18 PM
Thank you for the thorough response.
Here is a summarized version of what I have available:
To make matters more complicated, Var1 is a rolling YTD average.
Does the "Accumulate=average" subcommand account for the heavier weight toward year's end?
07-07-2014 02:15 PM
Many thanks for sharing an example. I don't think statistical forecasting techniques such as exponential smoothing will be applicable to your situation, because there is too little history. After plotting your data, I think you might be better off applying a curve-fitting technique such as LOESS to your data and trying to derive a "profile" that can be applied to future points. Again, though, the lack of historical data will be an issue unless you assume that 2013 is strongly representative of 2014.
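As a rough sketch of the "profile" idea, the Python below builds a profile from Year 1 and rescales it to a person's early Year 2 values. Note that it swaps in a plain per-month mean for the LOESS fit to keep the example short, and it assumes that the Year 1 shape carries over to Year 2. The function names, data layout, and numbers are hypothetical.

```python
from statistics import mean


def monthly_profile(year1):
    """Mean Year 1 value per calendar month, taken across people.

    year1 maps person -> {month number: value}.
    """
    months = sorted({m for series in year1.values() for m in series})
    return {m: mean(s[m] for s in year1.values() if m in s) for m in months}


def predict_from_profile(profile, early_obs, target_month=12):
    """Scale the profile to match early observations, then read off the target month."""
    # Only months present in both the profile and the new data can be compared.
    shared = [m for m in early_obs if m in profile]
    if not shared:
        raise ValueError("no overlapping months between profile and observations")
    scale = mean(early_obs[m] for m in shared) / mean(profile[m] for m in shared)
    return scale * profile[target_month]


# Illustrative, made-up numbers (Year 1 has no January, as in the original post)
year1 = {
    "A": {2: 100.0, 3: 90.0, 12: 50.0},
    "B": {2: 80.0, 3: 70.0, 12: 30.0},
}
profile = monthly_profile(year1)
early_year2 = {1: 120.0, 2: 99.0, 3: 88.0}  # January is ignored: no Year 1 match
prediction = predict_from_profile(profile, early_year2)
```

Because the per-month means are computed once and then applied person by person, this kind of approach also avoids fitting a separate model to every series, which may help with the processing-time concern.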
This e-newsletter might give you some ideas: http://support.sas.com/community/newsletters/training/forecasting.html
Hope this makes sense.