05-23-2017 01:45 PM
In an effort to improve a long-established manual (i.e. not statistically modeled) method of generating a single-year forecast of school district enrollment from six years' worth of data, I settled on a square-root regression random coefficients model using PROC GLIMMIX. I first looked at Poisson and Negative Binomial regression, given that this is count data, but the residuals from the square-root model look more "Normal". Of course, I would have preferred to use more years and a procedure tailored to time series (such as AUTOREG), but I am trying to see whether a statistically based approach gives more accurate results than the "hand-calculated" method using the same data. Consequently, I want a method for forecasting the next year's enrollment for each school district in the state based on the intercept and time trend coefficients (substituting the incremented time value into the regression equation to generate the estimate), but I am having difficulty properly retransforming the output from the square-root regression.
I am following Anscombe's recommendation of a variance-stabilizing square root transformation [i.e. (y + 0.375)**(1/2)], which in turn requires a retransformation via MU**2 + SIGMA**2 - 0.375 [i.e. Y**2 + Mean Sq Error - Constant]. My issue is determining the proper way to retransform the estimates, given that I have several time trend variables (Year, YearSquared, YearCubed) to account for non-linear trends. Summing the mean estimates for the fixed and random effects intercept and time trend variables (and then squaring) appears to be correct, but what about the Mean Squared Error: do I add the Standard Error of the Prediction for both the fixed and random components (and then square), or do I select one or the other (and then square)? Since the retransformation requires that I add the MSE back in, do I do this to each variable in turn (implying that I square before adding), or do I sum the MSEs and then square (do you add MSEs like estimates? It seems wrong...)?
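To make the arithmetic concrete, here is a minimal sketch of the retransformation described above, written in Python rather than SAS purely for illustration. The function names and all numeric values are hypothetical. The sketch relies only on the identity E[Z**2] = E[Z]**2 + Var(Z) applied to the total prediction on the square-root scale; which GLIMMIX output should supply the variance term is exactly the open question and is not settled here.

```python
ANSCOMBE_C = 0.375  # constant added before taking the square root


def transform(y, c=ANSCOMBE_C):
    """Variance-stabilizing transform: (y + c)**(1/2)."""
    return (y + c) ** 0.5


def linear_predictor(coefs, t):
    """Sum intercept and time-trend terms on the square-root scale.

    coefs = (intercept, year, year_squared, year_cubed); each value is
    assumed to already combine the fixed and random estimates.
    """
    b0, b1, b2, b3 = coefs
    return b0 + b1 * t + b2 * t**2 + b3 * t**3


def retransform(mu, sigma2, c=ANSCOMBE_C):
    """Back-transform: MU**2 + SIGMA**2 - constant.

    sigma2 is a single variance for the summed prediction (an assumption
    of this sketch); it is added once, after squaring, not per variable.
    """
    return mu**2 + sigma2 - c


# Hypothetical coefficients for one district, forecasting year 7
coefs = (30.0, 0.5, 0.0, 0.0)
mu = linear_predictor(coefs, 7)          # prediction on the sqrt scale
forecast = retransform(mu, sigma2=1.0)   # prediction on the count scale
print(forecast)
```

With a variance of zero the retransformation simply inverts the transform, which is a quick sanity check that the constant is being handled consistently.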
Of course, if someone knows a way of using the built-in capabilities in GLIMMIX to make estimates for a forecast (e.g. the ESTIMATE statement or another option), I would be more than happy to let SAS do the work! ;-)
Thanks in advance!
05-23-2017 01:52 PM
I would actually start with your area's birth cohort: Children born 5 years prior = current kindergarten class.
You don't say whether you are looking for a single grade or total enrollment, but I would look very closely at last year's grades 1 through 11 plus that birth cohort as this year's estimate, and see how well that actually matches your current year's enrollment.
You very likely could get a total birth cohort count from your vital records bureau for births to parents in a list of zip codes your district serves.
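The check suggested above can be sketched in a few lines. This is a rough illustration in Python, following the description literally (last year's grades 1 through 11 plus the births from five years prior); the function name and all counts are made up for the example, and real data would come from the district and the vital records bureau.

```python
def cohort_estimate(last_year_by_grade, births_five_years_prior):
    """Naive total-enrollment estimate for the current year.

    last_year_by_grade: dict mapping grade number -> last year's enrollment
    births_five_years_prior: birth cohort standing in for kindergarten
    """
    # Carry forward last year's grades 1 through 11, as described above
    carried = sum(last_year_by_grade[g] for g in range(1, 12))
    return carried + births_five_years_prior


# Hypothetical district: 100 students in every grade last year
last_year = {grade: 100 for grade in range(1, 13)}
estimate = cohort_estimate(last_year, births_five_years_prior=90)
print(estimate)
```

Comparing this figure with the actual current-year enrollment for a few districts shows how much of the forecast a plain cohort carry-forward already explains before any regression is fitted.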