10-30-2012 02:25 PM
I am running a generalized linear model with a Poisson log link function (estimating expected claim frequency using claim counts as my dependent variable). I have, say, 20 predictor variables, all categorical in nature. Our industry (insurance) uses these sorts of models a lot, but I need some guidance on 1) how best to determine model fit and 2) how to compare one run to another to decide which is "better."
Here's what I do now:
1. Attain convergence.
2. Examine the Scaled Deviance divided by its degrees of freedom. I've heard that values close to 1.0 are desirable. What does it mean if the value is below 1.0? Overdispersion of the data? (What does that mean?) My last runs yielded values between 0.05-0.22. My typical application can have 1.25 million observations, so looking at the GENMOD model fit table doesn't tell me much--the numbers are basically off the charts (in a good direction) based on the number of observations (and thus, df) I'm modeling.
3. Look at the AICC, knowing that "smaller is better." Typically I'm trying to compare AICC's from one run to another. I have observed very subtle differences (say, Model 1 has an AICC of 72,305 and Model 2 has an AICC of 72,320). Is this a meaningful difference? My intuition says not.
4. Use of residuals. I know the classic literature on GLMs says to always examine your residuals. I tried it once, but with so many observations it was difficult to draw any meaningful conclusions. Would it make sense to take a random sample of the residuals and examine those?
5. Use of the ASSESS statement. Tried it. Sounded intriguing. Got lost. Couldn't understand the output.
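Points 2 and 4 above can be sketched in plain Python. This is a toy illustration only: the counts `y`, fitted means `mu`, and the parameter count are all made up, standing in for what PROC GENMOD would actually produce from the model.

```python
import math
import random

# Hypothetical claim counts and model-fitted means -- stand-ins for
# PROC GENMOD output. Kept small here for illustration.
random.seed(7)
y  = [random.choice([0, 0, 0, 1, 1, 2, 3]) for _ in range(10_000)]
mu = [max(0.05, random.gauss(0.8, 0.2)) for _ in range(10_000)]
n_params = 20  # assumed number of fitted coefficients

def unit_deviance(yi, mi):
    """Poisson unit deviance: 2*(y*log(y/mu) - (y - mu))."""
    term = yi * math.log(yi / mi) if yi > 0 else 0.0
    return 2.0 * (term - (yi - mi))

# Point 2: residual deviance divided by its degrees of freedom --
# the "Deviance / DF" figure in the GENMOD model-fit table.
deviance = sum(unit_deviance(yi, mi) for yi, mi in zip(y, mu))
df = len(y) - n_params
ratio = deviance / df

# Point 4: deviance residuals. With ~1.25 million observations a plot of
# all of them is an unreadable blob; a random subsample of a few thousand
# preserves the overall pattern and stays plottable.
resid = [math.copysign(math.sqrt(max(unit_deviance(yi, mi), 0.0)), yi - mi)
         for yi, mi in zip(y, mu)]
subsample = random.sample(resid, 1000)

print(round(ratio, 3), len(subsample))
```

Plotting the subsample against the fitted means (or against each predictor in turn) is the usual next step; the subsampling itself does not bias the picture as long as the sample is random.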
I'd appreciate any information from some of the more seasoned GENMOD users/modelers out there.
Thank you so much.
10-31-2012 08:41 AM
I'll answer what I can (and that's not much). Overdispersion is when the residual variability is greater than the variance assumed by the distribution. This is pretty common when fitting Poisson models, under which the variance and the mean are assumed to be equal; overdispersion means the variance is greater than the mean, and it typically shows up as a deviance/df ratio well above 1.0. Your ratios of 0.05-0.22 are well below 1.0, which points the other way, toward underdispersion (variance less than the mean). Either way, a ratio far from 1.0 implies (at least to me) that the Poisson distribution may not be your best choice. For overdispersion, the first place to look would be a negative binomial distribution. (Actually, the zeroth place to look would be a plot of your data to see if it is zero-inflated.)
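The variance-versus-mean comparison and the zero-inflation check can both be done directly on the raw counts before fitting anything. A minimal Python sketch, with a made-up `counts` list standing in for the observed claim counts:

```python
# Toy over/underdispersion and zero-inflation check; `counts` is
# hypothetical data standing in for the observed claim counts.
counts = [0] * 70 + [1] * 18 + [2] * 8 + [3] * 3 + [5] * 1

n = len(counts)
mean = sum(counts) / n
var = sum((c - mean) ** 2 for c in counts) / (n - 1)  # sample variance
zero_share = counts.count(0) / n

# Poisson assumes var equals mean: var/mean well above 1 suggests
# overdispersion (negative binomial is the usual next step), well below 1
# suggests underdispersion. A zero share far above exp(-mean) hints at
# zero inflation.
print(round(mean, 3), round(var, 3), round(var / mean, 2), round(zero_share, 2))
```

For these made-up counts the variance/mean ratio comes out above 1 and 70% of the observations are zero, more than the roughly 62% a Poisson with this mean would predict, so both flags would fire here.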
I also agree that a 0.02% change in the AICc probably doesn't indicate a substantially better fit, so you need to depend on business sense/rules in deciding on models when this occurs. As far as the ASSESS statement goes, I would run screaming if I were trying to assess 20 predictors--it's a rare day when I can visualize what is happening in a four-dimensional space, let alone the twenty-dimensional space you would need to build in your head to compare across the predictors.
Expert knowledge and understanding of the predictors is the best I can offer at this point.
11-02-2012 12:35 PM
According to Burnham, K. P., and Anderson, D.R. (2004), "Multimodel inference: understanding AIC and BIC in Model Selection", Sociological Methods and Research, 33: 261-304:
"The delta_i are easy to interpret and allow a quick strength-of-evidence comparison and ranking of candidate hypotheses or models. The larger the delta_i , the less plausible is ﬁtted model i as being the best approximating model in the candidate set. It is generally important to know which model (hypothesis) is second best (the ranking), as well as some measure of its standing with respect to the best model. Some simple rules of thumb are often useful in assessing the relative merits of models in the set: Models having delta_i ≤ 2 have substantial support (evidence), those in which 4 ≤ delta_i ≤ 7 have considerably less support, and models having delta_i > 10 have essentially no support. These rough guidelines have similar counterparts in the Bayesian literature (Raftery 1996).
Naive users often question the importance of a delta_i = 10 when the two AIC values might be, for example, 280,000 and 280,010. The difference of 10 here might seem trivial. In fact, large AIC values contain large scaling constants, while the delta_i are free of such constants. Only these differences in AIC are interpretable as to the strength of evidence."
delta_i = AIC_i − AIC_min; AIC_min is the value for the best model. Although Burnham and Anderson use "AIC" here, they are proponents of AICc in general.
So for your point 3, delta_i = 72,320 - 72,305 = 15 > 10, which implies that Model 2 has essentially no support relative to Model 1.
Also well worth reading is
Anderson, DR & Burnham, KP (2002), "Avoiding pitfalls when using information-theoretic methods", Journal of Wildlife Management 66(3): 912-918.
Hope this is useful,
11-05-2012 08:09 AM
/*Bangs head against desk*/
Thanks Susan, for pointing out that large AIC values contain large scaling constants. I knew that, but got lost in the relative change. This is a case where the absolute change (delta_i) is what really counts. I should have known this, as we use it all the time for selecting covariance structures. The same reasoning applies to fixed effects.
11-19-2012 03:52 PM
Thank you for the information. This is most useful. I'd like some clarification on the scaling factor in the AIC. I'm not a mathematician, and I fail to see it when looking at the formula.
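One way to see the constant: for a Poisson model the log-likelihood is sum(y*log(mu) - mu - log(y!)), and the log(y!) term depends only on the data, not on the model, so it is a shared additive constant that cancels whenever two AICs fit to the same data are differenced. A toy Python check (made-up counts and two hypothetical sets of fitted means; the parameter counts are assumed):

```python
import math

# Hypothetical data and two sets of fitted means for the SAME observations.
y    = [0, 1, 2, 1, 0, 3, 1, 0, 2, 1]
mu_a = [0.6, 1.0, 1.6, 1.1, 0.5, 2.4, 0.9, 0.4, 1.8, 1.2]  # "model A" fits
mu_b = [1.1] * 10                                           # "model B": intercept only

def poisson_loglik(y, mu):
    """Full Poisson log-likelihood, including the log(y!) term."""
    return sum(yi * math.log(mi) - mi - math.lgamma(yi + 1)
               for yi, mi in zip(y, mu))

# The data-only "scaling constant": sum of log(y!), identical for any model.
const = sum(math.lgamma(yi + 1) for yi in y)

aic_a = -2 * poisson_loglik(y, mu_a) + 2 * 4  # 4 parameters assumed for A
aic_b = -2 * poisson_loglik(y, mu_b) + 2 * 1  # 1 parameter for B

# Dropping the constant shifts both AICs by the same amount (2*const),
# so the difference -- the delta_i -- is unchanged.
aic_a_nc = aic_a - 2 * const
aic_b_nc = aic_b - 2 * const
print(round(aic_a - aic_b, 6) == round(aic_a_nc - aic_b_nc, 6))  # True
```

With 1.25 million observations that constant is enormous, which is why the raw AIC values look huge while only their differences carry information.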