Hello,

I am running a generalized linear model with a Poisson distribution and a log link function (estimating expected claim frequency, with claim counts as my dependent variable). I have, say, 20 predictor variables, all of which are categorical. Our industry (insurance) uses these sorts of models a lot, but I need some guidance on 1) how best to assess model fit and 2) how to decide which of two runs is "better."

Here's what I do now:

1. Attain convergence.

2. Examine the Scaled Deviance divided by its degrees of freedom. I've heard that values close to 1.0 are desirable. What does it mean if the value is below 1.0? Overdispersion of the data? (What does that mean?) My last runs yielded values between 0.05 and 0.22. My typical application can have 1.25 million observations, so the GENMOD model fit table doesn't tell me much: the numbers are basically off the charts (in a good direction) because of the number of observations (and thus, df) I'm modeling.

3. Look at the AICC, knowing that "smaller is better." Typically I'm trying to compare AICCs from one run to another. I have observed very subtle differences (say, Model 1 has an AICC of 72,305 and Model 2 has an AICC of 72,320). Is this a meaningful difference? My intuition says not.

4. Use of residuals. I know the classic literature on GLMs says to always examine your residuals. I tried it once, but with this many observations it was difficult to draw any meaningful conclusions. Would it make sense to take a random sample of the residuals and examine those?

5. Use of the ASSESS statement. Tried it. Sounded intriguing. Got lost. Couldn't understand the output.

I'd appreciate any information from some of the more seasoned GENMOD users/modelers out there.

Thank you so much.

Marty J.
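To make the quantities in points 2 and 3 concrete, here is a minimal Python sketch (not GENMOD code) of how they are usually defined. The function names and the toy data are hypothetical and only for illustration. It assumes the standard Poisson deviance and Pearson chi-square formulas, and it uses the common rule of thumb that exp(-delta/2) gauges the relative support for the model with the larger AICC. By the usual convention, a ratio well above 1 is read as overdispersion and a ratio well below 1 as underdispersion, although with very sparse counts (many zeros, small fitted means) the deviance ratio may not be a reliable gauge of either.

```python
import math


def poisson_deviance(y, mu):
    """Poisson deviance: 2 * sum(y*log(y/mu) - (y - mu)), taking 0*log(0) as 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        log_term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += log_term - (yi - mi)
    return 2.0 * total


def pearson_chi_square(y, mu):
    """Pearson chi-square for a Poisson model: sum((y - mu)^2 / mu)."""
    return sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))


def dispersion_ratio(statistic, n_obs, n_params):
    """Fit statistic divided by its residual degrees of freedom."""
    return statistic / (n_obs - n_params)


def relative_likelihood(aicc_best, aicc_other):
    """exp(-delta/2), where delta = aicc_other - aicc_best (rule of thumb)."""
    return math.exp((aicc_best - aicc_other) / 2.0)


# Hypothetical toy data: observed claim counts and fitted means.
y = [0, 1, 0, 2, 0, 0, 1, 0]
mu = [0.3, 0.8, 0.2, 1.5, 0.4, 0.1, 0.9, 0.3]
n_params = 2  # assumed number of estimated parameters

dev = poisson_deviance(y, mu)
chi2 = pearson_chi_square(y, mu)
print("deviance / df:", dispersion_ratio(dev, len(y), n_params))
print("Pearson chi-square / df:", dispersion_ratio(chi2, len(y), n_params))

# AICC values quoted in point 3: a difference of 15 gives exp(-7.5), about 0.00055.
print("relative likelihood:", relative_likelihood(72305, 72320))
```

Under this heuristic, a gap of 15 AICC points is not negligible, even though it looks tiny next to values in the tens of thousands: only the difference between the two AICCs matters, not their size. Whether that gap is practically important for pricing is a separate judgment.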