03-14-2013 04:01 PM
I was wondering if anyone could clarify the use of deviance and/or scaled deviance to assess model fit (in Proc Genmod).
I understand that a deviance/df value much greater or smaller than 1 can indicate over- or underdispersion of the response variable (or model misspecification). This could lead to incorrect estimation of the standard errors of parameters, and thus misinterpretation of their statistical significance. However, it is also my understanding that with certain error distributions, such as the gamma or negative binomial, a scale parameter is estimated (I am assuming to model this over/underdispersion?). In these cases, is this parameter used to correct for the effect of over/underdispersion on the error estimates, i.e., does SAS's output give parameter SE estimates that are corrected for over/underdispersion? And does this affect the "LR Statistics For Type 3 Analysis" at all?
Along similar lines, if the scale parameter is used to explicitly model dispersion, I presume scaled deviance/df can be used to assess model fit rather than deviance/df?
I am asking these questions in the context of the following example, which left me unsure of the appropriate "next step":
-Highly right-skewed continuous response variable that becomes normally distributed when log transformed
-1 continuous independent variable, a couple categorical independent variables, select interaction terms
-general linear model approach with the log-transformed response variable had dev/df=3.7 (which I thought was odd given the normally distributed response variable, and the fact that the residuals were also normally distributed)
-generalized linear model approach with log link function and gamma error distribution had dev/df=2.9, scaled dev/df=1.3, scale=0.45
The reported dev/df value made me question the validity of the chosen error distribution and link function, but I couldn't think of more appropriate ones given the circumstances. I am not sure where to "go" next; I hope this doesn't warrant nonparametric analysis!!!! However, if the scaled dev/df value is the appropriate term to use to assess model fit, I think I am okay...
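As a sanity check on my own numbers (this is just my reading of the GENMOD documentation, so treat it as an assumption: scaled deviance = deviance / dispersion, and for the gamma the dispersion is the reciprocal of the reported Scale, i.e. scaled deviance = deviance * Scale):

```python
# Hypothetical check of the GENMOD output above, assuming
# dispersion phi = 1 / Scale for the gamma family, so that
# scaled deviance = deviance / phi = deviance * Scale.
dev_over_df = 2.9   # Deviance / DF from the output
scale = 0.45        # reported Scale (gamma shape)

scaled_dev_over_df = dev_over_df * scale
print(round(scaled_dev_over_df, 2))  # 1.3 -- consistent with the reported scaled dev/df
```

If that reading is right, the three reported numbers are internally consistent with each other.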
Any clarification on any of the above questions/musings would be great!!
03-15-2013 08:12 AM
Others may have different ideas, but I think you are on the right track with the gamma distribution. That scaled dev/df is saying that you are only slightly overdispersed, and in this case, it is almost certainly due to a systematic variable that was not included in the model, such that it "stretches" the distribution of residuals. You may not even have access to this variable, so in the context of what is going on, you are probably OK.
You mention "select interaction terms." Are there interactions that were deleted from the model due to nonsignificance? What happens to the scaled dev/df if these are retained in the model? It may be that these define some extreme measurements, but that there is insufficient power to declare the interaction "significant."
I will be interested in the outcome of this analysis.
03-15-2013 04:18 PM
Thanks for the reply, Steve.
Glad to hear that what I've done seems reasonable!
The interaction terms I mentioned refer to select interactions between variables that I was interested in. The full model didn't include all possible interactions, and the nonsignificant interaction terms were removed sequentially using backwards elimination (most variables and interactions were retained, however). The dev/df ratio did not change much at all when these terms were eliminated. I imagine that there are indeed other important explanatory variables "out there" that were not considered in the analysis, which could explain problems with fit.
03-15-2013 01:27 PM
I believe the dev/DF for the normal model may just be representative of your residual variance. Can you compare this value to your residual variance estimate? I had the same thing happen just this past week, which puzzled me, then I realized that mine were equal. I believe the scaled deviance / DF would be exactly one because the scaling is done by the (constant) variance...but if so, it's always one and not very helpful in the normal case.
03-15-2013 04:23 PM
Thanks for your reply John,
How interesting, the deviance/df using the normal distribution is indeed equal to the residual variance. Perhaps this is just a mathematical fact; it would be interesting to compare the calculation of the two (but I'm not that ambitious!! haha)
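For what it's worth, I believe it is just a mathematical fact (at least under the definitions I'm aware of): for the normal family the deviance is the residual sum of squares, so deviance/df is the usual mean squared error, i.e. the residual variance estimate. A minimal sketch with simulated data and a closed-form least-squares line fit (all numbers made up for illustration):

```python
import random

random.seed(1)
n = 200
x = [random.uniform(0, 10) for _ in range(n)]
y = [2.0 + 0.5 * xi + random.gauss(0, 3.0) for xi in x]

# Closed-form simple linear regression (intercept + slope)
xbar, ybar = sum(x) / n, sum(y) / n
slope = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
         / sum((xi - xbar) ** 2 for xi in x))
intercept = ybar - slope * xbar
resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# The normal-family deviance is the residual sum of squares...
deviance = sum(r ** 2 for r in resid)
df = n - 2  # n minus the number of estimated parameters

# ...so deviance/df is literally the residual variance estimate (MSE).
mse = deviance / df
print(deviance / df == mse)  # True -- they are the same quantity by definition
```

So the two agreeing in your output isn't a coincidence; they are the same formula.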
03-15-2013 04:29 PM
I believe SAS calculates the residual error via the deviance/df value for the normal distribution.
Regardless, I don't believe you can have overdispersion with the normal distribution because the variance is not a function of the mean...it can be anything. That's why your residuals look normal...they can be normal with any residual error = deviance/df. The other distributions have relationships between the two, where you are essentially checking the distributional assumption with the deviance/df value. For instance, the gamma variance is assumed to scale with the square of the mean, the Poisson variance scales with the mean, etc.
Excluding the complication of weights, for the normal distribution, variance = the scale parameter (times 1). For gamma, variance = scale * mean^2. For Poisson, variance = scale * mean. So if you divide the variance by the appropriate function of the mean, and the assumed relationship holds, the scale value is 1.
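That variance-function idea is easy to see in a quick simulation (pure Python, made-up numbers, nothing to do with your actual data): for gamma draws with a fixed shape, variance/mean^2 stays roughly constant at 1/shape across groups with very different means, while variance/mean does not.

```python
import random

random.seed(42)
shape = 2.0  # fixed gamma shape; mean = shape*theta, var = shape*theta**2 = mean**2/shape

for target_mean in (1.0, 5.0, 25.0):
    theta = target_mean / shape
    sample = [random.gammavariate(shape, theta) for _ in range(20000)]
    m = sum(sample) / len(sample)
    v = sum((s - m) ** 2 for s in sample) / (len(sample) - 1)
    print(f"mean~{m:6.2f}  var/mean~{v / m:7.2f}  var/mean^2~{v / m**2:5.2f}")
# var/mean^2 hovers near 1/shape = 0.5 in every group, while var/mean
# grows with the mean -- the gamma pattern, not the Poisson one.
```

If the data really follow the assumed family, the "scaled" ratio is the one that settles near a constant, which is exactly what the scaled deviance is checking.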
03-15-2013 04:34 PM
Of course! That makes a lot of sense.
So then does the scale parameter essentially add another term to the assumed relationship between the variance and mean of the distribution?