- Assessing fit: Deviance and scaled deviance

Posted 03-14-2013 04:01 PM
(6351 views)

Hello,

I was wondering if anyone could clarify the use of deviance and/or scaled deviance to assess model fit (in Proc Genmod).

I understand that a deviance/df value much greater or smaller than 1 can indicate over- or underdispersion of the response variable (or model misspecification). This can lead to incorrect estimates of the parameters' standard errors, and thus misinterpretation of their statistical significance. However, it is also my understanding that with certain error distributions, such as the gamma or negative binomial, a scale parameter is estimated (I am assuming to model this over/underdispersion?). In these cases, is this parameter used to correct for the effect of over/underdispersion on the error estimates, i.e. does SAS's output give parameter SEs that are corrected for over/underdispersion? How does this affect the "LR Statistics For Type 3 Analysis", if at all?

Along similar lines, if the scale parameter is used to explicitly model dispersion, I presume scaled deviance/df can be used to assess model fit rather than deviance/df?

I am asking these questions in the context of the following example, which left me unsure of the appropriate "next step":

Variables:

-Highly right-skewed continuous response variable that becomes normally distributed when log transformed

-1 continuous independent variable, a couple categorical independent variables, select interaction terms

Modelling Approaches:

-general linear model approach with log-transformed response variable had dev/df=3.7 (which I thought was odd given the normally distributed response variable, and the fact that the residuals were also normally distributed)

-generalized linear model approach with log link function and gamma error distribution had dev/df=2.9, scaled dev/df=1.3, scale=0.45

The reported dev/df value made me question the validity of the chosen error distribution and link function, but I couldn't think of more appropriate ones given the circumstances. I am not sure where to "go" next; I hope this doesn't warrant nonparametric analysis! However, if the scaled dev/df value is the appropriate term to use to assess model fit, I think I am okay...
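
The three numbers quoted for the gamma fit can be checked against each other. The sketch below assumes that the "scale" reported for the gamma distribution is the inverse of the dispersion, so that scaled deviance = deviance / dispersion = deviance * scale. That reading is an assumption about the output, not something stated in the post, but the arithmetic is consistent with it.

```python
# Values quoted in the post for the gamma / log-link fit.
deviance_per_df = 2.9
scale = 0.45  # assumed to be 1 / dispersion for the gamma distribution

# Assumption: scaled deviance = deviance / dispersion = deviance * scale.
dispersion = 1.0 / scale
scaled_deviance_per_df = deviance_per_df / dispersion

print(f"implied dispersion:  {dispersion:.2f}")
print(f"scaled deviance/df:  {scaled_deviance_per_df:.1f}")
```

Under that assumption the scaled dev/df comes out at about 1.3, matching the reported value, which suggests the scaled figure already incorporates the estimated scale.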

Any clarification on any of the above questions/musings would be great!!

Thanks!

Madison

6 REPLIES

Others may have different ideas, but I think you are on the right track with the gamma distribution. That scaled dev/df is saying that you are only slightly overdispersed, and in this case, it is almost certainly due to a systematic variable that was not included in the model, such that it "stretches" the distribution of residuals. You may not even have access to this variable, so in the context of what is going on, you are probably OK.

You mention "select interaction terms." Are there interactions that were deleted from the model due to nonsignificance? What happens to the scaled dev/df if these are retained in the model? It may be that these define some extreme measurements, but that there is insufficient power to declare the interaction "significant."

I will be interested in the outcome of this analysis.

Steve Denham

Thanks for the reply, Steve.

Glad to hear that what I've done seems reasonable!

The interaction terms I mentioned refer to select interactions between variables that I was interested in. The full model didn't include all possible interactions, but the nonsignificant interaction terms were removed sequentially using backwards elimination (most variables and interactions were retained, however). The dev/df ratio did not change much at all when these terms were eliminated. I imagine that there are indeed other important explanatory variables "out there" that were not considered in the analysis, which could explain problems with fit.

Madison

Thanks for your reply John,

How interesting, the deviance/df using the normal distribution is indeed equal to the residual variance. Perhaps this is just a mathematical fact; it would be interesting to compare the calculation of the two (but I'm not that ambitious!! haha)
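
For anyone curious about that comparison, here is a minimal sketch in plain Python. It uses the usual definition of the (unscaled) deviance as 2 * dispersion * (log-likelihood of the saturated model minus log-likelihood of the fitted model). The data are simulated and every name and number in it is hypothetical; this is not how SAS computes anything internally, only a check that the definition reduces to the residual sum of squares for the normal distribution.

```python
import math
import random

random.seed(1)

# Hypothetical data: one predictor, normal errors.
n = 200
x = [random.uniform(0, 10) for _ in range(n)]
y = [2.0 + 0.5 * xi + random.gauss(0, 1.9) for xi in x]

# Ordinary least squares fit for a single predictor.
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = y_bar - slope * x_bar
fitted = [intercept + slope * xi for xi in x]


def normal_loglik(obs, mu, sigma2):
    """Normal log-likelihood with common variance sigma2."""
    return sum(
        -0.5 * math.log(2 * math.pi * sigma2) - (o - m) ** 2 / (2 * sigma2)
        for o, m in zip(obs, mu)
    )


# Any positive variance works here: it cancels out of the unscaled deviance.
sigma2 = 7.0
deviance = 2 * sigma2 * (
    normal_loglik(y, y, sigma2) - normal_loglik(y, fitted, sigma2)
)

rss = sum((yi - mi) ** 2 for yi, mi in zip(y, fitted))
df = n - 2  # observations minus the two estimated coefficients

print(f"deviance / df: {deviance / df:.4f}")
print(f"RSS / df:      {rss / df:.4f}")
```

The two printed values agree, so deviance/df for a normal model is the residual mean square. A value such as 3.7 is then an estimate of the residual variance on the scale of the response, not a sign of overdispersion.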

I believe SAS calculates the residual error via the deviance/df value for the normal distribution.

Regardless, I don't believe you can have overdispersion with the normal distribution because the variance is not a function of the mean; it can be anything. That's why your residuals look normal: they can be normal with any residual error = deviance/df. For the other distributions the variance is tied to the mean, so with the deviance/df value you are essentially checking the distributional assumption. For instance, the gamma variance is assumed to scale with the square of the mean, the Poisson variance scales with the mean, etc.

Excluding the complication of weights, for the normal distribution, variance = the scale parameter (times 1). For gamma, variance = scale * mean^2. For Poisson, variance = scale * mean. So if you divide the variance by the assumed function of the mean (mean^2 for gamma, mean for Poisson) and the assumed relationship holds, what is left is the scale value, which is 1 for a distribution such as the Poisson where it is fixed.
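
The gamma relationship can be illustrated with a small simulation. This is only a sketch with made-up settings: the shape value, the three means, and the sample size are all hypothetical. It draws gamma data with a fixed shape at several means and shows that variance / mean^2 stays roughly constant at 1/shape, which plays the role of the multiplier in "variance = scale * mean^2".

```python
import random
import statistics

random.seed(42)

shape = 2.0  # hypothetical; the implied multiplier on mean^2 is 1/shape = 0.5

ratios = {}
for mean in (1.0, 5.0, 25.0):
    # random.gammavariate(alpha, beta) has mean alpha * beta,
    # so beta = mean / shape gives the requested mean.
    draws = [random.gammavariate(shape, mean / shape) for _ in range(50_000)]
    m = statistics.fmean(draws)
    v = statistics.variance(draws)
    ratios[mean] = v / m ** 2
    print(f"mean={mean:5.1f}  variance={v:9.3f}  variance/mean^2={ratios[mean]:.3f}")
```

The variance itself grows by orders of magnitude across the three means, while the ratio stays close to 0.5. That constant ratio is what the gamma assumption asserts, and it is free to be any positive number, unlike the fixed value of 1 for the Poisson.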

Of course! That makes a lot of sense.

So then does the scale parameter essentially add another term to the assumed relationship between the variance and mean of the distribution?

Thanks John!
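
In standard generalized linear model notation, the relationship discussed in this thread is usually written as below. The symbols (phi for the dispersion, V for the variance function, D for the deviance) are textbook notation and an assumption here; they are not taken from the posts or from SAS output.

```latex
\operatorname{Var}(Y_i) = \phi \, V(\mu_i),
\qquad
V(\mu) =
\begin{cases}
1      & \text{normal} \\
\mu    & \text{Poisson} \\
\mu^2  & \text{gamma}
\end{cases}
\qquad
D^{*} = \frac{D}{\phi}
```

Read this way, the dispersion is a multiplier on the assumed variance function rather than an additional additive term, and the scaled deviance D* is the deviance D divided by that multiplier.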
