09-23-2013 05:06 PM
I am working as a post-doc for my PhD advisor, and as part of my duties, I am answering the statistics questions of my advisor's other students while my advisor is on sabbatical. One of the students is using generalized linear modeling (PROC GENMOD) to test for the best distribution for modeling data (normal, Poisson, negative binomial) prior to running a mixed model using that distribution and the same predictors in PROC GLIMMIX. In the past, my advisor has suggested using AIC and/or the deviance/df of a model to decide, preferring the model with the lowest AIC and with deviance/df closest to 1.0. Recently, my advisor has asked us to use graphical diagnostics of residuals in PROC GENMOD, just as we would when testing whether data satisfy the statistical assumptions of ordinary least-squares regression in PROC REG.
The student knows how to graph residuals from generalized linear models in both PROC GENMOD and PROC UNIVARIATE after exporting model results to an output data set. Each time the student runs the same generalized linear model under a different distribution and creates a new data set, the residuals in each data set are different, although sometimes only slightly so. But when the residuals in each data set are graphed, the graphs (e.g. histograms, Q-Q plots) are identical for each set of residuals, although each set of residuals has different basic statistics. If the basic statistics do not differ that much, how likely is it that diagnostic graphs would look identical?
Furthermore, when the student tries this in PROC GLIMMIX instead of PROC GENMOD (i.e., runs the same mixed model three times, each time under a different distribution) and then runs graphical diagnostics on the residuals from each model, she does get different Q-Q plots for each distribution. I suppose the difference could be that in the mixed models the student is accounting for random effects due to repeated measurements from the same sites, but why do we see identical graphs when we don't include random effects in PROC GENMOD?
I have included the data and program as attachments.
09-24-2013 09:15 AM
The only thing I can offer here is to look at the plots of the Pearson residuals, rather than the raw residuals. Scaling by the proper standard error for the different distributions may make the difference more apparent.
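To make the scaling idea concrete, here is a rough sketch in Python rather than SAS, so it is an illustration of the arithmetic only and not what PROC GENMOD does internally. It divides the same raw residuals by the square root of each distribution's variance function: a constant for the normal, mu for the Poisson, and mu + k*mu^2 for the negative binomial. The data, the fitted means, the dispersion `k`, and the function name `pearson_residuals` are all made up for the example. In a real analysis the fitted means would also differ somewhat between distributions; they are held fixed here only to isolate the effect of the scaling.

```python
import math

def pearson_residuals(y, mu, dist, k=0.5, sigma=1.0):
    """Return (y - mu) / sqrt(V(mu)) for a hypothetical fitted model.

    k     -- assumed negative binomial dispersion (illustrative value)
    sigma -- assumed normal standard deviation (illustrative value)
    """
    if dist == "normal":
        var = [sigma ** 2 for _ in mu]      # constant variance
    elif dist == "poisson":
        var = [m for m in mu]               # variance equals the mean
    elif dist == "negbin":
        var = [m + k * m * m for m in mu]   # variance grows quadratically
    else:
        raise ValueError("unknown distribution: " + dist)
    return [(yi - mi) / math.sqrt(vi) for yi, mi, vi in zip(y, mu, var)]

# Made-up counts and fitted means, used for every distribution.
y = [0, 3, 12, 40]
mu = [1.0, 4.0, 9.0, 36.0]

raw = [yi - mi for yi, mi in zip(y, mu)]
print("raw     ", raw)
for dist in ("normal", "poisson", "negbin"):
    scaled = pearson_residuals(y, mu, dist)
    print(f"{dist:8s}", [round(r, 3) for r in scaled])
```

The raw residuals are the same in all three cases, so plots of them cannot separate the distributions, whereas the scaled residuals change shape as the variance function changes. In PROC GENMOD itself, the OUTPUT statement keyword RESCHI= should give the Pearson residuals and RESRAW= the raw ones, but check the documentation for your release.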