LionelLeston
Calcite | Level 5

I am working for my PhD advisor on a post-doc, and as part of my duties, I am answering the statistics questions of my advisor's other students while my advisor is on sabbatical. One of the students is using generalized linear modeling (PROC GENMOD) to test for the best distribution for modeling data (normal, Poisson, negative binomial) prior to running a mixed model using that distribution and the same predictors in PROC GLIMMIX. In the past, my advisor has suggested using AIC and/or the deviance/df of a model to decide, preferring the lowest value of AIC and deviance/df closest to 1.0. Recently, my advisor has asked us to use graphical diagnostics of residuals in PROC GENMOD, just as we would when testing if data satisfy the statistical assumptions of ordinary least-squares regression in PROC REG.

The student knows how to graph residuals from generalized linear models in both PROC GENMOD and PROC UNIVARIATE after exporting model results to an output data set. Each time the student runs the same generalized linear model under a different distribution and creates a new data set, the residuals in each data set are different, although sometimes only slightly so. But when the residuals in each data set are graphed, the graphs (e.g. histograms, Q-Q plots) are identical for each set of residuals, although each set of residuals has different basic statistics. If the basic statistics do not differ that much, how likely is it that diagnostic graphs would look identical?

Furthermore, when the student tries this in PROC GLIMMIX instead of PROC GENMOD (i.e., runs the same mixed model three times, each time under a different distribution) and then runs graphical diagnostics on the residuals from each model, she does get different Q-Q plots for each distribution. I suppose the difference could be that in the mixed models the student is accounting for random effects due to repeated measurements from the same sites, but why do we see identical graphs when we don't include random effects in PROC GENMOD?

I have included the data and program as attachments.

2 REPLIES 2
SteveDenham
Jade | Level 19

The only thing I can offer here is to look at the plots of the Pearson residuals, rather than the raw residuals.  Scaling by the proper standard error for the different distributions may make the difference more apparent.
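A rough sketch of what I mean, not a tested program. I have not used the attached files here, so the data set name (mydata), the response (count), the predictors (x1, x2) and the macro name (fit_dist) are all placeholders. Raw residuals are just observed minus predicted, so they can look nearly the same whenever the fitted means are close. Pearson residuals divide by the square root of the variance function, which is constant for the normal, mu for the Poisson, and mu + k*mu**2 for the negative binomial, so they should separate the three fits more clearly.

```sas
/* Placeholder names: WORK.MYDATA, response COUNT, predictors X1 and X2. */
/* Fit one distribution and write raw and Pearson residuals to a named data set. */
%macro fit_dist(dist=, link=, out=);
  proc genmod data=mydata;
    model count = x1 x2 / dist=&dist link=&link;
    output out=&out pred=mu
           resraw=r_raw          /* observed minus predicted */
           reschi=r_pearson      /* Pearson residual */
           stdreschi=r_stdpearson; /* standardized Pearson residual */
  run;
%mend fit_dist;

%fit_dist(dist=normal,  link=identity, out=res_normal);
%fit_dist(dist=poisson, link=log,      out=res_poisson);
%fit_dist(dist=negbin,  link=log,      out=res_negbin);

/* Stack the three outputs and label each row with its distribution. */
data res_all;
  length dist $8;
  set res_normal(in=a) res_poisson(in=b) res_negbin(in=c);
  if a then dist = 'normal';
  else if b then dist = 'poisson';
  else dist = 'negbin';
run;

proc sort data=res_all;
  by dist;
run;

/* One histogram and Q-Q plot of the Pearson residuals per distribution. */
proc univariate data=res_all;
  by dist;
  var r_pearson;
  histogram r_pearson / normal;
  qqplot r_pearson / normal(mu=est sigma=est);
run;
```

Naming the data set explicitly in the DATA= option of the plotting step also matters: when DATA= is omitted, a procedure reads the most recently created data set, so it is worth confirming that each graph is drawn from the fit you intend.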

(No guarantees)

Steve Denham

