Correcting/adjusting for overdispersion and underdispersion

Posted 10-26-2019 09:35 AM
(1569 views)

Hello,

I am wondering how to correct for over- and underdispersion in PROC GLIMMIX. Could someone help me? Thanks!

    proc glimmix data = doc1;
      id Idn;
      class diet strain;
      model Thick = diet|strain / ddfm = kenwardroger;
      random residual / subject = pen;
      lsmeans diet|strain;
    run;

The output is often:

| Generalized Chi-Square | Gener. Chi-Square / DF |
|---|---|
| 454.29 | 8.11 |

or

| Generalized Chi-Square | Gener. Chi-Square / DF |
|---|---|
| 6.44 | 0.11 |

Accepted Solutions


"So, for my understanding. In this case the Chi-square/df does not really say something with this model."

The opposite is true. It gives you some measure of dispersion of the data around the fitted line.

--

Paige Miller
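As a worked check of that statement, the ratio can be reproduced by hand from the two full outputs posted further down in this thread. This is only a sketch: it assumes the DF is the 56 reported as Den DF (60 observations used minus 4 diet-by-strain cell means). Under that assumption the ratio matches the Residual (VC) estimate in both fits, which is consistent with it measuring the spread of the data around the fitted values.

```latex
\begin{align*}
\text{Gaussian fit:}\quad \frac{\text{Generalized Chi-Square}}{\text{DF}} &= \frac{403.67}{56} \approx 7.21, & \text{Residual (VC)} &= 7.2084 \\
\text{Poisson fit:}\quad \frac{\text{Generalized Chi-Square}}{\text{DF}} &= \frac{7.82}{56} \approx 0.14, & \text{Residual (VC)} &= 0.1397
\end{align*}
```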

16 REPLIES

Does this help?

--

Paige Miller


I tried it as the statement:

    proc glimmix data = Anu.organday1_3;
      id Idn;
      class diet strain;
      model Gizz_Thick_rel = diet|strain / ddfm = kenwardroger;
      random residual / subject = pen;
      random_residual_ = Gizz_thick_rel;
      lsmeans diet|strain;
    run;

However, the Gener. Chi-Square / DF stays at 8.11.


Can you please show the entire output from PROC GLIMMIX instead of just a few lines?

Please click on the {i} icon and paste the output, as text, into the window that appears. **Do not skip this step.**

--

Paige Miller


    proc glimmix data = Anu.organweightday123_3;
      id Idn;
      class diet strain;
      model Pancreas_rel = diet|strain / ddfm = kenwardroger;
      random residual / subject = pen;
      lsmeans diet|strain;
    run;

The SAS System

The GLIMMIX Procedure

| Model Information | |
|---|---|
| Data Set | ANU.ORGANWEIGHTDAY123_3 |
| Response Variable | Pancreas_rel |
| Response Distribution | Gaussian |
| Link Function | Identity |
| Variance Function | Default |
| Variance Matrix Blocked By | Pen |
| Estimation Technique | Restricted Maximum Likelihood |
| Degrees of Freedom Method | Kenward-Roger |
| Fixed Effects SE Adjustment | Kenward-Roger |

Class Level Information

| Class | Levels | Values |
|---|---|---|
| diet | 2 | 1 2 |
| strain | 2 | A B |

Number of Observations Read: 196

Number of Observations Used: 60

| Dimensions | |
|---|---|
| R-side Cov. Parameters | 1 |
| Columns in X | 9 |
| Columns in Z per Subject | 0 |
| Subjects (Blocks in V) | 30 |
| Max Obs per Subject | 3 |

| Optimization Information | |
|---|---|
| Optimization Technique | None |
| Parameters | 0 |
| Lower Boundaries | 0 |
| Upper Boundaries | 0 |
| Fixed Effects | Profiled |
| Residual Variance | Profiled |
| Starting From | Data |

| Fit Statistics | |
|---|---|
| -2 Res Log Likelihood | 280.34 |
| AIC (smaller is better) | 282.34 |
| AICC (smaller is better) | 282.42 |
| BIC (smaller is better) | 283.74 |
| CAIC (smaller is better) | 284.74 |
| HQIC (smaller is better) | 282.79 |
| Generalized Chi-Square | 403.67 |
| Gener. Chi-Square / DF | 7.21 |

Covariance Parameter Estimates

| Cov Parm | Estimate | Standard Error |
|---|---|---|
| Residual (VC) | 7.2084 | 1.3623 |

Type III Tests of Fixed Effects

| Effect | Num DF | Den DF | F Value | Pr > F |
|---|---|---|---|---|
| diet | 1 | 56 | 0.00 | 0.9502 |
| strain | 1 | 56 | 1.49 | 0.2266 |
| diet*strain | 1 | 56 | 1.02 | 0.3177 |

diet Least Squares Means

| diet | Estimate | Standard Error | DF | t Value | Pr > \|t\| |
|---|---|---|---|---|---|
| 1 | 3.4287 | 0.4784 | 56 | 7.17 | <.0001 |
| 2 | 3.4724 | 0.5074 | 56 | 6.84 | <.0001 |

strain Least Squares Means

| strain | Estimate | Standard Error | DF | t Value | Pr > \|t\| |
|---|---|---|---|---|---|
| A | 3.0242 | 0.5074 | 56 | 5.96 | <.0001 |
| B | 3.8768 | 0.4784 | 56 | 8.10 | <.0001 |

diet*strain Least Squares Means

| strain | diet | Estimate | Standard Error | DF | t Value | Pr > \|t\| |
|---|---|---|---|---|---|---|
| A | 1 | 2.6508 | 0.7176 | 56 | 3.69 | 0.0005 |
| B | 1 | 4.2065 | 0.6328 | 56 | 6.65 | <.0001 |
| A | 2 | 3.3976 | 0.7176 | 56 | 4.73 | <.0001 |
| B | 2 | 3.5472 | 0.7176 | 56 | 4.94 | <.0001 |


Hmmm ... okay, are you saying that this represents over-dispersion? Why do you say that?

--

Paige Miller


As you ponder your answer to @PaigeMiller's question, you can keep in mind that overdispersion is not an issue for the Gaussian distribution (which is what your model assumes). I think Paige may be trying to make you think about your understanding of the model and the output, and I am, maybe, giving you a hint.

We do consider the possibility of overdispersion for one-parameter distributions in the exponential family, like binomial and Poisson.
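For a one-parameter distribution such as the Poisson, one way to get an interpretable dispersion check is sketched below. It reuses the data set and variable names from this thread, but METHOD=LAPLACE, the G-side random intercept for pen, and the negative binomial alternative are assumptions added for illustration, not something the original poster ran.

```sas
/* Sketch only: conditional Poisson model with a G-side random intercept.  */
/* No RANDOM _RESIDUAL_ statement is used, so no scale parameter is        */
/* estimated and the Poisson variance stays tied to the mean.              */
/* DDFM=KENWARDROGER is omitted because it is not available with           */
/* METHOD=LAPLACE.                                                         */
proc glimmix data=Anu.organday1_3 method=laplace;
  class diet strain pen;
  model Progizz_score = diet|strain / dist=poisson link=log;
  random intercept / subject=pen;   /* pen-to-pen variation */
  lsmeans diet|strain / ilink;      /* also report means on the data scale */
run;

/* Hypothetical follow-up if the ratio is well above 1: a negative         */
/* binomial response, which carries its own scale parameter.               */
proc glimmix data=Anu.organday1_3 method=laplace;
  class diet strain pen;
  model Progizz_score = diet|strain / dist=negbin link=log;
  random intercept / subject=pen;
  lsmeans diet|strain / ilink;
run;
```

With METHOD=LAPLACE, the "Fit Statistics for Conditional Distribution" table reports Pearson Chi-Square / DF, which is the ratio usually compared with 1. The negative binomial option addresses overdispersion only; it does not help with a ratio well below 1.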


Well, after reading various articles and watching several YouTube videos, I thought that the generalized chi-square/DF should be near 1.

And in this case I showed you a different response variable, indeed one with a Gaussian distribution.

So I get different values of Gener. Chi-Square / DF with the same model, and none of them is near 1.

But you are saying that with the Gaussian distribution, this value is not a problem?


Yes, for the Gaussian distribution (which is the default fit by GLIMMIX), there is no such thing as overdispersion. That doesn't mean the model fits well; there can be other problems, but not overdispersion.

You see, Gaussian distributions are fit in such a way that both the mean and the variance are estimated from the data. In other distributions, such as the Poisson or exponential, the variance is determined by the mean before the model is fit, and when the variance estimated from the model fit is not close to that implied variance, you have underdispersion or overdispersion (example: in a Poisson distribution, the variance must equal the mean).

--

Paige Miller
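The contrast described above can be written out in terms of variance functions. The symbols used here ($\mu$ for the mean, $\sigma^2$ for the Gaussian variance, $\phi$ for a multiplicative scale parameter) are notation assumed for this sketch, not taken from the thread.

```latex
\begin{align*}
\text{Gaussian:}\quad & \operatorname{Var}(Y) = \sigma^2 && \text{(estimated freely, so there is no reference value to exceed)} \\
\text{Poisson:}\quad & \operatorname{Var}(Y) = \mu && \text{(fixed by the mean)} \\
\text{Poisson with scale:}\quad & \operatorname{Var}(Y) = \phi\,\mu && \phi > 1:\ \text{overdispersion},\quad \phi < 1:\ \text{underdispersion}
\end{align*}
```

Under this reading, an estimated scale well above or below 1 in a Poisson fit signals a departure from the assumed variance, whereas in a Gaussian fit the corresponding number is simply the residual variance estimate.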


Thank you for the explanation.

However, that raises a question: what are the other problems, and would it be better to use an entirely different procedure, such as PROC MIXED?


As far as I know, for the Gaussian case, PROC MIXED and PROC GLIMMIX should produce the same results for the same model.

Other problems: poor model fit, which can happen even in non-Gaussian cases with no overdispersion.

But it's not clear why you don't like this model that you have fit, what is wrong with it?

--

Paige Miller
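To check the remark above that PROC MIXED and PROC GLIMMIX should agree in the Gaussian case, a PROC MIXED counterpart of the posted GLIMMIX model might look like the sketch below. The REPEATED statement with TYPE=VC is assumed here to be the analogue of `random residual / subject = pen`, and listing pen in the CLASS statement is an addition so that the data need not be sorted by pen.

```sas
/* Sketch only: PROC MIXED version of the Gaussian model from the thread.  */
/* REPEATED with TYPE=VC gives independent errors with a single residual   */
/* variance, blocked by pen.                                               */
proc mixed data=Anu.organweightday123_3;
  class diet strain pen;
  model Pancreas_rel = diet|strain / ddfm=kenwardroger;
  repeated / subject=pen type=vc;
  lsmeans diet|strain;
run;
```

If the two procedures agree, the Residual estimate in the PROC MIXED covariance parameter table should be close to the 7.2084 reported as Residual (VC) by PROC GLIMMIX.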


So, for my understanding. In this case the Chi-square/df does not really say something with this model. However, what if you use the Poisson distribution, as in this example? Does the chi-square/df say something only in that case? Is there an article about this?

The SAS System

The GLIMMIX Procedure

| Model Information | |
|---|---|
| Data Set | ANU.ORGANDAY1_3 |
| Response Variable | Progizz_score |
| Response Distribution | Poisson |
| Link Function | Log |
| Variance Function | Default |
| Variance Matrix Blocked By | Pen |
| Estimation Technique | Residual PL |
| Degrees of Freedom Method | Kenward-Roger |
| Fixed Effects SE Adjustment | Kenward-Roger |

Class Level Information

| Class | Levels | Values |
|---|---|---|
| diet | 2 | 1 2 |
| strain | 2 | A B |

Number of Observations Read: 60

Number of Observations Used: 60

| Dimensions | |
|---|---|
| R-side Cov. Parameters | 1 |
| Columns in X | 9 |
| Columns in Z per Subject | 0 |
| Subjects (Blocks in V) | 30 |
| Max Obs per Subject | 3 |

| Optimization Information | |
|---|---|
| Optimization Technique | None |
| Parameters | 0 |
| Lower Boundaries | 0 |
| Upper Boundaries | 0 |
| Fixed Effects | Profiled |
| Residual Variance | Profiled |
| Starting From | Data |

Iteration History

| Iteration | Restarts | Subiterations | Objective Function | Change | Max Gradient |
|---|---|---|---|---|---|
| 0 | 0 | 0 | -12.50422765 | 0.30362060 | . |
| 1 | 0 | 0 | -9.527888119 | 0.00779297 | . |
| 2 | 0 | 0 | -9.480642908 | 0.00000395 | . |
| 3 | 0 | 0 | -9.480626779 | 0.00000000 | . |

Convergence criterion (PCONV=1.11022E-8) satisfied.

| Fit Statistics | |
|---|---|
| -2 Res Log Pseudo-Likelihood | -9.48 |
| Generalized Chi-Square | 7.82 |
| Gener. Chi-Square / DF | 0.14 |

Covariance Parameter Estimates

| Cov Parm | Estimate | Standard Error |
|---|---|---|
| Residual (VC) | 0.1397 | 0.02639 |

Type III Tests of Fixed Effects

| Effect | Num DF | Den DF | F Value | Pr > F |
|---|---|---|---|---|
| diet | 1 | 56 | 3.88 | 0.0539 |
| strain | 1 | 56 | 1.64 | 0.2059 |
| diet*strain | 1 | 56 | 0.81 | 0.3705 |

diet Least Squares Means

| diet | Estimate | Standard Error | DF | t Value | Pr > \|t\| |
|---|---|---|---|---|---|
| 1 | 1.1795 | 0.03708 | 56 | 31.81 | <.0001 |
| 2 | 1.2829 | 0.03719 | 56 | 34.50 | <.0001 |

strain Least Squares Means

| strain | Estimate | Standard Error | DF | t Value | Pr > \|t\| |
|---|---|---|---|---|---|
| A | 1.1976 | 0.03886 | 56 | 30.82 | <.0001 |
| B | 1.2648 | 0.03532 | 56 | 35.81 | <.0001 |

diet*strain Least Squares Means

| strain | diet | Estimate | Standard Error | DF | t Value | Pr > \|t\| |
|---|---|---|---|---|---|---|
| A | 1 | 1.1221 | 0.05699 | 56 | 19.69 | <.0001 |
| B | 1 | 1.2368 | 0.04746 | 56 | 26.06 | <.0001 |
| A | 2 | 1.2730 | 0.05285 | 56 | 24.09 | <.0001 |
| B | 2 | 1.2928 | 0.05233 | 56 | 24.70 | <.0001 |


Maybe you could answer my questions that you haven't yet answered. Specifically:

"are you saying that this represents over-dispersion? Why do you say that?"

"it's not clear why you don't like this model that you have fit, what is wrong with it?"

--

Paige Miller


Maybe it was just a misunderstanding. From your other post I thought you were saying that there was no overdispersion but that there might be other problems, so I thought you meant that the model did not look good.

I am content with the model; I was just confused by the high value of the chi-square/DF.
