Obsidian | Level 7

## Problem with multiple random variables and over dispersion

Hi

I need some help defining the random effects in PROC GLIMMIX. I ran an experiment with ostriches from day-old until maturity to assess the egg production of birds raised under different rearing practices (standard and intensive care) for three months after hatch. I want to see whether birds raised under these methods differ in egg production at maturity. The data cover two breeding seasons, using similar birds (females) rotated among different camps, each time with a different male. In my analysis I have year, group (rearing method), and genotype of females as fixed effects, and I want to use male ID (SIREID), male age (MAGE), female age (**bleep**E), camp, and female ID (DAMID) as random effects. I am not sure that what I am running is reliable, especially in how the random effects are defined, because I am getting a very large Chi-Square/DF value. As I understand it, this statistic indicates model fit and should be close to 1 to avoid overdispersion. Below is the model that I am trying to run; the data are attached.

PROC GLIMMIX DATA=PRODUCTION;
  CLASS YEAR CAMP MAGE **bleep**E DAMID TREATMENT;
  MODEL EGGS = YEAR TREATMENT YEAR*TREATMENT / SOLUTION DDFM=RES;
  RANDOM CAMP;
  RANDOM SIREID;
  RANDOM MAGE;
  RANDOM **bleep**E;
  RANDOM RESIDUAL / SUBJECT=DAMID(YEAR TREATMENT);
RUN; QUIT;

Any help will be appreciated. Thank you so much.

Rhodochrosite | Level 12

## Re: Problem with multiple random variables and over dispersion

By default, PROC GLIMMIX assumes a normal distribution, and for that model the Pearson Chi-Square/DF is the residual variance; it is not a measure of overdispersion. For a normal distribution, there is no such thing as overdispersion. (You probably are thinking about a Poisson distribution.)
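For reference, here is a minimal sketch of where a Poisson distribution would be requested, so that Pearson Chi-Square/DF can be read as an overdispersion check. The dataset and variable names (PRODUCTION, EGGS, YEAR, TREATMENT, DAMID) are taken from the posted code, not from the attached file, and the single random intercept per female is only an assumption for illustration, not a recommended final model.

```sas
/* Sketch only: names follow the posted code and may not match the attached dataset. */
proc glimmix data=production method=laplace;
   class year treatment damid;
   /* DIST=POISSON replaces the default normal distribution;
      LINK=LOG is the default link for Poisson and is written out for clarity. */
   model eggs = year treatment year*treatment / dist=poisson link=log solution;
   /* One G-side random intercept per female (an assumption for illustration). */
   random intercept / subject=damid;
run;
```

With METHOD=LAPLACE, the output should include a table of fit statistics for the conditional distribution, and the Pearson Chi-Square / DF reported there is the value to compare against 1.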

Is the attached dataset the entire collection, or do you have more observations?

If these 18 observations are all you have, then your model is attempting to estimate far too many parameters, regardless of which distribution is used. The data also don't conform very well to your description, and the code you give will not run with the dataset you attached. YEAR, MAGE, and FEMALEAGE are totally confounded (tell me which YEAR, and I can tell you which MAGE and FEMALEAGE without error). With one exception, CAMP and FEMALEID are confounded. There are 3 levels of female genotype, but two of them have only n=1. MALEID=1130273 has data for both GROUPs; all other birds have data for only one GROUP. Age is not a random effect. It is also wise to put every random-effect factor in the CLASS statement.
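The confounding described above can be checked directly with cross-tabulations. This is a rough sketch that assumes the attached data have been read into a dataset named PRODUCTION (the name used in the posted code) and that the variables are named as in this reply (YEAR, MAGE, FEMALEAGE, CAMP, FEMALEID, MALEID, GROUP).

```sas
/* Sketch only: PRODUCTION is an assumed dataset name for the attached data. */
proc freq data=production;
   /* If each YEAR appears with exactly one MAGE and one FEMALEAGE,
      the three variables are completely confounded. */
   tables year*mage*femaleage / list missing;
   /* Shows whether each female was observed in only one camp. */
   tables camp*femaleid / list missing;
   /* Shows which males have records in both rearing groups. */
   tables maleid*group / list missing;
run;
```

The LIST option prints one row per observed combination, so confounding shows up as a table with no more rows than the factor has levels.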

In summary, you likely will need to completely rethink the analysis of these data.
