BookmarkSubscribeRSS Feed
LBAM
Calcite | Level 5

For binomial data with fixed and random effects, I have been using Proc Glimmix with the events/trials syntax, e.g.,

class block trt;

model events/trials = trt/ solution ddfm=Satherth;

random block/ group= block*trt;

lsmeans trt/ adjust=tukey;

However, I am wondering what the difference is from this syntax (difference bolded):

class block trt;

model state = trt/ solution ddfm=Satherth dist=binomial;

random block/ group= block*trt;

lsmeans trt/ adjust=tukey;

where the state variable is a 1 or 0 in the dataset. 

Using the same data, I have tried both ways and get different output, specifically with regards to estimations of differences in Least Squares Means (esp. degrees of freedom) and the subsequent means separation letters.  There are also differences in the co-variance parameter estimates.  Why is this, and most importantly, how do I determine which syntax is most appropriate?

Thanks!

9 REPLIES 9
SteveDenham
Jade | Level 19

Well, the difference is from using the dist=binomial, when in actuality the data are Bernoulli (and should probably be fit with dist=binary).  I know it looks the same, but I've kind of found out the hard way that this makes a fairly substantial difference.  The events/trials syntax is a lot more robust to the algorithms used, as the design matrix is not nearly as sparse (I think that is the reason).

Steve Denham

LBAM
Calcite | Level 5

I see, yes that makes sense, thank you!   What about the degrees of freedom (den) --  events/trials has 6, whereas dist=binary (with 0s and 1s as response) has 206 -- should that factor into choosing which syntax to use (and why or why not)?

lvm
Rhodochrosite | Level 12 lvm
Rhodochrosite | Level 12

Steve gives good advice, but I will add some comments. It shouldn't matter if you put binomial or binary, since the Bernoulli is a special case of binomial when n=1. I don't understand why you would use

random block / sub=block*trt;

Having block on both sides of / does not make sense. Are you trying to get separate random effects for block and for block*trt? If so, then you would use

random int trt / sub=block;

or

random block block*trt;

Use of the subject syntax is more efficient computationally. If you just want block*trt, then use

random int / sub=block*trt;

or

random block*trt;

There is another difference between the two approaches (events/trial vs event [0,1]). You must make sure that you are properly accounting for the experimental units. If treatments are applied to groups of individuals (say, plants in a pot), and pots assigned to different treatments is random, then the pot is the experimental unit, not the plant. Your first syntax is appropriate then (with my corrections), with the smaller denominator df. Your second syntax would be incorrectly considering each individual (say, plant) as an experimental unit, with plants (not groups of plants) randomly assigned to treatments.

LBAM
Calcite | Level 5

Thanks for the added input!

Perhaps some additional design details for clarification:  I fed groups of animals 5 different things. Then I measured various physiological responses of individuals. I had been considering each individual as the experimental unit because in some cases, the physiological measures (e.g. response under an additional stress) are applied individually.  This experiment was repeated 3 times (blocked through time), though not all of the 5 diets were available in each block (so an incomplete block design). A Levene's test indicated the block*diet effect to have heterogeneous variances, so I followed the suggestion in the Littel et al. book (pg 350) to account for this by using group = block*trt to specify a different residual variance for each block*trt combo.

But I am very new to mixed modelling. The Littel book example uses group = block*trt in a repeated measures situation in Proc Mixed (which mine is not), so maybe it's not appropriate? 

Also, why do you include the "int" variable (intercept?) in the syntax you suggested?

lvm
Rhodochrosite | Level 12 lvm
Rhodochrosite | Level 12

Sorry, I accidentally read "group=" as "subject=". My fault. But this syntax would be more problematical, in my view. You can't model a separate block variance for each combination of treatment and block. Levene's test is continuous data, not binary or binomial. Page 350 of Littell is not about blocks. Syntax comes down to whether the "groups of animals" within the block is the experimental unit or the single animal within the block. Let's assume it is the individual animal. You need a random block effect, and because there is no scale parameter for a Bernoulli distribution, you probably should have a trt*block random effect. Stroup writes a lot about this in his generalized linear mixed modeling book. I would use:

random int trt / sub=block;

equivalent to:

random block block*trt;

The subject= convention is more efficient. "Int" is the intercept, basically a column of 1s. So, the first statement is giving you both block and block*trt.

LBAM
Calcite | Level 5

This is very helpful, thank you.

If I had been using continuous data (which I do in other analyses) and I had unequal trt variances, still with random effect of block, do you still recommend against the group=  syntax?

SteveDenham
Jade | Level 19

I have had pretty good luck with the group= syntax with continuous variables, both as a G side and as an R side factor.  Be aware that the use of group= greatly increases the number of parameters estimated, so be sure to have a pretty robust dataset before using it.  In my case, I have used:

random int/subject=block group=trt;

This generates a variance component due to block for each treatment.  Note that this is subtly different from generating a block by treatment variance component.

Steve Denham

LBAM
Calcite | Level 5

So the group= variable works within the variable(s) defined in the subject= statement?

And would

random block/ group=trt;

only generate a variance component for each treatment (regardless of the random effect of block)?

lvm
Rhodochrosite | Level 12 lvm
Rhodochrosite | Level 12

The syntax from Steve is giving a separate block variance for each treatment. For binomial or binary data, there is no residual variance.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 9 replies
  • 3339 views
  • 6 likes
  • 3 in conversation