turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- Analytics
- /
- Stat Procs
- /
- Proc GLIMMIX - binomial data

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

01-09-2015 03:58 PM

For binomial data with fixed and random effects, I have been using Proc Glimmix with the *events/trials* syntax, e.g.,

class block trt;

model events/trials = trt/ solution ddfm=Satherth;

random block/ group= block*trt;

lsmeans trt/ adjust=tukey;

However, I am wondering what the difference is from this syntax (difference bolded):

class block trt;

**model state = trt/ **solution ddfm=Satherth** dist=binomial**;

random block/ group= block*trt;

lsmeans trt/ adjust=tukey;

where the state variable is a 1 or 0 in the dataset.

Using the same data, I have tried both ways and get different output, specifically with regards to estimations of differences in Least Squares Means (esp. degrees of freedom) and the subsequent means separation letters. There are also differences in the co-variance parameter estimates. Why is this, and most importantly, how do I determine which syntax is most appropriate?

Thanks!

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

01-12-2015 08:38 AM

Well, the difference is from using the dist=binomial, when in actuality the data are Bernoulli (and should probably be fit with **dist=binary). **I know it looks the same, but I've kind of found out the hard way that this makes a fairly substantial difference. The events/trials syntax is a lot more robust to the algorithms used, as the design matrix is not nearly as sparse (I think that is the reason).

Steve Denham

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

01-12-2015 11:36 AM

I see, yes that makes sense, thank you! What about the degrees of freedom (den) -- events/trials has 6, whereas dist=binary (with 0s and 1s as response) has 206 -- should that factor into choosing which syntax to use (and why or why not)?

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

01-12-2015 12:18 PM

Steve gives good advice, but I will add some comments. It shouldn't matter if you put binomial or binary, since the Bernoulli is a special case of binomial when n=1. I don't understand why you would use

random **block** / sub=block*trt;

Having block on both sides of / does not make sense. Are you trying to get separate random effects for block and for block*trt? If so, then you would use

random int trt / sub=block;

or

random block block*trt;

Use of the subject syntax is more efficient computationally. If you just want block*trt, then use

random int / sub=block*trt;

or

random block*trt;

There is another difference between the two approaches (events/trial vs event [0,1]). You must make sure that you are properly accounting for the experimental units. If treatments are applied to groups of individuals (say, plants in a pot), and pots assigned to different treatments is random, then the pot is the experimental unit, not the plant. Your first syntax is appropriate then (with my corrections), with the smaller denominator df. Your second syntax would be incorrectly considering each individual (say, plant) as an experimental unit, with plants (not groups of plants) randomly assigned to treatments.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

01-12-2015 01:24 PM

Thanks for the added input!

Perhaps some additional design details for clarification: I fed groups of animals 5 different things. Then I measured various physiological responses of individuals. I had been considering each individual as the experimental unit because in some cases, the physiological measures (e.g. response under an additional stress) are applied individually. This experiment was repeated 3 times (blocked through time), though not all of the 5 diets were available in each block (so an incomplete block design). A Levene's test indicated the block*diet effect to have heterogeneous variances, so I followed the suggestion in the Littel et al. book (pg 350) to account for this by using *group = block*trt* to specify a different residual variance for each block*trt combo.

But I am very new to mixed modelling. The Littel book example uses *group = block*trt* in a repeated measures situation in Proc Mixed (which mine is not), so maybe it's not appropriate?

Also, why do you include the "int" variable (intercept?) in the syntax you suggested?

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

01-12-2015 01:56 PM

Sorry, I accidentally read "group=" as "subject=". My fault. But this syntax would be more problematical, in my view. You can't model a separate block variance for each combination of treatment and block. Levene's test is continuous data, not binary or binomial. Page 350 of Littell is not about blocks. Syntax comes down to whether the "groups of animals" within the block is the experimental unit or the single animal within the block. Let's assume it is the individual animal. You need a random block effect, and because there is no scale parameter for a Bernoulli distribution, you probably should have a trt*block random effect. Stroup writes a lot about this in his generalized linear mixed modeling book. I would use:

random int trt / sub=block;

equivalent to:

random block block*trt;

The subject= convention is more efficient. "Int" is the intercept, basically a column of 1s. So, the first statement is giving you both block and block*trt.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

01-13-2015 12:30 PM

This is very helpful, thank you.

If I had been using continuous data (which I do in other analyses) and I had unequal trt variances, still with random effect of block, do you still recommend against the *group=* syntax?

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

01-13-2015 01:12 PM

I have had pretty good luck with the group= syntax with continuous variables, both as a G side and as an R side factor. Be aware that the use of group= greatly increases the number of parameters estimated, so be sure to have a pretty robust dataset before using it. In my case, I have used:

random int/subject=block group=trt;

This generates a variance component due to block for each treatment. Note that this is subtly different from generating a block by treatment variance component.

Steve Denham

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

01-13-2015 01:44 PM

So the group= variable works within the variable(s) defined in the subject= statement?

And would

random block/ group=trt;

only generate a variance component for each treatment (regardless of the random effect of block)?

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

01-13-2015 03:06 PM

The syntax from Steve is giving a separate block variance for each treatment. For binomial or binary data, there is no residual variance.