
# Problem with convergence in glimmix for binary data with a zero mean

Hello,

My issue is that I am having the darndest time getting my binomial model to converge in glimmix. It's a pretty simple analysis, and I think the fix should be simple but I just can't figure it out. The problem seems to be due to one of the means being zero, which seems to freak SAS out.

In sum, if you want to skip the details, I think I just need to tell SAS that the estimate for a certain mean has a lower bound of 0.000001 because it seems to be unable to handle the fact that the real mean is zero. How do I do this? (I tried parms/lowerb and that didn't help...maybe that's the wrong parameter?)

Here's the setup. The response is whether or not a plant survived last winter in Michigan (0/1). There's a fixed effect of latitude (where the seeds originated) with 5 levels: PANama, MIAmi, TALlahassee, TENnessee, and MICHigan. There's a random effect of maternal line nested within latitude. There are five maternal lines per latitude, except one latitude (PAN, which you'll see below is my problematic latitude) that only has one line. There are several individuals per line (5-15). I don't think the unbalanced design is part of the problem because I've analyzed several other responses with a similar model, except that those were continuous variables so I used proc mixed with no issues.

I think the problem is that the Panama population had zero survival, across the board. If I change one of the data points for PAN to 1, this model works perfectly fine:

proc glimmix data = survival;

class lat line indiv;

model survival = lat /link=logit dist=binomial;

random line(lat);

run;

The effect of latitude is highly significant. This is an obvious result if you look at the raw percent survival...PAN = 0%, MIA = 30%, the rest are above 90%. But I really want to know if Panama is significantly different from Miami, and changing that one data point bumps Panama survival to 6%, at which point there's no significant difference from Miami. If survival were zero, would the difference be significant?

But there are severe problems with analyzing the real data. The model will not converge. The only way I can figure out to get it to run is by adding parms/noiter:

proc glimmix data = survival;

class lat line indiv;

model survival = lat /link=logit dist=binomial;

random line(lat);

parms/noiter;

run;

and it gives me results, but they don't make any sense. The mean and s.e. for Panama are very, very tiny, and yet the p-values comparing Panama to any other population are near 1.

So basically I think I just need to tell it that the mean for Panama has a lower bound of 0.00001. This seems like a simple proposition, but the proper search terms are eluding me. How do I do this? Also, is the parms/noiter line valid for my analysis?

Thank you! Any help is greatly appreciated!


## Re: Problem with convergence in glimmix for binary data with a zero mean

First off, noiter tells GLIMMIX not to iterate on the covariance parameters, so they stay at their starting values, and things are dicey when you ask whether the results are 'valid'. The results (especially the standard errors of the lsmeans) essentially do not recognize the random effects.

What you have is quasi-separation. For the lowest level of lat (Panama), the response is all zeroes, so the maximum likelihood estimate for that level on the logit scale is not finite. It appears that your data are on an individual basis (0/1). If that is the case, change the distribution to dist=binary. The other possibility is to aggregate within line, so that a binomial (x/y) value is analyzed.
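A minimal sketch of the aggregation route, assuming the data set and variable names from your post. The names survival_agg, events, and trials are made up for illustration, and I have not run this against your data:

```sas
/* Collapse the 0/1 records to one row per maternal line.          */
/* events = number of survivors, trials = number of plants scored. */
/* survival_agg, events and trials are hypothetical names.         */
proc means data=survival noprint nway;
  class lat line;
  var survival;
  output out=survival_agg (drop=_type_ _freq_) sum=events n=trials;
run;

/* Check which latitude has events = 0 in every line. */
proc print data=survival_agg;
run;

/* Events/trials syntax: one binomial count per line. */
proc glimmix data=survival_agg;
  class lat line;
  model events/trials = lat / link=logit dist=binomial;
  random line(lat);
run;
```

Aggregating does not remove the all-zero cell for Panama, so convergence is not guaranteed.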

If the distribution is kept as binary, you might also have better luck if you add an NLOPTIONS statement and set the technique to something other than the default quasi-Newton. I have had pretty good luck with Newton-Raphson with ridging, so the line would look like:

nloptions tech=nrridg;
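In context, and assuming the same data set and variable names as in your post, the whole call might look like the sketch below. The event='1' response option and the lsmeans statement are my additions: the first so that the probability of survival is modeled explicitly, the second to request the pairwise comparisons (including Panama vs. Miami) on the probability scale. Treat it as a starting point, not a guaranteed fix:

```sas
proc glimmix data=survival;
  class lat line;
  /* event='1' is an assumption: model Pr(survival = 1) */
  model survival(event='1') = lat / link=logit dist=binary;
  random line(lat);
  /* Newton-Raphson with ridging instead of the default optimizer */
  nloptions tech=nrridg;
  /* ilink reports means on the probability scale, diff gives pairwise tests */
  lsmeans lat / ilink diff;
run;
```

If this still does not converge, the all-zero Panama cell is the likely culprit rather than the optimizer.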

Still, quasi-separation is a big problem with generalized linear models, but Google is your friend, and there are a lot of sites out there with other suggestions.

Also, check the following thread: https://communities.sas.com/message/130974.  The suggestion there to group Age into buckets could be translated to group lat into buckets.  Of course, that would mean grouping Panama with something else and defeating the purpose of your analysis.  So perhaps grouping within line and using dist=binomial might help.

Steve Denham
