RyanSimmons
Pyrite | Level 9

I am running a mixed-effects zero-inflated Poisson model using PROC NLMIXED. The following code fits my model without any problems:

 

 

PROC NLMIXED data=_model method=gauss;
	parms a0=-1.1433
		  b0=-0.8371 b1=0.07413 b2=0.08834 b3=-0.2387 b4=0 b5=0 b6=0 b7=0 b8=0
		  sigma=0.3030
	;

	zeromodel = a0;
	phi = 1/(1+exp(-zeromodel));

	mu = exp(b0 + subj + b1*Time + b2*Intervention + b3*Interaction
		  + b4*phone + b5*positive + b6*education + b7*income + b8*age
		  + logTI);
	if UAVI=0 then
		logLik = log(phi + (1-phi)*exp(-mu));
	else
		logLik = log(1-phi) + UAVI*log(mu) - mu - lgamma(UAVI+1);
	model UAVI ~ general(logLik);

	random subj ~ normal(0,sigma) subject=Study_ID;
run;

This gives me reasonable parameter estimates and no convergence errors. I wanted to compare this model to the same model run without the offset term (logTI). However, as soon as I remove the offset term, I can no longer get the model to converge. If I run the above code as is, removing only the "+ logTI" term, I get the following error:

 

ERROR: Quadrature accuracy of 0.000100 could not be achieved with 31 points. The achieved
accuracy was 1.000000.

 

I figured the error had to do with the starting values for the parameters. So I removed the random intercept term (subj) and ran the model without the offset using only fixed-effect terms; this converges, but gives me large gradient values as well as a warning that the final Hessian matrix is not positive definite. I have tried several dozen different combinations of starting values, but I always get the same end result. I have also tried changing the minimization technique, the update method, the linesearch procedure, and the linesearch precision, all to no avail (regardless of what I specify, it always seems to converge to some sort of local extremum with unreasonable parameter values and large gradients). This only seems to happen when the offset term isn't in the model; with the offset term things work better (though some of the gradients are still relatively large).
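For reference, this is the kind of thing I have been trying. The specific option values below are only illustrative examples of the settings I have been cycling through, not a fix that works:

PROC NLMIXED data=_model method=gauss
	qpoints=50          /* fix the number of quadrature points    */
	tech=quanew         /* optimization technique                 */
	update=bfgs         /* quasi-Newton update method             */
	linesearch=3        /* linesearch procedure                   */
	lsprecision=0.5;    /* linesearch precision                   */
	parms a0=-1.1433
		  b0=-0.8371 b1=0.07413 b2=0.08834 b3=-0.2387 b4=0 b5=0 b6=0 b7=0 b8=0
		  sigma=0.3030
	;

	/* same model as above, with the offset term removed */
	zeromodel = a0;
	phi = 1/(1+exp(-zeromodel));

	mu = exp(b0 + subj + b1*Time + b2*Intervention + b3*Interaction
		  + b4*phone + b5*positive + b6*education + b7*income + b8*age);
	if UAVI=0 then
		logLik = log(phi + (1-phi)*exp(-mu));
	else
		logLik = log(1-phi) + UAVI*log(mu) - mu - lgamma(UAVI+1);
	model UAVI ~ general(logLik);

	random subj ~ normal(0,sigma) subject=Study_ID;
run;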

 

My sample size is ~400, and each of the covariates is binary, except for the continuous "age". Examining the data itself, I don't see any problems that would throw off the model (and the data is sorted by the subject variable). Does anybody have any ideas for anything I can try, or some other options I can run that might help?

 

EDIT: If I simplify the model down a bit, and run the following, it converges just fine. As soon as I start adding other covariates in, however, I start getting issues with the gradient or the Hessian. I can't figure out exactly why this is going on. My sample size isn't huge, but it isn't tiny and should be able to support more than three binary indicator variables ...

 

PROC NLMIXED data=_model method=gauss;
	parms a0=-0.5732
		  b0=0.9552 b1=0.1298 b2=0.2687 b3=-0.6070
		  sigma=2.1479
	;

	zeromodel = a0;
	phi = 1/(1+exp(-zeromodel));

	mu = exp(b0 + subj + b1*Time + b2*Intervention + b3*Interaction);

	if UAVI=0 then
		logLik = log(phi + (1-phi)*exp(-mu));
	else
		logLik = log(1-phi) + UAVI*log(mu) - mu - lgamma(UAVI+1);
	model UAVI ~ general(logLik);

	random subj ~ normal(0,sigma) subject=Study_ID;
run;

6 REPLIES
PGStats
Opal | Level 21

I would suspect that some of your covariates are collinear, or nearly so. Try getting a correlation matrix of your covariates with PROC CORR.
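Something along these lines, using the covariate names from your model code:

/* Pairwise correlations among the model covariates */
proc corr data=_model;
	var Time Intervention Interaction phone positive education income age;
run;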

PG
RyanSimmons
Pyrite | Level 9

My covariates are all binary variables, with the exception of age, which is continuous. So correlations aren't particularly meaningful.

Regardless, I have not been able to find any strong relationships between any of these covariates, no matter how I try to break them down graphically or numerically. 

SteveDenham
Jade | Level 19

This is just a wild-eyed guess, but the interesting thing to me is that the model converges WITH the offset. Removing it leads to integration problems, so I can see why you went to different starting values. I assume you grid-searched for reasonable numbers, and it still blew up. Have you tried any alternative integration methods? With only a single random effect, Hardy quadrature may be more effective than the default adaptive Gauss-Hermite quadrature.
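Something like this is what I have in mind; only the METHOD= option changes, everything inside the procedure is exactly your original no-offset code:

/* Sketch: Hardy quadrature (adaptive trapezoidal rule), available here */
/* because the model has only one random effect.                        */
PROC NLMIXED data=_model method=hardy;
	parms a0=-1.1433
		  b0=-0.8371 b1=0.07413 b2=0.08834 b3=-0.2387 b4=0 b5=0 b6=0 b7=0 b8=0
		  sigma=0.3030;
	zeromodel = a0;
	phi = 1/(1+exp(-zeromodel));
	mu = exp(b0 + subj + b1*Time + b2*Intervention + b3*Interaction
		  + b4*phone + b5*positive + b6*education + b7*income + b8*age);
	if UAVI=0 then logLik = log(phi + (1-phi)*exp(-mu));
	else logLik = log(1-phi) + UAVI*log(mu) - mu - lgamma(UAVI+1);
	model UAVI ~ general(logLik);
	random subj ~ normal(0,sigma) subject=Study_ID;
run;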

 

Steve Denham

RyanSimmons
Pyrite | Level 9

Hardy quadrature appears to converge better than the default, but it's not perfect (i.e., some rather large gradients remain). However, at least it is able to give me sensible estimates when the offset term isn't in the model.

 

Interestingly, I've noticed rather consistently that the variable for age seems to always have the largest gradient value, and that some of the models without age converge better. I'm not entirely sure why; the range for age in my dataset is not very large (~21-30 years) and doesn't appear to have any meaningful relationship with the outcome, but it's possible that something about the mixture of categorical and continuous covariates is giving the standard integration method some problems. 

 

It is especially interesting that only the models without the offset have these problems (which is too bad, because for theoretical reasons the non-offset model is more appropriate for our analysis, given how we have defined the risk sets). I am guessing the problem lies somewhere in the difference in the outcome's distribution when it is offset (in which case it is essentially a proportion) versus treated as a straight-up count (which is wildly overdispersed, with values ranging from 0 to over 200, and that's AFTER screening out some of the worst outliers). Some combination of the covariate values must result in small cell sizes with wildly disparate ranges that force the estimation procedure into local optima, though I have not been able to find this combination through standard exploration of the data.

 

In any case, the Hardy method appears adequate (and a couple of covariates will end up being dropped for non-statistical reasons, simplifying the model). 

SteveDenham
Jade | Level 19

A lot of the time I don't worry about large gradients when doing exploratory fits.  You mention age as being associated with a large gradient.  Are you using the actual age, or is it rescaled so that it is on relatively the same scale as other variables?  If it is "out of scale", that might give rise to the large gradient, especially if there is some discontinuity in response due to age.
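If you haven't tried it, something as simple as this is what I mean. PROC STDIZE is just one way to do it (the output dataset name here is arbitrary); a DATA step that centers and scales age would accomplish the same thing. Then point the DATA= option in NLMIXED at the standardized dataset.

/* Sketch: rescale age to mean 0, standard deviation 1, so it sits on */
/* roughly the same scale as the 0/1 covariates.                      */
proc stdize data=_model out=_model_std method=std;
	var age;
run;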

 

As for the overdispersion, any chance that a zero-inflated negative binomial might be a possible distribution?
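In your NLMIXED setup the change would look roughly like this; k is the negative binomial dispersion parameter, and the starting values are placeholders rather than anything data-driven:

/* Sketch: zero-inflated negative binomial log-likelihood (NB2 form), shown  */
/* with the simplified covariate set; k > 0 is the dispersion parameter.     */
PROC NLMIXED data=_model method=gauss;
	parms a0=-0.5 b0=1 b1=0 b2=0 b3=0 k=1 sigma=1;
	bounds k > 0;

	zeromodel = a0;
	phi = 1/(1+exp(-zeromodel));
	mu = exp(b0 + subj + b1*Time + b2*Intervention + b3*Interaction);

	p0 = exp(k*log(k/(k+mu)));      /* negative binomial probability of a zero */
	if UAVI=0 then
		logLik = log(phi + (1-phi)*p0);
	else
		logLik = log(1-phi) + lgamma(UAVI+k) - lgamma(k) - lgamma(UAVI+1)
		         + k*log(k/(k+mu)) + UAVI*log(mu/(k+mu));
	model UAVI ~ general(logLik);

	random subj ~ normal(0,sigma) subject=Study_ID;
run;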

 

Steve Denham

 

 

RyanSimmons
Pyrite | Level 9

Well, age will never be on the same scale as the other variables because it is continuous and all the others are binary indicator variables. Still, using a standardized version of age improves the situation, but not by much.

 

And yes, a zero-inflated negative binomial (or, in fact, a standard negative binomial) may end up being the solution. When I did the initial model-fit diagnostics on the baseline data from this study, I found the ZIP to have the best performance, but now that the repeated measures are involved the situation appears to have changed.

 

In any case, I have also switched over to PROC MCMC, in the hopes that looking at chain convergence will help show me where the problems in the model are. 
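For what it's worth, the MCMC version I am starting from looks roughly like this (simplified covariate set again); the priors, chain settings, and output dataset name are first guesses, not final choices:

/* Rough PROC MCMC translation of the simplified ZIP model; priors and chain */
/* settings are initial guesses and will need tuning.                        */
proc mcmc data=_model nmc=50000 nbi=5000 seed=27513 outpost=zip_post;
	parms a0 0 b0 1 b1 0 b2 0 b3 0 sigma2 1;
	prior a0 b0 b1 b2 b3 ~ normal(0, var=100);
	prior sigma2 ~ igamma(0.01, scale=0.01);

	random subj ~ normal(0, var=sigma2) subject=Study_ID;

	phi = 1/(1+exp(-a0));
	mu = exp(b0 + subj + b1*Time + b2*Intervention + b3*Interaction);
	if UAVI=0 then llik = log(phi + (1-phi)*exp(-mu));
	else llik = log(1-phi) + UAVI*log(mu) - mu - lgamma(UAVI+1);
	model UAVI ~ general(llik);
run;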
