Programming the statistical procedures from SAS

NLMIXED optimization problems

Frequent Contributor
Posts: 98

I am running a mixed-effects zero-inflated Poisson model using PROC NLMIXED. The following code fits my model without any problems:

PROC NLMIXED data=_model method=gauss;
	parms a0=-1.1433
	      b0=-0.8371 b1=0.07413 b2=0.08834 b3=-0.2387 b4=0 b5=0 b6=0 b7=0 b8=0
	      sigma=0.3030;

	zeromodel = a0;
	phi = 1/(1+exp(-zeromodel));

	mu = exp(b0 + subj + b1*Time + b2*Intervention + b3*Interaction
	         + b4*phone + b5*positive + b6*education + b7*income + b8*age
	         + logTI);

	if UAVI=0 then
		logLik = log(phi + (1-phi)*exp(-mu));
	else
		logLik = log(1-phi) + UAVI*log(mu) - mu - lgamma(UAVI+1);
	model UAVI ~ general(logLik);

	random subj ~ normal(0,sigma) subject=Study_ID;
run;

This gives me reasonable parameter estimates and no convergence errors. I wanted to compare this model to the same model run without the offset term (logTI). However, as soon as I remove the offset term, I can no longer get the model to converge. If I run the above code AS IS (removing only the "+ logTI") I get the following error:

 

ERROR: Quadrature accuracy of 0.000100 could not be achieved with 31 points. The achieved
accuracy was 1.000000.

I figured the error had to do with the starting values for the parameters, so I removed the random intercept term (subj) and ran the model without the offset using only fixed-effect terms. This converges, but gives large gradient values along with a warning that the final Hessian matrix is not positive definite. I have tried several dozen combinations of starting values, but I always get the same end result. I have also tried changing the minimization technique, the update method, the line-search procedure, and the line-search precision, all to no avail: regardless of what I specify, the optimizer converges to some sort of local extremum with unreasonable parameter values and large gradients. This only seems to happen when the offset term isn't in the model; with the offset term things work better (though some of the gradients are still relatively large).
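For reference, the optimizer and quadrature settings I have been varying are all options on the PROC NLMIXED statement; the particular values below are just illustrative, not a combination that worked:

```sas
/* Illustrative only: alternative optimizer and quadrature settings.
   TECHNIQUE= picks the optimizer, UPDATE= and LINESEARCH=/LSPRECISION=
   tune the quasi-Newton steps, QPOINTS=/QTOL= control the quadrature. */
PROC NLMIXED data=_model method=gauss qpoints=51 qtol=1e-4
             technique=quanew update=dbfgs linesearch=2 lsprecision=0.4;
	/* ... same parms, model, and random statements as above ... */
run;
```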

 

My sample size is ~400, and each of the covariates is binary, except for the continuous "age". Examining the data itself, I don't see any problems that would throw off the model (and the data is sorted by the subject variable). Does anybody have any ideas for anything I can try, or some other options I can run that might help?

 

EDIT: If I simplify the model down a bit, and run the following, it converges just fine. As soon as I start adding other covariates in, however, I start getting issues with the gradient or the Hessian. I can't figure out exactly why this is going on. My sample size isn't huge, but it isn't tiny and should be able to support more than three binary indicator variables ...

 

PROC NLMIXED data=_model method=gauss;
	parms a0=-0.5732
	      b0=0.9552 b1=0.1298 b2=0.2687 b3=-0.6070
	      sigma=2.1479;

	zeromodel = a0;
	phi = 1/(1+exp(-zeromodel));
	mu = exp(b0 + subj + b1*Time + b2*Intervention + b3*Interaction);

	if UAVI=0 then
		logLik = log(phi + (1-phi)*exp(-mu));
	else
		logLik = log(1-phi) + UAVI*log(mu) - mu - lgamma(UAVI+1);
	model UAVI ~ general(logLik);

	random subj ~ normal(0,sigma) subject=Study_ID;
run;


All Replies
Respected Advisor
Posts: 4,756

Re: NLMIXED optimization problems

I would suspect that some of your covariates are collinear, or nearly so. Try getting a correlation matrix of your covariates with PROC CORR.
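For example, something along these lines (covariate names taken from the model above):

```sas
/* Pairwise correlations among the covariates; values near +/-1
   would point to near-collinearity */
PROC CORR data=_model;
	var Time Intervention Interaction phone positive education income age;
run;
```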

PG
Frequent Contributor
Posts: 98

Re: NLMIXED optimization problems

My covariates are all binary variables, with the exception of age, which is continuous. So correlations aren't particularly meaningful.

Regardless, I have not been able to find any strong relationships between any of these covariates, no matter how I try to break them down graphically or numerically. 

Solution
05-02-2016 10:41 AM
Respected Advisor
Posts: 2,655

Re: NLMIXED optimization problems

This is just a wild-eyed guess, but the interesting thing to me is that the model converges WITH the offset. Removing it leads to integration problems, so I can see why you went to different starting values. I assume you grid-searched for reasonable numbers, and it still blew up. Have you tried any alternative integration methods? With only a single random effect, Hardy quadrature (METHOD=HARDY) may be more effective than the default adaptive Gauss-Hermite.
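Concretely, that is a one-word change on the PROC statement (sketch only; METHOD=HARDY is available only when there is a single random effect):

```sas
/* Hardy quadrature uses an adaptive trapezoidal rule for the
   one-dimensional random-effect integral */
PROC NLMIXED data=_model method=hardy;
	/* ... parms, model, and random statements unchanged ... */
run;
```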

 

Steve Denham

Frequent Contributor
Posts: 98

Re: NLMIXED optimization problems

Hardy quadrature appears to converge better than the default, but it's not perfect (i.e. some rather large gradients). However, at least it is able to give me sensible estimates when the offset term isn't in the model.

 

Interestingly, I've noticed rather consistently that the variable for age seems to always have the largest gradient value, and that some of the models without age converge better. I'm not entirely sure why; the range for age in my dataset is not very large (~21-30 years) and doesn't appear to have any meaningful relationship with the outcome, but it's possible that something about the mixture of categorical and continuous covariates is giving the standard integration method some problems. 

 

It is especially interesting that it is only the models without the offset that have these problems (which is too bad, because for theoretical reasons the non-offset model is more appropriate for our analysis, given how we have defined the risk sets). I am guessing the problem lies somewhere in the difference in the outcome's distribution when it is offset (in which case it is essentially a proportion) versus as a straight-up count (which is wildly overdispersed, with values ranging from 0 to over 200, and that's AFTER screening out some of the worst outliers). Some combination of the covariate values must result in small cell sizes with wildly disparate ranges that force the estimation procedure into local optima, though I have not been able to find this combination through standard exploration of the data.

 

In any case, the Hardy method appears adequate (and a couple of covariates will end up being dropped for non-statistical reasons, simplifying the model). 

Respected Advisor
Posts: 2,655

Re: NLMIXED optimization problems

A lot of the time I don't worry about large gradients when doing exploratory fits.  You mention age as being associated with a large gradient.  Are you using the actual age, or is it rescaled so that it is on relatively the same scale as other variables?  If it is "out of scale", that might give rise to the large gradient, especially if there is some discontinuity in response due to age.
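For example, a simple centering and rescaling in a DATA step (the constants and the output dataset/variable names here are just illustrative, chosen for the stated 21-30 range):

```sas
/* Center age near the middle of its range and put it roughly on
   the same scale as the binary indicators */
data _model2;
	set _model;
	age_c = (age - 25) / 5;
run;
```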

 

As far as the overdispersion thing, any chance that ZI negative binomial might be a possible distribution?
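For what it's worth, a zero-inflated negative binomial log-likelihood can be coded in NLMIXED the same way as the ZIP, swapping the Poisson kernel for the negative binomial one. A sketch, reusing the variable names from the simplified model above; the dispersion parameter alpha and all starting values are assumptions:

```sas
PROC NLMIXED data=_model method=gauss;
	parms a0=-1 b0=0 b1=0 b2=0 b3=0 sigma=1 alpha=1;
	bounds sigma > 0, alpha > 0;

	phi = 1/(1+exp(-a0));        /* zero-inflation probability */
	mu  = exp(b0 + subj + b1*Time + b2*Intervention + b3*Interaction);

	/* NB(mu, alpha) log-density, with alpha the dispersion parameter */
	if UAVI=0 then
		logLik = log(phi + (1-phi)*(1/(1+alpha*mu))**(1/alpha));
	else
		logLik = log(1-phi)
		       + lgamma(UAVI + 1/alpha) - lgamma(1/alpha) - lgamma(UAVI+1)
		       + UAVI*log(alpha*mu/(1+alpha*mu))
		       - (1/alpha)*log(1+alpha*mu);
	model UAVI ~ general(logLik);

	random subj ~ normal(0,sigma) subject=Study_ID;
run;
```

As alpha approaches 0 this collapses back to the ZIP, so the fitted alpha itself is a rough check on whether the extra dispersion parameter is earning its keep.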

 

Steve Denham

 

 

Frequent Contributor
Posts: 98

Re: NLMIXED optimization problems

Well, age will never be on the same scale as the other variables, because it is continuous and all the others are binary indicator variables. Still, using a standardized version of age improves the situation, but not by much.

 

And yes, a zero-inflated negative binomial (or, in fact, a standard negative binomial) may end up being the solution. When I did the initial model-fit diagnostics on the baseline data from this study, I found the ZIP to have the best performance, but now that the repeated measures are involved the situation appears to have changed.

 

In any case, I have also switched over to PROC MCMC, in the hopes that looking at chain convergence will help show me where the problems in the model are. 
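A sketch of what that PROC MCMC formulation might look like (the priors, seed, and number of iterations below are illustrative assumptions, not the settings actually used):

```sas
PROC MCMC data=_model nmc=50000 seed=1234;
	parms a0 b0 b1 b2 b3;
	parms s2 1;
	prior a0 b0 b1 b2 b3 ~ normal(0, var=100);
	prior s2 ~ igamma(0.01, scale=0.01);

	random subj ~ normal(0, var=s2) subject=Study_ID;

	phi = logistic(a0);          /* zero-inflation probability */
	mu  = exp(b0 + subj + b1*Time + b2*Intervention + b3*Interaction);
	if UAVI=0 then
		ll = log(phi + (1-phi)*exp(-mu));
	else
		ll = log(1-phi) + UAVI*log(mu) - mu - lgamma(UAVI+1);
	model UAVI ~ general(ll);
run;
```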
