stats_x
Fluorite | Level 6

Hello, SAS experts!

I am running a GEE model on the same count data in two procedures. It fails to converge in PROC GENMOD but converges in PROC GLIMMIX, and I want to understand why. Here is my code:

 

proc genmod data=ds;
  class region center;
  model count = region center(region) covar1 covar2 / dist=nb;
run;

proc glimmix data=ds empirical=mbn method=laplace;
  class region center;
  model count = region center(region) covar1 covar2 / dist=nb;
  nloptions tech=nrridg gconv=0;
run;

 

Ideally, region and center should be treated as random effects. But I want LS-means estimates for those two factors, so I put them in the model as fixed effects.
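As a sketch of how those LS-means could be requested, here is a hypothetical extension of the GLIMMIX call above with an LSMEANS statement added. ILINK adds the means on the count scale (the inverse of the log link) and CL adds confidence limits. Centers whose counts are all zero may give extreme or very imprecise values.

```sas
/* Hypothetical extension of the GLIMMIX fit above: LS-means for
   region and for center within region */
proc glimmix data=ds empirical=mbn method=laplace;
  class region center;
  model count = region center(region) covar1 covar2 / dist=nb;
  lsmeans region center(region) / ilink cl;  /* ILINK: report means on the count scale */
  nloptions tech=nrridg gconv=0;
run;
```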

In addition, my data are sparse: some centers have only zero counts. This seems to be the main cause of the convergence problem. When the fit fails to converge, GENMOD prints warnings such as the following:

 

WARNING: The relative Hessian convergence criterion of 19.752778094 is greater than the limit of 0.0001. The convergence is  questionable.
WARNING: The procedure is continuing but the validity of the model fit is questionable.

WARNING: The negative of the Hessian is not positive definite. The convergence is questionable
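A quick way to gauge the sparsity described above is to list the centers whose counts sum to zero. This is a minimal sketch that assumes the variable names from the code above (DS, REGION, CENTER, COUNT).

```sas
/* List centers whose counts are all zero (one row per such center) */
proc sql;
  select region, center,
         count(*)   as n_obs,        /* number of records at the center */
         sum(count) as total_count   /* total of the COUNT variable */
  from ds
  group by region, center
  having sum(count) = 0;
quit;
```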

 

I read relevant papers, e.g., Paper SAS2179-2018 by K. Kiernan. It seems that the estimation and optimization methods in GENMOD and GLIMMIX should be similar when no random effect is specified. In particular, I used the METHOD=LAPLACE and TECH=NRRIDG options in GLIMMIX.
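For reference, here is a schematic of a ridge-stabilized Newton-Raphson step and of the relative Hessian quantity behind the first GENMOD warning above. This is a paraphrase of the documented ideas, not the procedures' exact code: $\ell$ is the log-likelihood, $g$ its gradient, $H$ its Hessian, $\lambda_t \ge 0$ a ridge that is only needed when $-H$ is not positive definite, and $\gamma_j$ a hypothetical fixed effect for center $j$.

```latex
% Ridge-stabilized Newton-Raphson step (maximizing \ell)
\[
\beta^{(t+1)} = \beta^{(t)} + \bigl[-H(\beta^{(t)}) + \lambda_t I\bigr]^{-1} g(\beta^{(t)}),
\qquad \lambda_t \ge 0 \text{ chosen so that } -H(\beta^{(t)}) + \lambda_t I \text{ is positive definite}
\]
% Relative Hessian quantity (roughly), compared with GENMOD's CONVH= (default 1E-4)
\[
t_c = \frac{g^\top (-H)^{-1} g}{\lvert \ell \rvert}
\]
% Poisson log-likelihood contribution of a center j whose counts are all zero
\[
\ell_j(\gamma_j) = -\sum_{i} \exp\bigl(x_{ij}^\top \beta + \gamma_j\bigr) \;\uparrow\; 0
\quad \text{as } \gamma_j \to -\infty
\]
```

Because $\ell_j$ has no finite maximizer, the gradient for $\gamma_j$ never reaches zero and its Hessian entry, $-\sum_i \exp(x_{ij}^\top\beta + \gamma_j)$, shrinks toward zero, which is consistent with both warnings. The negative binomial case behaves the same way: a zero count contributes $-k^{-1}\log(1 + k\mu)$, which also increases as $\mu \to 0$.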

 

According to the SAS online documentation:

"The essential difference between the estimation approaches taken by the GLIMMIX procedure and generalized estimating equations is that the latter approach estimates the covariance parameters by the method of moments, whereas the GLIMMIX procedure uses likelihood-based techniques".
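To make the quoted distinction concrete, these are the GEE estimating equations in the standard Liang-Zeger form (a sketch, not GENMOD's exact implementation), where $i$ indexes clusters (here, centers), $A_i$ is the diagonal matrix of variance-function values, and $R_i(\alpha)$ is the working correlation matrix:

```latex
\[
\sum_{i=1}^{K} D_i^\top V_i^{-1}\,\bigl(y_i - \mu_i(\beta)\bigr) = 0,
\qquad D_i = \frac{\partial \mu_i}{\partial \beta^\top},
\qquad V_i = \phi\, A_i^{1/2} R_i(\alpha)\, A_i^{1/2}
\]
```

The correlation parameters $\alpha$ and the scale $\phi$ are estimated from Pearson residuals by the method of moments, and no likelihood is maximized, which is the difference the documentation describes.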

 

My questions are as follows:

1. Why do my models converge in GLIMMIX but not in GENMOD? I tested this on a couple of data sets, comparing the two procedures on the same data each time.

2. Does the method used to estimate the covariance parameters affect convergence differently in the two procedures? 

3. Which procedure would be more appropriate for my model?

 

Could anyone shed some light on this? Thanks a lot! 

StatDave
SAS Super FREQ

Neither of those is a GEE model. In GENMOD, you would need to include a REPEATED statement with the SUBJECT= option to identify clusters of correlated observations. In GLIMMIX, you would need to include a RANDOM _RESIDUAL_ statement, also with a SUBJECT= option. As you note, results in that case could certainly differ because the estimating methods are definitely different. The code you show simply fits a negative binomial model assuming that the observations are independent. So, if there are correlated observations, then neither is appropriate. But purely on the question of the convergence difference in that code, there are again differences in the details of the fitting algorithms that could cause differing results, including convergence. If the negative binomial dispersion parameter is very nearly zero, then adding the NOLOGNB option in the MODEL statement in GENMOD might allow convergence. But, of course, in that case maybe a simple Poisson model is more appropriate.
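A sketch of the three specifications described in this reply, assuming the data set and variables from the original post and that CENTER values repeat across regions (hence CENTER(REGION) as the cluster). The exchangeable working correlation is an illustrative choice. My understanding is that METHOD=LAPLACE cannot be combined with R-side (_RESIDUAL_) effects, so the GLIMMIX step leaves METHOD= at its default pseudo-likelihood setting.

```sas
/* 1. GEE (population-averaged) NB model in GENMOD */
proc genmod data=ds;
  class region center;
  model count = region covar1 covar2 / dist=nb;
  repeated subject=center(region) / type=exch;  /* working exchangeable correlation */
run;

/* 2. R-side (marginal) analogue in GLIMMIX, default estimation method,
      with empirical (sandwich) standard errors */
proc glimmix data=ds empirical;
  class region center;
  model count = region covar1 covar2 / dist=nb;
  random _residual_ / subject=center(region) type=cs;
run;

/* 3. Independence NB fit in GENMOD with the NOLOGNB option suggested above */
proc genmod data=ds;
  class region center;
  model count = region center(region) covar1 covar2 / dist=nb nolognb;
run;
```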

stats_x
Fluorite | Level 6
Hi Dave, I appreciate your prompt response! I should have called it a marginal (population-averaged) model instead of GEE, is that correct? I agree that subjects within a center (cluster) should be assumed to be correlated, and that the RANDOM statement should be used to specify the cluster. However, to get the LS-mean differences among centers, center and region have to be included as fixed effects in the MODEL statement. If I add RANDOM _residual_ / subject=center(region) in GLIMMIX, the model fails to converge again. I am not sure why specifying the correlation would cause the model to fail to converge. Could you elaborate a little on where the fitting algorithms differ? With METHOD=LAPLACE, I thought GLIMMIX fits by maximum likelihood using Newton-Raphson with ridging, matching PROC GENMOD.
StatDave
SAS Super FREQ

If you want to fit a population averaged model, then a GEE model is appropriate. You can do that in PROC GEE, or PROC GENMOD by including the REPEATED statement. This uses the GEE algorithm as described in the Details section of the GENMOD documentation. Note that this is not a maximum likelihood based method. If observations within a CENTER are correlated, and if all of the values of the CENTER variable are unique (regardless of REGION), then specify SUBJECT=CENTER in the REPEATED statement. You only need to specify CENTER(REGION) if the same values of CENTER are used in multiple REGIONs as described in this note. The analysis from GLIMMIX will never be identical to that from GENMOD due to differences in the fitting method. Note that the idea behind a "fixed effects" model is to avoid estimating a large number of parameters for the (usually) large number of clusters in the data. Methods like GEE, stratification, or conditional methods allow you to avoid fitting parameters for all of the clusters. But if your goal is to estimate the effects of the clusters, then that requires estimating those parameters. So, if your goal is to estimate means for each CENTER, you can try including CENTER in the model and in the LSMEANS statement in a GEE model such as (assuming all CENTER values are unique)

proc gee data=ds;
  class center;
  model count = center covar1 covar2 / dist=nb;
  repeated subject=center;
  lsmeans center / ilink;
run;
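If CENTER values are not unique across regions, one alternative to writing CENTER(REGION) is to build a single cluster ID first. This is a sketch; DS_ID and CENTER_ID are hypothetical names, and the length of 40 is an arbitrary choice.

```sas
/* Build a cluster ID that is unique across regions */
data ds_id;
  set ds;
  length center_id $ 40;
  center_id = catx('_', region, center);  /* e.g. region and center joined by "_" */
run;

/* Same GEE model as above, clustered on the combined ID */
proc gee data=ds_id;
  class center_id;
  model count = center_id covar1 covar2 / dist=nb;
  repeated subject=center_id;
  lsmeans center_id / ilink;
run;
```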

 

stats_x
Fluorite | Level 6
Thanks again for your reply! Your explanation really helps me understand this much better. I tried your suggested code in PROC GEE, but the model still does not converge; it stops with the error "The generalized Hessian matrix is not positive definite. Iteration will be terminated." The fact that many centers have zero counts could be the main cause. I might go with the GLIMMIX model I tried, since it is the only one that converges. On a side note, I examined the Iteration History of the GLIMMIX model: at the last iteration, for some data sets the Max Gradient is still large (e.g., 55 or 0.8) even though the convergence criterion (XCONV=0) is reported as satisfied. I am concerned that it only converged on the relative criterion, so I tried tuning ABSGCONV and ABSXCONV to force the absolute gradient to be small, but that didn't change the results. So I am not sure whether I can trust the converged GLIMMIX result. What are your thoughts on this? Is the GLIMMIX fitting method superior to GENMOD's in my case, considering that the latter always fails to converge?
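One way to judge whether the converged GLIMMIX fit can be trusted is to inspect per-parameter gradients and the estimates themselves. This sketch reuses the model from the original post: ITDETAILS adds parameter estimates and gradients to the Iteration History table, while ABSGCONV=1e-6, MAXITER=500, the PE data set name, and the cutoffs in the WHERE clause are arbitrary illustrative choices.

```sas
proc glimmix data=ds empirical=mbn method=laplace itdetails;
  class region center;
  model count = region center(region) covar1 covar2 / dist=nb solution;
  nloptions tech=nrridg absgconv=1e-6 maxiter=500;
  ods output ParameterEstimates=pe;   /* save the fixed-effect estimates */
run;

/* Flag estimates that look like they are drifting toward infinity;
   the cutoffs are illustrative, not standard thresholds */
proc print data=pe;
  where abs(Estimate) > 10 or StdErr > 100;
run;
```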
StatDave
SAS Super FREQ

The problem is more likely that the model includes the cluster variable that is used in SUBJECT=. You still have not indicated if the centers have unique identifying values in the CENTER variable, or if you tried using the NOLOGNB option in GENMOD, but you could try using the following model in GENMOD (assuming unique CENTER values) which will fit a negative binomial model with scale restricted to zero (equivalent of a Poisson model) and provide a test of that restriction which will indicate if a negative binomial model is even needed.

model count=center covar1 covar2/ dist=nb scale=0 noscale;

If that restriction test is not significant, indicating that the Poisson model is adequate, then you could try fitting the simpler Poisson model:

model count=center covar1 covar2/ dist=poisson;
repeated subject=center;
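Written out as complete steps, the two suggestions above might look like this, assuming unique CENTER values and the data set name from the original post.

```sas
/* Step 1: NB model with the dispersion fixed at 0 (Poisson-equivalent);
   the output includes a test of that restriction */
proc genmod data=ds;
  class center;
  model count = center covar1 covar2 / dist=nb scale=0 noscale;
run;

/* Step 2: if the restriction is not rejected, fit the Poisson GEE model */
proc genmod data=ds;
  class center;
  model count = center covar1 covar2 / dist=poisson;
  repeated subject=center;
run;
```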

 

 

stats_x
Fluorite | Level 6
I really appreciate your reply! Yes, within each center, my subject IDs are unique, so I removed the nested term following your suggestion. But neither the negative binomial nor the Poisson model converged in GENMOD with your last suggested code. Interestingly, GLIMMIX converges even with the nested terms (the code in my original post). I've been trying to understand why, and whether I can trust the converged results. Also, does that indicate GLIMMIX is a better approach for my data than GENMOD?
