Solved: simulation study with glimmix bad parameter coverage rates

Julia_Mang · Posted 05-27-2019 09:00 AM

Dear SAS community,

I am running a simulation study with different hierarchical models and different weighting scenarios in PROC GLIMMIX. For the null model, for example, I use the following code:

proc glimmix data=data method=quadrature empirical=classical;
class cluster;
model ach = / dist=normal solution;
random int / subject=cluster weight=schwt solution;
by rep;
ods output ParameterEstimates=param1 CovParms=param2;
run;

The data were simulated in R. In this scenario, the variable ach follows N(500,100) with ICC=0.1 in each of the 1000 data sets.

When I look at the estimates for each single "by" replication, I get a very poor coverage rate, i.e. the confidence intervals often fail to include the population values used in the simulation (e.g. for the intercept of this null model the coverage rate is 71%; this happens, by the way, in all of the weighting scenarios).
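For readers who want to reproduce the check, here is a minimal sketch of how an empirical coverage rate like the one described above could be computed from the saved estimates. It assumes that param1 is the ParameterEstimates table written by ODS OUTPUT, that it contains the columns rep, Effect, Estimate, StdErr and DF, and that DF is finite; the macro variable truth and the data set name cover are introduced here for illustration only.

```sas
/* Sketch: empirical coverage of the intercept across BY replications. */
/* Assumes param1 holds rep, Effect, Estimate, StdErr and a finite DF. */
%let truth = 500;   /* population intercept used in the simulation */

data cover;
  set param1;
  where Effect = "Intercept";
  /* Rebuild the 95% Wald t interval from estimate, standard error and DF */
  tcrit   = tinv(0.975, DF);
  lower   = Estimate - tcrit * StdErr;
  upper   = Estimate + tcrit * StdErr;
  /* 1 if the interval contains the population value, 0 otherwise */
  covered = (lower <= &truth <= upper);
run;

/* The mean of the 0/1 indicator is the empirical coverage rate */
proc means data=cover n mean;
  var covered;
run;
```

With 1000 replications, a mean of covered close to 0.95 would match the nominal level; the 71% reported above would show up as a mean of about 0.71.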

I also ran the same simulated data and procedures in MPLUS, and there very high coverage rates (over 95%) appear. From my research, MPLUS uses the same estimation technique that I specified in the SAS syntax with method=quadrature and empirical=classical.

I assume that I am still specifying something incorrectly; otherwise I cannot explain these poor coverage rates. Can you imagine what could cause this problem? Or should I use PROC MIXED instead of PROC GLIMMIX? I thought the latter is preferred nowadays.

Thank you in advance for your help and kind regards,

Julia Mang

Rick_SAS · Posted 05-28-2019 10:09 AM

When this happens it usually means either:

1. The samples are small and the asymptotic coverage probabilities do not apply.

2. The simulated data were not generated by the same model that is being tested.

If (2), go back and review how you are simulating the data. Make sure that the random intercept is only generated once per cluster.
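To make point (2) concrete, here is a minimal sketch of a random-intercept simulation in which the cluster effect is drawn once per cluster. It is written as a SAS DATA step only for illustration; the original data were simulated in R and that code is not shown in this thread. The sketch assumes that the 100 in N(500,100) is a standard deviation, and the cluster count, cluster size, seed and variable names (u, id, tau2, sigma2) are hypothetical. Weights are not simulated here.

```sas
/* Sketch: null model with ICC = 0.1, random intercept drawn once per cluster. */
data sim;
  call streaminit(20190527);         /* arbitrary seed */
  total_var = 100**2;                /* assumption: 100 is the total SD */
  tau2   = 0.1 * total_var;          /* between-cluster variance (ICC = 0.1) */
  sigma2 = total_var - tau2;         /* within-cluster (residual) variance */
  do rep = 1 to 1000;                /* one BY group per simulated data set */
    do cluster = 1 to 100;           /* hypothetical number of clusters */
      /* the cluster effect is generated here, outside the inner loop */
      u = rand("Normal", 0, sqrt(tau2));
      do id = 1 to 50;               /* hypothetical cluster size */
        ach = 500 + u + rand("Normal", 0, sqrt(sigma2));
        output;
      end;
    end;
  end;
  keep rep cluster id ach;
run;
```

If the draw of u were moved inside the innermost loop, every observation would get its own "cluster" effect, the data would no longer follow the model being fitted, and poor coverage would be expected.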

I am not familiar with the weighting scheme you are using, but the GLIMMIX doc does say "PROC GLIMMIX uses the weights provided in the data set directly. To use the scaled weights, you need to provide them in the data set." Check that MPLUS and GLIMMIX are treating the weights in the same way.

Julia_Mang · Posted 08-13-2019 09:14 AM

Dear Rick_SAS,

thank you for your answer and sorry for my delayed reply.

I checked all your suggestions, and here is my feedback:

1) Each sample includes 5000 cases, which should be large enough.

2) The random intercept in the data simulation is only generated once per cluster. I simulate the data in R before I run the analyses in MPlus and SAS.

The weights are also simulated and scaled beforehand in R, and they are used in the same way in MPlus and SAS.

To isolate the problem, my next step would be to read the estimates from the MPlus analyses into SAS as starting values and run the MC simulation again. Unfortunately, the only possibility I can find in PROC GLIMMIX is to define starting values for the variances via the PARMS statement. Or am I overlooking a way to also supply starting values for the other parameter estimates?
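For reference, a minimal sketch of what the PARMS approach mentioned above could look like for the null model. The values 1000 and 9000 are placeholders only; they would be replaced by the variance estimates taken from the other software. The sketch assumes that the covariance parameters are listed with the cluster intercept variance first and the residual variance second, which can be checked against the CovParms table, and that the same starting values are acceptable for every BY group. It covers starting values for the variances only.

```sas
proc glimmix data=data method=quadrature empirical=classical;
  class cluster;
  model ach = / dist=normal solution;
  random int / subject=cluster weight=schwt solution;
  /* Starting values, assumed order as in the CovParms table:       */
  /*   first  = variance of the random intercept (subject=cluster)  */
  /*   second = residual variance                                   */
  /* 1000 and 9000 are placeholders, not recommended values.        */
  parms (1000) (9000);
  by rep;
  ods output ParameterEstimates=param1 CovParms=param2;
run;
```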

I would be very grateful for feedback!

Kind regards,

Julia

Rick_SAS · Posted 08-13-2019 10:20 AM

I don't know anything about MPLUS, so I can't help you there. It sounds like this is a research project, so you might want to discuss it with an advisor or colleague and share the code with him/her.

Regarding the initial guesses for estimates, there are many ways to guess the starting values. I've written a few SAS-related posts that might interest you. They do not necessarily use PROC GLIMMIX, but the ideas are generally applicable to regression procedures that estimate parameters by solving an optimization problem.

Lastly, I'll mention that comparing results across software packages requires careful reading of the docs and sometimes some detective work. Each software package has different defaults, different optimization methods, and different objective functions (e.g., restricted maximum likelihood, pseudo-likelihood, or integral approximation by adaptive quadrature or Laplace methods). It is sometimes not possible to use one software package to exactly duplicate the results of another. There are several good papers and books that you might want to read:

http://www-personal.umich.edu/~bwest/mccoach_etal_2018_mlmcompare.pdf
https://stat.utexas.edu/images/SSC/Site/hlm_comparison-1.pdf
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3630376/
An excellent book that compares mixed models in software packages is West, B. T., Welch, K. B., & Galecki, A. T. (2015). Linear mixed models: A practical guide using statistical software (2nd ed.). Boca Raton, FL: CRC Press. However, it does not include MPLUS. It compares SAS, SPSS, Stata, R/S-plus, and HLM.

Good luck!

Julia_Mang · Posted 11-05-2019 07:56 AM

I would like to give you some feedback on my question because you took some time to answer it.

In the meantime I was able to find out that MPlus does not report empirical coverage rates; the rates it reports are based on a 5% error level under the assumption of a normal distribution. When I compute the empirical coverage rates in MPlus, I get almost exactly the same rates as in SAS.

Nevertheless, it took a lot of detective work to find out what happens behind the program syntax, which I personally find very unfortunate.

Many thanks again for your detailed help!

Best regards

Julia Mang
