Interpreting fit statistics in proc glimmix with a binary outcome and G+R side random effects

ha33 · Posted 11-15-2023 05:09 AM

Hi everyone, I am a non-statistician looking for some advice on how to interpret the fit statistics in proc glimmix.

I am computing odds ratios for an event (0/1) over time in the same individuals.

Data looks something like this:

ID	period	outcome	season	prev_treated
1	1	0	3	0
1	2	0	4	0
1	3	1	1	0
1	4	1	2	0
2	1	0	2	1
2	2	1	3	1

It might be worth adding that the share of observations in which outcome=1 is small, approx 10-20%.

The current model looks like this:

proc glimmix data = data1 method=rspl plots=oddsratio;
class ID period(ref="1") season prev_treated;
model outcome(event="1")= period season prev_treated / dist=binary link=logit oddsratio s;
random intercept / subject=id;
random period/subject=id residual type=AR(1);
run;

I have two questions:

1) Is there any way, based on this information, to determine which method should be used (RSPL, RMPL, MSPL, MMPL)?

2) In relation to 1), how do I interpret the Fit statistics table:

-2 res log pseudolikelihood
Generalized Chi-Square
Gener. Chi-Square/DF

Meaning, can these be used like AIC, where lower is better, for example when specifying different methods in the method= option or different covariance structures in the type= option? (AR(1), ARMA(1,1) and TOEP are of interest.)

Also feel free to comment on the model, if you have other suggestions.

Thanks

sbxkoenk · Posted 11-15-2023 06:03 AM

Hello,

-2 Res Log Pseudo-Likelihood : The likelihood is preceded by the word “Pseudo” to indicate that it is computed from a pseudo-likelihood, rather than the true likelihood.
Gener. Chi-Square / DF : The ratio of the generalized chi-square statistic and its degrees of freedom should be close to 1. This would indicate that the variability in your data has been properly modeled, and that there is no residual overdispersion.
Generalized Chi-Square : The generalized chi-square statistic is a quadratic form in the marginal residuals that takes correlations among the data into account.

BR, Koen

StatsMan · Posted 11-15-2023 09:28 AM

The covariance structure may be too complicated for binary data. If you are having trouble with convergence, drop the R-side fit and see if that helps. Switching the optimization technique to NRRIDG can also help. Using LAPLACE or QUADRATURE (METHOD=LAPLACE or METHOD=QUAD) gives you models you can compare using the fit statistics.
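As a rough illustration of the suggestion above, here is a minimal sketch of the original model with the R-side RANDOM statement dropped, fitted with METHOD=LAPLACE and the NRRIDG optimizer. It reuses the data set and variable names from the question; whether it converges on the real data is an assumption, not something tested here.

```sas
/* Sketch: G-side random intercept only, true-likelihood (Laplace) fit. */
/* R-side (RESIDUAL) effects are not available with METHOD=LAPLACE,    */
/* so the second RANDOM statement from the original model is omitted.  */
proc glimmix data=data1 method=laplace;
  class ID period(ref="1") season prev_treated;
  model outcome(event="1") = period season prev_treated
        / dist=binary link=logit oddsratio s;
  random intercept / subject=ID;
  nloptions tech=nrridg;   /* Newton-Raphson with ridging */
run;
```

With this method the Fit Statistics table reports -2 Log Likelihood, AIC, AICC and BIC from an approximated true likelihood, so models that differ only in their covariance structure can be compared on those criteria.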

SteveDenham · Posted 11-17-2023 09:34 AM

Just a quick comment on covariance structure selection. If you use any of the pseudo-likelihood methods, the information criteria probably should not be used for selection, because the pseudo-data (and therefore the pseudo-likelihood) differ from one structure to the next. Hence @StatsMan's comments about LAPLACE or QUADRATURE. If you truly want to use pseudo-likelihood methods, then probably the best you can do for covariance structure selection is to look at the Gener. Chi-Square / DF value and pick the structure that has the least over- or under-dispersion. Note that this measure will get closer to 1 the more parameters are estimated, and there is no penalty for this as there is with the information criteria, so "caveat emptor": let the user (buyer) beware.
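If you do stay with pseudo-likelihood, one way to line up the Gener. Chi-Square / DF values is to capture the Fit Statistics table for each structure with ODS OUTPUT. The sketch below assumes the model from the question; the macro name fit_rside and the data set names fit_ar1, fit_arma, fit_toep and fit_all are made up for illustration.

```sas
/* Sketch: refit the R-side model under several structures and      */
/* collect the Fit Statistics tables. Macro and data set names are  */
/* hypothetical.                                                    */
%macro fit_rside(type=, out=);
  ods output FitStatistics=&out;
  proc glimmix data=data1 method=rspl;
    class ID period(ref="1") season prev_treated;
    model outcome(event="1") = period season prev_treated
          / dist=binary link=logit s;
    random intercept / subject=ID;
    random period / subject=ID residual type=&type;
  run;

  /* Label each table with the structure it came from. */
  data &out;
    length structure $12;
    set &out;
    structure = "&type";
  run;
%mend fit_rside;

%fit_rside(type=ar(1),            out=fit_ar1);
%fit_rside(type=%str(arma(1,1)),  out=fit_arma);
%fit_rside(type=toep,             out=fit_toep);

/* Stack the three tables and inspect them side by side. */
data fit_all;
  set fit_ar1 fit_arma fit_toep;
run;

proc print data=fit_all noobs;
run;
```

Only the Gener. Chi-Square / DF rows are meant to be compared here, and only with the caveat above in mind; the -2 Res Log Pseudo-Likelihood rows are not comparable across structures.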

SteveDenham

RosieSAS · Posted 11-27-2023 02:44 PM

I read a paper stating that the criterion "Gener. Chi-Square / DF = 1" does not work for generalized linear mixed models when the model has random effects.

jiltao · Posted 11-15-2023 09:50 AM

Unfortunately there are no good ways that I am aware of to do what you asked for with your model.

Thanks,

Jill

SteveDenham · Posted 11-30-2023 09:55 AM

Would a two-step method be a possibility?
Step 1: Use LAPLACE or QUAD (if you have enough data) to fit the RANDOM effects, and output the variance/covariance parameter estimates to a dataset. This would enable selection of an error structure with the smallest corrected AIC.
Step 2: Fit your current model using the pseudo-likelihood method and a residual (R-side) effect for the repeated factor. You could use the values obtained in the first step as starting values in a PARMS statement.

NOTE WELL: THIS IS UNTESTED AND THERE IS NO GUARANTEE THAT IT WILL SOLVE THE PROBLEM

Additionally, you should consider that since this is a GLMM with a binary distribution the best approach may be to do this all as a G side analysis.
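A minimal sketch of how the two steps might look in code, under the same "untested" caveat. It assumes the model from the original question; the macro name fit_gside and the data sets fit_* and cp_* are hypothetical. It also assumes that the covariance parameters appear in the same order in the G-side fit of step 1 and in the G+R-side fit of step 2, which should be checked against the Covariance Parameter Estimates tables before trusting the starting values.

```sas
/* Step 1 (sketch): all-G-side fits with METHOD=LAPLACE.            */
/* Save fit statistics (for AICC) and covariance parameter          */
/* estimates. Macro and data set names are hypothetical.            */
%macro fit_gside(type=, tag=);
  ods output FitStatistics=fit_&tag CovParms=cp_&tag;
  proc glimmix data=data1 method=laplace;
    class ID period(ref="1") season prev_treated;
    model outcome(event="1") = period season prev_treated
          / dist=binary link=logit s;
    random intercept / subject=ID;
    random period / subject=ID type=&type;  /* G-side, no RESIDUAL */
    nloptions tech=nrridg;
  run;
%mend fit_gside;

%fit_gside(type=ar(1),           tag=ar1);
%fit_gside(type=%str(arma(1,1)), tag=arma);
%fit_gside(type=toep,            tag=toep);

/* Compare the AICC rows of fit_ar1, fit_arma and fit_toep by hand  */
/* and pick a structure. AR(1) is used below purely as an example.  */

/* Step 2 (sketch): pseudo-likelihood fit with an R-side structure, */
/* started from the step 1 estimates. PDATA= reads the Estimate     */
/* column of the saved CovParms table in row order.                 */
proc glimmix data=data1 method=rspl plots=oddsratio;
  class ID period(ref="1") season prev_treated;
  model outcome(event="1") = period season prev_treated
        / dist=binary link=logit oddsratio s;
  random intercept / subject=ID;
  random period / subject=ID residual type=ar(1);
  parms / pdata=cp_ar1;
run;
```

If the step 1 model on its own describes the data adequately, it already is the all-G-side analysis mentioned above, and step 2 may not be needed.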

SteveDenham
