ha33
Obsidian | Level 7

Hi everyone, I am a non-statistician looking for some advice on how to interpret the fit statistics in PROC GLIMMIX.

 

I am computing odds ratios for an event (0/1) over time in the same individuals.

 

Data looks something like this:  

 

ID period outcome season prev_treated
1 1 0 3 0
1 2 0 4 0
1 3 1 1 0
1 4 1 2 0
2 1 0 2 1
2 2 1 3 1

 

It might be worth adding that the share of observations in which outcome=1 is small, approx 10-20%.

 

The current model looks like this:

 

proc glimmix data=data1 method=rspl plots=oddsratio;
  class ID period(ref="1") season prev_treated;
  model outcome(event="1") = period season prev_treated
        / dist=binary link=logit oddsratio s;
  /* G-side: random intercept for each individual */
  random intercept / subject=ID;
  /* R-side: AR(1) correlation among repeated periods within an individual */
  random period / subject=ID residual type=AR(1);
run;

I have two questions : 

 

1) Is there any way, based on this information, to determine which method should be used (RSPL, RMPL, MSPL, MMPL)?

 

2) In relation to 1), how do I interpret the Fit statistics table:

  • -2 res log pseudolikelihood
  • Generalized Chi-Square
  • Gener. Chi-Square/DF

That is, can these be used like AIC, where lower is better, for example when comparing different methods in the METHOD= option or different covariance structures in the TYPE= option? (AR(1), ARMA(1,1) and TOEP are of interest.)

 

Also feel free to comment on the model, if you have other suggestions.  

 

Thanks

6 REPLIES
sbxkoenk
SAS Super FREQ

Hello,

 

  • -2 Res Log Pseudo-Likelihood : The likelihood is preceded by the word “Pseudo” to indicate that it is computed from a pseudo-likelihood, rather than the true likelihood.
  • Gener. Chi-Square / DF : The ratio of the generalized chi-square statistic and its degrees of freedom should be close to 1. This would indicate that the variability in your data has been properly modeled, and that there is no residual overdispersion.
  • Generalized Chi-Square : The generalized chi-square statistic is a quadratic form in the marginal residuals that takes correlations among the data into account.

BR, Koen

StatsMan
SAS Super FREQ

The covariance structure may be too complicated for binary data. If you are having trouble with convergence, drop the R-side effect and see if that helps. Switching the optimization technique to NRRIDG can also help. Using METHOD=LAPLACE or METHOD=QUAD fits the model by a true likelihood rather than a pseudo-likelihood, which gives you models you can compare using the fit statistics.
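As a minimal sketch of that suggestion, assuming the data set and variable names from the original post: the R-side RANDOM statement is dropped (R-side effects are not available with METHOD=LAPLACE), only the random intercept is kept, and the optimizer is switched through the NLOPTIONS statement. This is an illustration, not a recommended final model.

```sas
/* Sketch: G-side-only model fit by Laplace approximation.          */
/* data1 and the variable names are taken from the original post.   */
proc glimmix data=data1 method=laplace;
  class ID period(ref="1") season prev_treated;
  model outcome(event="1") = period season prev_treated
        / dist=binary link=logit oddsratio s;
  random intercept / subject=ID;   /* G-side random intercept only  */
  nloptions tech=nrridg;           /* Newton-Raphson with ridging   */
run;
```

With METHOD=LAPLACE the Fit Statistics table reports -2 Log Likelihood, AIC, AICC and BIC, which can be compared across models fitted to the same data.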

SteveDenham
Jade | Level 19

Just a quick comment on covariance structure selection. If you use any of the pseudo-likelihood methods, the information criteria probably should not be used for selection, because the pseudo-data, and therefore the pseudo-likelihoods, are not the same under different structures. Thus @StatsMan's comments re LAPLACE or QUADRATURE. If you truly want to use pseudo-likelihood methods, then probably the best you can do for covariance structure selection is to look at the Gener. Chi-Square / DF value and pick the structure that shows the least over- or under-dispersion. You should note that this measure will get closer to 1 the more parameters are estimated, and there is no penalty for this as there is with the information criteria, so "caveat emptor": let the user (buyer) beware.
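One way to line up the Gener. Chi-Square / DF values across candidate structures is to capture the Fit Statistics table with ODS OUTPUT for each fit. The sketch below assumes the model from the original post; the macro name fit_rside and the output data set names are hypothetical, chosen only for illustration.

```sas
/* Sketch: refit the pseudo-likelihood model with several R-side    */
/* structures and keep the Fit Statistics table from each fit.      */
/* %fit_rside and the fit_* data set names are hypothetical.        */
%macro fit_rside(type=, out=);
  ods output FitStatistics=&out;
  proc glimmix data=data1 method=rspl;
    class ID period(ref="1") season prev_treated;
    model outcome(event="1") = period season prev_treated
          / dist=binary link=logit s;
    random intercept / subject=ID;
    random period / subject=ID residual type=&type;
  run;
%mend fit_rside;

%fit_rside(type=ar(1),     out=fit_ar1);
%fit_rside(type=arma(1,1), out=fit_arma);
%fit_rside(type=toep,      out=fit_toep);

/* Stack the three tables and keep only the chi-square / DF row */
data fit_all;
  length structure $12;
  set fit_ar1 (in=a) fit_arma (in=b) fit_toep (in=c);
  if a then structure = "AR(1)";
  else if b then structure = "ARMA(1,1)";
  else if c then structure = "TOEP";
  if index(Descr, "Chi-Square / DF") > 0;
run;

proc print data=fit_all noobs;
  var structure Descr Value;
run;
```

The resulting values are a rough dispersion check only; as noted above, they carry no penalty for the number of covariance parameters.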

 

SteveDenham

RosieSAS
Obsidian | Level 7

I read a paper which stated that the criterion "Gener. Chi-Square / DF = 1" does not work for a generalized linear mixed model when the model has random effects.

jiltao
SAS Super FREQ

Unfortunately there are no good ways that I am aware of to do what you asked for with your model.

Thanks,

Jill

SteveDenham
Jade | Level 19

Would a two-step method be a possibility? Step 1: Use LAPLACE or QUAD (if you have enough data) to fit the RANDOM effects, and output the variance/covariance parameter estimates to a dataset. This would enable selection of an error structure with the smallest corrected AIC. Step 2: Fit your current model using the pseudo-likelihood method and a residual (R-side) effect for the repeated factor. You could use the values obtained in the first step as starting values in a PARMS statement.

 

NOTE WELL: THIS IS UNTESTED AND THERE IS NO GUARANTEE THAT IT WILL SOLVE THE PROBLEM
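A rough sketch of the two steps, under the same caveat: it is untested against real data. It assumes the data set and variables from the original post; the output data set names are hypothetical, and the numbers in the PARMS statement are placeholders, not estimates. The G-side fit in step 1 and the R-side model in step 2 do not have the same set of covariance parameters, so how step 1 estimates map onto step 2 starting values is itself an assumption to check against the Covariance Parameter Estimates table.

```sas
/* Step 1: G-side structure fit by Laplace. Repeat with             */
/* type=arma(1,1) and type=toep and compare the AICC rows.          */
/* fit_lap_ar1 and cp_lap_ar1 are hypothetical data set names.      */
ods output FitStatistics=fit_lap_ar1 CovParms=cp_lap_ar1;
proc glimmix data=data1 method=laplace;
  class ID period season prev_treated;
  model outcome(event="1") = period season prev_treated
        / dist=binary link=logit s;
  random period / subject=ID type=ar(1);  /* G-side: no RESIDUAL option */
run;

proc print data=fit_lap_ar1 noobs;
  where Descr like 'AICC%';
run;

/* Step 2: original pseudo-likelihood model with starting values.   */
/* Replace the placeholders with values read from cp_lap_ar1, in    */
/* the order the covariance parameters appear in the output.        */
proc glimmix data=data1 method=rspl;
  class ID period(ref="1") season prev_treated;
  model outcome(event="1") = period season prev_treated
        / dist=binary link=logit oddsratio s;
  random intercept / subject=ID;
  random period / subject=ID residual type=ar(1);
  parms (0.5) (0.3) (1);   /* placeholder starting values only */
run;
```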

 

 

Additionally, you should consider that since this is a GLMM with a binary distribution the best approach may be to do this all as a G side analysis.

 

SteveDenham
