Hi everyone, I am a non-statistician looking for some advice on how to interpret the fit statistics in PROC GLIMMIX.
I am computing odds ratios for an event (0/1) over time in the same individuals.
Data looks something like this:
ID | period | outcome | season | prev_treated |
1 | 1 | 0 | 3 | 0 |
1 | 2 | 0 | 4 | 0 |
1 | 3 | 1 | 1 | 0 |
1 | 4 | 1 | 2 | 0 |
2 | 1 | 0 | 2 | 1 |
2 | 2 | 1 | 3 | 1 |
It might be worth adding that the share of observations in which outcome=1 is small, approx 10-20%.
The current model looks like this:
proc glimmix data = data1 method=rspl plots=oddsratio;
class ID period(ref="1") season prev_treated;
model outcome(event="1")= period season prev_treated / dist=binary link=logit oddsratio s;
random intercept / subject=id;
random period/subject=id residual type=AR(1);
run;
I have two questions :
1) Is there any way, based on this information, to determine which method should be used (RSPL, RMPL, MSPL, MMPL)?
2) In relation to 1), how do I interpret the Fit statistics table:
That is, can it be used like AIC, where lower is better, for example when comparing different methods in the method= option or different covariance structures in the type= option? (AR(1), ARMA(1,1) and TOEP are of interest.)
Also feel free to comment on the model, if you have other suggestions.
Thanks
Hello,
BR, Koen
The covariance structure may be too complicated for binary data. If you are having trouble with convergence, drop the R-side effect and see if that helps. Switching the optimization technique to NRRIDG can also help. Using LAPLACE or QUADRATURE gives you models that you can compare using the fit statistics.
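A minimal sketch of that suggestion, assuming the data set and variable names from the original post (data1, ID, period, season, prev_treated, outcome). It drops the R-side AR(1) term, because METHOD=LAPLACE does not accept R-side random effects, and requests NRRIDG through the NLOPTIONS statement. Whether it converges on the real data is untested.

```sas
/* Sketch: random-intercept logistic GLMM fitted by Laplace approximation. */
/* The R-side RANDOM statement from the original model is removed.        */
proc glimmix data=data1 method=laplace;
  class ID period(ref="1") season prev_treated;
  model outcome(event="1") = period season prev_treated
        / dist=binary link=logit oddsratio solution;
  random intercept / subject=ID;   /* G-side random intercept only        */
  nloptions tech=nrridg;           /* Newton-Raphson with ridging         */
run;
```

With METHOD=LAPLACE (or METHOD=QUAD) the Fit Statistics table reports a true -2 log likelihood along with AIC, AICC and BIC, which is what makes models with the same response and data comparable.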
Just a quick comment on covariance structure selection. If you use any of the pseudo-likelihood methods, the information criteria probably should not be used for selection, because the pseudo-likelihoods are not comparable across structures (the pseudo-data they are computed from change with the model). Hence @StatsMan's comments about LAPLACE or QUADRATURE. If you truly want to use pseudo-likelihood methods, then probably the best you can do for covariance structure selection is to look at the Gener. Chi-Square / DF value and pick the structure that has the least over- or under-dispersion. You should note that this measure will get closer to 1 the more parameters are estimated, and there is no penalization for this as there is for the information criteria, so "Caveat emptor" - let the user (buyer) beware.
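One way to line up the Gener. Chi-Square / DF values across the three structures of interest is sketched below. The macro name fit_rside and the data set names fs_* and gcs_compare are made up for illustration; the model itself is the one from the original post. If a fit fails to converge, its FitStatistics data set is not created and the stacking step will stop with an error.

```sas
/* Sketch: refit the pseudo-likelihood model under several R-side structures */
/* and collect the "Gener. Chi-Square / DF" row from each Fit Statistics     */
/* table. Macro and data set names are hypothetical.                         */
%macro fit_rside(type=, label=);
  proc glimmix data=data1 method=rspl;
    class ID period(ref="1") season prev_treated;
    model outcome(event="1") = period season prev_treated
          / dist=binary link=logit solution;
    random intercept / subject=ID;
    random period / subject=ID residual type=%unquote(&type);
    ods output FitStatistics=fs_&label;
  run;

  /* Tag each table with the structure it came from */
  data fs_&label;
    length structure $12;
    set fs_&label;
    structure = "&label";
  run;
%mend fit_rside;

%fit_rside(type=%str(ar(1)),     label=ar1);
%fit_rside(type=%str(arma(1,1)), label=arma11);
%fit_rside(type=%str(toep),      label=toep);

/* Keep only the chi-square / DF row from each fit */
data gcs_compare;
  set fs_ar1 fs_arma11 fs_toep;
  where index(Descr, '/ DF') > 0;
run;

proc print data=gcs_compare noobs;
  var structure Descr Value;
run;
```

The printed values are only a rough dispersion check; as noted above, nothing in this statistic penalizes the extra parameters of TOEP or ARMA(1,1).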
SteveDenham
I read a paper stating that the criterion "Gener. Chi-Square / DF = 1" does not work for generalized linear mixed models when the model has random effects.
Unfortunately there are no good ways that I am aware of to do what you asked for with your model.
Thanks,
Jill
Would a two-step method be a possibility? Step 1: Use LAPLACE or QUAD (if you have enough data) to fit the RANDOM effects, and output the variance/covariance parameter estimates to a data set. This would enable selection of an error structure with the smallest corrected AIC. Step 2: Fit your current model using the pseudo-likelihood method and a residual (R-side) effect for the repeated factor. You could use the values obtained in the first step as starting values in a PARMS statement.
NOTE WELL: THIS IS UNTESTED AND THERE IS NO GUARANTEE THAT IT WILL SOLVE THE PROBLEM
Additionally, you should consider that, since this is a GLMM with a binary distribution, the best approach may be to do this all as a G-side analysis.
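A rough sketch of a G-side fit, again untested and assuming the names from the original post. The serial correlation is moved into the G matrix by omitting the RESIDUAL option, and the separate random intercept is left out here purely to keep the example small; whether to keep it is a modelling decision. The output data set names fs_g_ar1 and cp_g_ar1 are hypothetical.

```sas
/* Sketch: G-side AR(1) structure fitted by Laplace approximation.          */
/* Without the RESIDUAL option, "random period" is a G-side effect.         */
proc glimmix data=data1 method=laplace;
  class ID period(ref="1") season prev_treated;
  model outcome(event="1") = period season prev_treated
        / dist=binary link=logit solution;
  random period / subject=ID type=ar(1);
  nloptions tech=nrridg;
  /* Save fit statistics and covariance parameter estimates */
  ods output FitStatistics=fs_g_ar1 CovParms=cp_g_ar1;
run;

/* Show the corrected AIC (AICC) row for this structure */
proc print data=fs_g_ar1 noobs;
  where index(Descr, 'AICC') > 0;
  var Descr Value;
run;
```

Repeating the fit with type=arma(1,1) or type=toep and comparing the AICC rows would correspond to step 1 of the two-step idea. The cp_g_ar1 data set holds the estimates that could seed a PARMS statement, but the order and number of covariance parameters would have to match the second model, and that mapping is not shown here.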
SteveDenham