Hi everyone, I am a non-statistician looking for some advice on how to interpret the fit statistics in proc glimmix.
I am computing odds ratios for an event (0/1) over time in the same individuals.
Data looks something like this:
It might be worth adding that the share of observations in which outcome=1 is small, approx 10-20%.
The current model looks like this:
proc glimmix data=data1 method=rspl plots=oddsratio;
   class ID period(ref="1") season prev_treated;
   model outcome(event="1") = period season prev_treated
         / dist=binary link=logit oddsratio s;
   random intercept / subject=ID;
   random period / subject=ID residual type=ar(1);
run;
I have two questions:
1) Is there any way, based on this information, to determine which method should be used (RSPL, RMPL, MSPL, MMPL)?
2) In relation to 1), how do I interpret the Fit Statistics table? That is, can it be used like AIC, where lower is better, for example when specifying different methods in the METHOD= option or different covariance structures in the TYPE= option? (AR(1), ARMA(1,1), and TOEP are of interest.)
Also feel free to comment on the model, if you have other suggestions.
The covariance structure may be too complicated for binary data. If you are having trouble with convergence, drop the R-side effect and see if that helps. Switching the optimization technique to NRRIDG can also help. Using LAPLACE or QUADRATURE gives you true likelihoods, so those models can be compared using the fit statistics.
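A sketch of how those suggestions might look together, using the dataset and variable names from the original post (untested; note that METHOD=LAPLACE does not allow an R-side RESIDUAL effect, so the model becomes G-side only):

```
proc glimmix data=data1 method=laplace plots=oddsratio;
   nloptions tech=nrridg;             /* Newton-Raphson with ridging */
   class ID period(ref="1") season prev_treated;
   model outcome(event="1") = period season prev_treated
         / dist=binary link=logit oddsratio s;
   random intercept / subject=ID;     /* G-side only; R-side dropped */
run;
```

Because LAPLACE maximizes an approximation to the true log likelihood, the -2 Log Likelihood, AIC, and AICC in the Fit Statistics table are comparable across models fit this way, unlike the pseudo-likelihood values from RSPL/RMPL/MSPL/MMPL.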
Just a quick comment on covariance structure selection. If you use any of the pseudo-likelihood methods, the information criteria probably should not be used for selection, as the pseudo-likelihoods aren't on the same scale under different structures. Thus @StatsMan's comments re LAPLACE or QUADRATURE. If you truly want to use pseudo-likelihood methods, then probably the best you can do for covariance structure selection is look at the Gener. Chi-Square / DF value and pick the structure that shows the least over- or under-dispersion. You should note that this measure will get closer to 1 the more parameters are estimated, and there is no penalization for this as there is with the information criteria, so "Caveat emptor" - let the user (buyer) beware.
Would a two-step method be a possibility? Step 1: Use LAPLACE or QUAD (if you have enough data) to fit the random effects, and output the variance/covariance parameter estimates to a dataset. This would enable selection of an error structure by smallest corrected AIC. Step 2: Fit your current model using the pseudo-likelihood method and a residual R-side effect for the repeated factor. You could use the values obtained in the first step as starting values in a PARMS statement.
NOTE WELL: THIS IS UNTESTED AND THERE IS NO GUARANTEE THAT IT WILL SOLVE THE PROBLEM
Additionally, you should consider that since this is a GLMM with a binary distribution the best approach may be to do this all as a G side analysis.
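For what it's worth, the two-step idea above might be sketched like this (untested, as noted; the number and order of covariance parameters saved in Step 1 must match the Step 2 model for PDATA= to work):

```
/* Step 1: G-side fit by Laplace; compare AICC across candidate
   TYPE= structures, then save the winning estimates */
proc glimmix data=data1 method=laplace;
   class ID period(ref="1") season prev_treated;
   model outcome(event="1") = period season prev_treated
         / dist=binary link=logit;
   random intercept / subject=ID;
   random period / subject=ID type=ar(1);   /* candidate structure */
   ods output CovParms=cov_est;             /* save estimates */
run;

/* Step 2: pseudo-likelihood fit with the R-side structure,
   started from the Step 1 estimates */
proc glimmix data=data1 method=rspl plots=oddsratio;
   class ID period(ref="1") season prev_treated;
   model outcome(event="1") = period season prev_treated
         / dist=binary link=logit oddsratio s;
   random intercept / subject=ID;
   random period / subject=ID residual type=ar(1);
   parms / pdata=cov_est;                   /* starting values */
run;
```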