Hello,
Is there a way to calculate the r-squared or pseudo-r-squared for proc mixed in SAS (models with fixed and random effects)? Or would it have to be hand calculated?
I saw one post that stated to run the null model and then the full-model and to look at the variance components.
Any other thoughts, suggestions, or clarifications as to how to best calculate the r-squared when using proc mixed?
Thanks!
There is no generally agreed-upon way to compute R-squared for mixed models, such as those fit by PROC MIXED. A number of methods have been proposed; each has its own advantages and disadvantages. Your favorite search engine will find many discussions about this.
Hi @PaigeMiller
Thanks! There are many ways to compute the R-squared for multilevel models. I think I have found one that works well.
Would you care to share that method with the rest of us?
Thanks in advance.
SteveDenham
@bnd wrote:
Hi @PaigeMiller
Thanks! There are many ways to compute the R-squared for multilevel models. I think I have found one that works well.
Yes, I agree with @SteveDenham, you need to explain what method you chose, and why, so we can all learn.
Hi @SteveDenham and @PaigeMiller,
I chose this formula: R-squared = 1 - SSE_Model / SSE_IntOnly, where SSE_Model is the sum of squared residuals from the fitted model and SSE_IntOnly is the sum of squared residuals from the intercept-only (null) model. I chose it because I wanted a simple, uncomplicated formula for the percent reduction in variance from the null model to the full model. I used the covariance parameter estimates table from PROC MIXED to calculate the R-squared. http://math.usu.edu/jrstevens/stat5200/25.Rsquare_Design.pdf
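For anyone following along, the arithmetic is just a ratio of residual variances. A minimal sketch in Python, with made-up numbers standing in for the residual variance estimates you would read off the two "Covariance Parameter Estimates" tables:

```python
# Hypothetical residual variance estimates from two PROC MIXED runs
# (these numbers are illustrative, not real output).
sse_int_only = 48.7  # intercept-only (null) model
sse_model = 21.3     # full model with predictors

# Proportional reduction in residual variance from null to full model.
pseudo_r2 = 1 - sse_model / sse_int_only
print(round(pseudo_r2, 3))  # 0.563
```

So with these numbers the full model accounts for roughly 56% of the null model's residual variance.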
I am not sure if I have explained this well! I am very new to calculating the R-squared for multilevel models. I am not sure if this approach is the best or if R-squared should even be calculated this way, but it was a simple formula for me.
I also found this formula: R-squared = SSR / CTSS, where SSR is the reduction in sums of squares due to the model, over and above the mean, and CTSS is the corrected total sum of squares. I got the same percent reductions using this formula. http://animsci.agrenv.mcgill.ca/StatisticalMethodsII/drvpseudor.pdf
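It makes sense that the two formulas agree: whenever SSR is defined as CTSS minus the model's residual sum of squares, SSR/CTSS = (CTSS - SSE)/CTSS = 1 - SSE/CTSS, which is the first formula with CTSS playing the role of the null-model SSE. A quick numerical check with made-up sums of squares:

```python
import math

# Hypothetical sums of squares (illustrative, not real PROC MIXED output).
ctss = 48.7  # corrected total sum of squares
sse = 21.3   # residual sum of squares from the full model
ssr = ctss - sse  # reduction in SS due to the model, over and above the mean

# The two pseudo-R-squared formulas give the same value.
print(math.isclose(ssr / ctss, 1 - sse / ctss))
```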
While this works, remind yourself over and over that sums of squares are NOT what is optimized in a mixed model. PROC MIXED uses a maximum likelihood method, and only in a fully balanced design with uncorrelated errors would the sums-of-squares results agree with the likelihood-based fit. A good substitute might be to look at the AIC values and determine the amount of information retained from the null model in the fitted model. You could even put this on a relative basis. See the Wikipedia article on the Akaike Information Criterion https://en.wikipedia.org/wiki/Akaike_information_criterion , which is a very good summary and points out how to compare models and the caveats involved.
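On putting AIC comparisons on a relative basis: one common approach, described in that Wikipedia article, is the relative likelihood exp((AIC_min - AIC_i)/2), which says how probable each model is relative to the best one in the sense of minimizing estimated information loss. A sketch with hypothetical "AIC (smaller is better)" values from two Fit Statistics tables:

```python
import math

# Hypothetical AIC values from PROC MIXED Fit Statistics tables
# (illustrative numbers, not real output).
aic = {"null": 412.6, "full": 398.2}

aic_min = min(aic.values())
# Relative likelihood of each model compared with the best (lowest-AIC) model.
rel_like = {m: math.exp((aic_min - a) / 2) for m, a in aic.items()}
for model, r in rel_like.items():
    print(model, round(r, 4))
```

Here the null model's relative likelihood is tiny, so almost no information is lost by preferring the full model.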
SteveDenham
Hi @SteveDenham,
I actually looked at the AIC as well! Maybe I should just focus on the AIC instead of the pseudo-R-squared because as you have stated the sum of squares is not what is being optimized in mixed models.
Thanks!
Brittney
The Kramer paper looks quite good, and I can see some utility in the MLE-based pseudo-R2. However, you would have to be sure to switch from the standard REML estimation used in MIXED and GLIMMIX to an ML method, and ML leads to biased estimates. (As a simple example, compare the biased estimate of the variance, with denominator n, to the unbiased estimate, with denominator n - 1; proving that the biased estimate is the ML estimate is a pretty standard math-stats course exercise.) I think we are still looking for an appropriate approach to goodness of fit for REML mixed models.
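That variance example is easy to verify numerically: the ML estimate divides the sum of squared deviations by n and is biased low by a factor of (n - 1)/n, while the n - 1 denominator removes the bias. A small simulation sketch (just an illustration, not SAS output):

```python
import random

random.seed(1)
n, true_var = 10, 4.0
reps = 20000

ml_est, unbiased_est = [], []
for _ in range(reps):
    x = [random.gauss(0.0, true_var ** 0.5) for _ in range(n)]
    m = sum(x) / n
    ss = sum((xi - m) ** 2 for xi in x)
    ml_est.append(ss / n)              # ML estimate: biased low
    unbiased_est.append(ss / (n - 1))  # unbiased estimate

# The ML average lands near (n-1)/n * 4 = 3.6; the unbiased average near 4.0.
print(round(sum(ml_est) / reps, 2))
print(round(sum(unbiased_est) / reps, 2))
```

With n = 10 the downward bias is about 10%, which is why the REML-vs-ML distinction matters when variance components are the quantities of interest.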
BTW, the n in the formula is the total number of observations.
SteveDenham