bnd
Fluorite | Level 6

Hello,

Is there a way to calculate the r-squared or pseudo-r-squared for proc mixed in SAS (models with fixed and random effects)? Or would it have to be hand calculated? 

One post I saw suggested fitting the null model and then the full model and comparing the variance components.

 

Any other thoughts, suggestions, or clarifications as to how to best calculate the r-squared when using proc mixed?

 

Thanks!

1 ACCEPTED SOLUTION
PaigeMiller
Diamond | Level 26

There is no generally agreed upon way to compute R-squared for mixed models, such as those fit by PROC MIXED. A number of methods have been proposed, and they all have certain advantages and certain disadvantages. Your favorite search engine will find many discussions about this.

--
Paige Miller


9 REPLIES
bnd
Fluorite | Level 6

Hi @PaigeMiller 

 

Thanks! There are many ways to compute the R-squared for multilevel models. I think I have found one that works well.

SteveDenham
Jade | Level 19

Would you care to share that method with the rest of us?

 

Thanks in advance.

 

SteveDenham

PaigeMiller
Diamond | Level 26

@bnd wrote:

Hi @PaigeMiller 

 

Thanks! There are many ways to compute the R-squared for multilevel models. I think I have found one that works well.


Yes, I agree with @SteveDenham, you need to explain what method you chose, and why, so we can all learn.

--
Paige Miller
bnd
Fluorite | Level 6

Hi @SteveDenham and @PaigeMiller,

 

I chose this formula: R-squared = 1 - SSE_Model / SSE_IntOnly, where SSE_Model is the sum of squared residuals from the full model and SSE_IntOnly is the sum of squared residuals from the intercept-only model. I chose this formula because I wanted a simple way to calculate the percent reduction in variance from the null model to the full model. I used the covariance parameter estimates table from proc mixed to calculate the R-squared. http://math.usu.edu/jrstevens/stat5200/25.Rsquare_Design.pdf

 

I am not sure if I have explained this well! I am very new to calculating the R-squared for multilevel models. I am not sure if this approach is the best or if R-squared should even be calculated this way, but it was a simple formula for me. 

 

I also found this formula: R-squared = SSR/CTSS, where SSR is the reduction in the sum of squares due to the model over and above the mean and CTSS is the corrected total sum of squares. I got the same percent reductions using this formula. http://animsci.agrenv.mcgill.ca/StatisticalMethodsII/drvpseudor.pdf
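
As a rough sketch of the arithmetic described in this post, the percent reduction in residual variance can be computed from the residual variance estimates of the intercept-only and full models. It is written in Python rather than SAS, the function name is made up for illustration, and the two variance values are placeholders, not results from any real model.

```python
def pseudo_r2(resid_var_null: float, resid_var_full: float) -> float:
    """Proportional reduction in residual variance from the null to the full model."""
    if resid_var_null <= 0:
        raise ValueError("null-model residual variance must be positive")
    return 1.0 - resid_var_full / resid_var_null


# Hypothetical residual variance estimates, for illustration only.
# In practice these would be copied from the covariance parameter
# estimates of the intercept-only model and the full model.
null_resid = 40.0
full_resid = 30.0

r2 = pseudo_r2(null_resid, full_resid)
print(f"pseudo R-squared = {r2:.3f}")
```

Note that this quantity can come out negative if the residual variance estimate is larger in the full model than in the null model, which is one reason to treat it only as a descriptive summary.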

SteveDenham
Jade | Level 19

While this works, remind yourself over and over that the sums of squares in a mixed model are NOT what is optimized. A mixed model is fit by a likelihood-based method, and only in a fully balanced design with uncorrelated errors would it give the same result as minimizing the sums of squares. A good substitute might be to look at the AIC values and determine the amount of information retained from the null model in the fitted model. You could even put this on a relative basis. See the Wikipedia article on the Akaike information criterion, https://en.wikipedia.org/wiki/Akaike_information_criterion, which is a very good summary and points out how to compare models and the caveats involved.
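
One possible way to put the AIC comparison on a relative basis is the relative likelihood, exp((AIC_min - AIC_i) / 2), described in the Wikipedia article. The Python sketch below only does that arithmetic; the function name and the AIC values are hypothetical placeholders, and it assumes both AIC values come from fits that are actually comparable (for example, models with different fixed effects fit by ML rather than REML).

```python
import math


def relative_likelihood(aic_model: float, aic_best: float) -> float:
    """Relative likelihood of a model versus the lowest-AIC model."""
    return math.exp((aic_best - aic_model) / 2.0)


# Hypothetical AIC values, for illustration only.
aic_null = 1510.0
aic_full = 1500.0

aic_best = min(aic_null, aic_full)
for name, aic in [("null", aic_null), ("full", aic_full)]:
    rel = relative_likelihood(aic, aic_best)
    print(f"{name}: AIC = {aic:.1f}, relative likelihood = {rel:.4f}")
```

The lowest-AIC model always gets a relative likelihood of 1, and values near 0 for the other model indicate that it loses much more information by comparison.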

 

SteveDenham

bnd
Fluorite | Level 6

Hi @SteveDenham,

 

I actually looked at the AIC as well! Maybe I should just focus on the AIC instead of the pseudo-R-squared because, as you have stated, the sum of squares is not what is being optimized in mixed models.

 

Thanks!

Brittney

zjppdozen
Fluorite | Level 6
Hi Steve,

Do you think the likelihood ratio R-squared would be a better pseudo-R2 for a mixed model, as described here: https://www.ars.usda.gov/ARSUserFiles/80000000/SpatialWorkshop/19kramersupplrsq.pdf?

Also, the formula for the likelihood ratio R-squared is R2_LR = 1 - exp(-(2/n) * (LL_M - LL_0)), where LL_M and LL_0 are the log-likelihoods of the fitted model and the null model. With longitudinal data, should the "n" here be the total number of person-year observations or just the total number of subjects in the data set?
SteveDenham
Jade | Level 19

The Kramer paper looks quite good, and I can see some utility in the MLE-based pseudo-R2. However, you would have to be sure to change to an ML method from the standard REML methods used in MIXED and GLIMMIX, and that leads to biased estimates of the variance components. As a simple example, compare the biased estimate of the variance (denominator = n) to the unbiased estimate (denominator = n-1); the proof that the biased estimate is the ML estimate is a pretty standard math stats course proof. I think we are still looking for an appropriate approach to goodness of fit for REML mixed models.
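
The simple variance example can be checked numerically. This is a minimal Python sketch on an arbitrary made-up sample, using the standard library's `statistics.pvariance` (denominator n) and `statistics.variance` (denominator n-1); it illustrates only the single-sample case, not a mixed model.

```python
import statistics

# Arbitrary sample, for illustration only.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(data)

ml_var = statistics.pvariance(data)       # denominator n (the ML estimate)
unbiased_var = statistics.variance(data)  # denominator n - 1

print(f"n = {n}")
print(f"ML (biased) variance estimate: {ml_var:.4f}")
print(f"unbiased variance estimate:    {unbiased_var:.4f}")
print(f"ratio ML / unbiased:           {ml_var / unbiased_var:.4f}")
```

The ratio of the two estimates is (n-1)/n, so the ML estimate is always the smaller one and the gap shrinks as n grows.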

 

BTW, the n in the formula is the total number of observations.
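
For reference, here is a minimal Python sketch of the likelihood ratio R-squared from the question, R2_LR = 1 - exp(-(2/n) * (LL_M - LL_0)), with n taken as the total number of observations. The function names and all numbers are hypothetical. The second function assumes the inputs are the "-2 Log Likelihood" fit statistics from ML fits of the two models, so that 2 * (LL_M - LL_0) equals the null value minus the model value.

```python
import math


def lr_r2(loglik_model: float, loglik_null: float, n_obs: int) -> float:
    """Likelihood ratio R-squared from the two log-likelihoods."""
    return 1.0 - math.exp(-(2.0 / n_obs) * (loglik_model - loglik_null))


def lr_r2_from_neg2ll(neg2ll_model: float, neg2ll_null: float, n_obs: int) -> float:
    """Same quantity, starting from -2 log-likelihood values."""
    return 1.0 - math.exp(-(neg2ll_null - neg2ll_model) / n_obs)


# Hypothetical values, for illustration only.
n_total_obs = 200          # total number of observations, not subjects
ll_model, ll_null = -740.0, -750.0

print(f"R2_LR = {lr_r2(ll_model, ll_null, n_total_obs):.4f}")
print(f"R2_LR = {lr_r2_from_neg2ll(1480.0, 1500.0, n_total_obs):.4f}")
```

Both calls describe the same pair of models, so they print the same value.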

 

SteveDenham
