Statistical Procedures

bnd · Posted 04-03-2020 05:39 PM

Hello,

Is there a way to calculate the r-squared or pseudo-r-squared for proc mixed in SAS (models with fixed and random effects)? Or would it have to be hand calculated?

I saw one post that stated to run the null model and then the full-model and to look at the variance components.

Any other thoughts, suggestions, or clarifications as to how to best calculate the r-squared when using proc mixed?

Thanks!

PaigeMiller · Posted 04-04-2020 07:34 AM

There is no generally agreed-upon way to compute R-squared for mixed models, such as those fit by PROC MIXED. A number of methods have been proposed, and each has its own advantages and disadvantages. Your favorite search engine will find many discussions about this.

--
Paige Miller

bnd · Posted 04-07-2020 08:51 PM

Hi @PaigeMiller

Thanks! There are many ways to compute the R-squared for multilevel models. I think I have found one that works well.

SteveDenham · Posted 04-08-2020 07:30 AM

Would you care to share that method with the rest of us?

Thanks in advance.

SteveDenham

PaigeMiller · Posted 04-08-2020 07:32 AM

@bnd wrote:

Hi @PaigeMiller

Thanks! There are many ways to compute the R-squared for multilevel models. I think I have found one that works well.

Yes, I agree with @SteveDenham. You need to explain what method you chose, and why, so we can all learn.

--
Paige Miller

bnd · Posted 04-09-2020 10:47 PM

Hi @SteveDenham and @PaigeMiller,

I chose this formula: R-squared = 1 - SSE_Model / SSE_IntOnly, where SSE_Model is the sum of squared residuals from the full model and SSE_IntOnly is the sum of squared residuals from the intercept-only (null) model. I chose this formula because I was looking for a simple way to calculate the percent reduction in variance from the null model to the full model. I used the covariance parameter estimates table from proc mixed to calculate the R-squared. http://math.usu.edu/jrstevens/stat5200/25.Rsquare_Design.pdf

I am not sure if I have explained this well! I am very new to calculating the R-squared for multilevel models. I am not sure if this approach is the best or if R-squared should even be calculated this way, but it was a simple formula for me.

I also found this formula: R-squared = SSR / CTSS, where SSR is the reduction in the sum of squares due to the model over and above the mean, and CTSS is the corrected total sum of squares. I got the same percent reductions using this formula. http://animsci.agrenv.mcgill.ca/StatisticalMethodsII/drvpseudor.pdf
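
For readers following along, the two formulas in this post can be checked against each other with a short Python sketch. The function names and the numeric inputs are hypothetical, chosen only for illustration; in practice the inputs would be taken from the PROC MIXED output for the intercept-only and full models. The sketch assumes that CTSS equals the intercept-only SSE and that SSR = CTSS - SSE_Model, which is the condition under which the two formulas give the same number.

```python
def r2_reduction(sse_model, sse_int_only):
    """Pseudo R-squared as a proportional reduction: 1 - SSE_Model / SSE_IntOnly."""
    if sse_int_only <= 0:
        raise ValueError("sse_int_only must be positive")
    return 1.0 - sse_model / sse_int_only


def r2_ssr_over_ctss(ssr, ctss):
    """Pseudo R-squared as SSR / CTSS."""
    if ctss <= 0:
        raise ValueError("ctss must be positive")
    return ssr / ctss


# Hypothetical values for illustration only (not taken from the thread).
sse_int_only = 250.0  # intercept-only (null) model
sse_model = 175.0     # full model

r2_a = r2_reduction(sse_model, sse_int_only)

# Assumption: CTSS is the intercept-only SSE and SSR = CTSS - SSE_Model,
# so the second formula is algebraically the same quantity as the first.
ctss = sse_int_only
ssr = ctss - sse_model
r2_b = r2_ssr_over_ctss(ssr, ctss)

print(round(r2_a, 4), round(r2_b, 4))
```

Under that assumption the agreement is purely algebraic, which would explain getting the same percent reduction from both formulas.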

SteveDenham · Posted 04-10-2020 10:59 AM

While this works, remind yourself over and over that the sums of squares in a mixed model are NOT what is optimized. It is a likelihood-based method (REML by default), and only in the fully balanced design with uncorrelated errors would the sums of squares be the same. A good substitute might be to look at the AIC values and determine the amount of information retained from the null model in the fitted model. You could even put this on a relative basis. See the Wikipedia article on the Akaike information criterion (https://en.wikipedia.org/wiki/Akaike_information_criterion), which is a very good summary and points out how to compare models and the caveats involved.

SteveDenham
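
One way to put the AIC comparison "on a relative basis" is the relative likelihood described in the Wikipedia article, exp((AIC_min - AIC_i) / 2). The Python sketch below is only an illustration: the function name and the AIC values are hypothetical, and real values would be read from the fit statistics of each PROC MIXED run. Note the usual caveat that models with different fixed effects should be compared using ML fits rather than REML fits.

```python
import math


def relative_likelihoods(aics):
    """Return exp((AIC_min - AIC_i) / 2) for each model in a name -> AIC mapping."""
    aic_min = min(aics.values())
    return {name: math.exp((aic_min - aic) / 2.0) for name, aic in aics.items()}


# Hypothetical AIC values for illustration only (not taken from the thread).
aics = {"null": 1012.4, "full": 1004.4}

rel = relative_likelihoods(aics)
for name, value in rel.items():
    print(name, round(value, 4))
```

The model with the smallest AIC always gets a relative likelihood of 1; values near 0 for the other models indicate that they lose much more information by comparison.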

bnd · Posted 04-10-2020 07:19 PM

Hi @SteveDenham,

I actually looked at the AIC as well! Maybe I should just focus on the AIC instead of the pseudo-R-squared because, as you stated, the sums of squares are not what is being optimized in mixed models.

Thanks!

Brittney

zjppdozen · Posted 04-29-2021 10:57 AM

Hi Steve,

Do you think the likelihood-ratio R-squared described here would be a better pseudo-R-squared for mixed models? https://www.ars.usda.gov/ARSUserFiles/80000000/SpatialWorkshop/19kramersupplrsq.pdf

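Also, the formula for the likelihood-ratio R-squared is R2_LR = 1 - exp(-(2/n) * (LL_M - LL_0)), where LL_M and LL_0 are the log-likelihoods of the fitted model and the null model. For longitudinal data, do you know whether the "n" here should be the total number of person-year observations or just the total number of subjects in the data set?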

SteveDenham · Posted 04-30-2021 12:44 PM

The Kramer paper looks quite good, and I can see some utility in the ML-based pseudo-R2. However, you would have to be sure to change to an ML method from the standard REML methods used in MIXED and GLIMMIX, and that leads to biased estimates. As a simple example, compare the biased estimate of the variance (denominator = n) with the unbiased estimate (denominator = n-1); the proof that the biased estimate is the ML estimate is a pretty standard one in a math stats course. I think we are still looking for an appropriate approach to goodness of fit for REML mixed models.

BTW, the n in the formula is the total number of observations.

SteveDenham
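
As a rough illustration of the likelihood-ratio formula discussed above, here is a minimal Python sketch. The function names and numeric values are hypothetical; it assumes both models were fit with ML (not REML) and that n is the total number of observations, as stated in the reply above. The second function takes "-2 Log Likelihood" values directly, since -(2/n) * (LL_M - LL_0) equals ((-2 LL_M) - (-2 LL_0)) / n.

```python
import math


def r2_likelihood_ratio(loglik_model, loglik_null, n_obs):
    """R2_LR = 1 - exp(-(2 / n) * (LL_M - LL_0))."""
    if n_obs <= 0:
        raise ValueError("n_obs must be positive")
    return 1.0 - math.exp(-(2.0 / n_obs) * (loglik_model - loglik_null))


def r2_lr_from_minus2ll(m2ll_model, m2ll_null, n_obs):
    """Same quantity, computed from -2 log-likelihood values of the two models."""
    if n_obs <= 0:
        raise ValueError("n_obs must be positive")
    return 1.0 - math.exp((m2ll_model - m2ll_null) / n_obs)


# Hypothetical values for illustration only (not taken from the thread).
n_obs = 400          # total number of observations, not the number of subjects
m2ll_null = 1850.0   # -2 log-likelihood, null model (ML fit)
m2ll_model = 1790.0  # -2 log-likelihood, full model (ML fit)

r2_from_m2ll = r2_lr_from_minus2ll(m2ll_model, m2ll_null, n_obs)
r2_from_ll = r2_likelihood_ratio(-m2ll_model / 2.0, -m2ll_null / 2.0, n_obs)

print(round(r2_from_m2ll, 4), round(r2_from_ll, 4))
```

Using the number of subjects instead of the number of observations for n_obs would give a larger value from the same likelihoods, so it is worth stating which n was used when reporting the result.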
