With the REPEATED statement, you are partitioning the R matrix into variance and covariance parameters. Using TYPE=CS, there are two components. The CS component is the common covariance between all observations on a given subject. The RESIDUAL component is the diagonal enhancement to the R matrix. Try adding the R option to the REPEATED statement to see what the R matrix looks like for the first subject.
Given that you are partitioning the variance into components with the REPEATED statement, you no longer have a single term that is the equivalent of the MSE. If your data are balanced (same number of obs on each subject and for each level of your CLASS effects), then some would advocate reporting the sum of these two components as the MSE.
If the data are not balanced, or if you are using a more complicated covariance structure on the REPEATED statement (a value other than CS on TYPE=), then you have partitioned the residual variance in a more complicated fashion and it may be difficult or even impossible to come up with a term that corresponds to the MSE.
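For example, a minimal sketch (with hypothetical dataset and variable names) of requesting the R matrix block for the first subject:

/* Hypothetical sketch: the R option prints the R matrix block for the first subject */
proc mixed data=mydata;
   class subject time treatment;
   model y = treatment time treatment*time;
   repeated time / type=cs subject=subject r;
run;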
The concept of RMSE might not apply to models fit in PROC MIXED. What is your PROC MIXED program for your repeated measures data and what is the output for the covariance parameter estimates table?
Jill
This is my program and covariance parameter estimates. Is there a similar estimate?
proc mixed data=Result;
class subject period treatment time;
model result = baseline period subject treatment time treatment*time;
repeated time / type=CS subject=subject*period;
lsmeans treatment / adjust=dunnett;
run;
Covariance Parameter Estimates

Cov Parm    Subject           Estimate
CS          subject*period      3.7383
Residual                       43.5428
How do I compare mixed models with proc glimmix?
https://communities.sas.com/t5/Statistical-Procedures/How-do-I-compare-mixed-models-with-proc-glimmi...
How to compare 2 mixed models with different Fixed Effects?
https://communities.sas.com/t5/Statistical-Procedures/How-to-compare-2-mixed-models-with-different-F...
Usage Note 37107: Comparing covariance structures in PROC MIXED
https://support.sas.com/kb/37/107.html
Koen
This is probably more for @StatsMan, @sbxkoenk, and @jiltao than the OP. Could an RMSE-like value be calculated from the residuals obtained with the OUTPRED= option on the MODEL statement? I can conceive of squaring the raw (unscaled) residuals (observed - predicted), summing them, and taking the square root. This seems logical, but if it were that simple it would be done far more often, so I am obviously missing something. Anyone care to take a stab at this?
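A rough, untested sketch of the first step, using the OP's variable names, would be to add OUTPRED= to the MODEL statement to capture the raw residuals:

/* Sketch only: pred_blup gets Pred and Resid (observed - predicted) for each observation */
proc mixed data=Result;
   class subject period treatment time;
   model result = baseline period subject treatment time treatment*time / outpred=pred_blup;
   repeated time / type=cs subject=subject*period;
run;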
SteveDenham
I think your way is computing a summary statistic (variance, standard deviation) for a set of values, rather than a model-based approach. The two are not the same.
Jill
Thanks, @jiltao. That is likely the case, especially since I left out a step. After calculating the sum of squared deviations and before taking the square root, there needs to be a division by an appropriate degrees of freedom value. So we would have the deviations of the model-predicted values from the observed values, each squared, summed over all observations, and divided by an appropriate, model-based degrees of freedom value. That looks, at least to me, a lot like a variance due to things not in the model, such as unmodeled fixed or random effects. If the model were as simple as a mean, it would be the variance, wouldn't it? Taking the square root gives a standard deviation, or in the case of a general linear model, the RMSE.
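Continuing the OUTPRED= sketch from my earlier post, something like the following could finish the calculation. The divisor here is only a placeholder, since choosing the "appropriate" model-based degrees of freedom is exactly the open question:

proc sql;
   /* Sketch only: sum of squared raw residuals, divided by a placeholder df, then square root */
   select sqrt( sum(Resid**2) / 100 ) as rmse_like   /* replace 100 with the chosen df */
   from pred_blup;
quit;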
Since the predicted values are empirical BLUPs, this calculated value is a measure of how closely the empirical BLUPs represent the variability in the raw data. I really need my copy of Graybill's Theory and Application of the Linear Model to refresh my BLUP knowledge, but it is in a box in the basement somewhere...
SteveDenham