03-22-2013 05:44 PM
I'm trying to gain a greater understanding of how variance works in the output from PROC MIXED and hoping you can help!
I am using PROC MIXED to look at main effects, two-way interactions, and a three-way interaction among three predictor variables on one dependent variable. My analysis has both random (subjects) and fixed (predictors) effects, and takes the form of a nested, repeated-measures design. I am curious whether there is a way to determine the unique amount of variance in the dependent variable accounted for by each predictor. I would also like to know whether there are any available indices of the common or shared variance between variables.
For example, for the first (single [no interaction]) predictor that I entered, when I am looking at the estimates in the solutions for fixed effects, does the estimate for that variable refer to the unique variance accounted for by that variable alone? If not, is there a way to get this information?
Thanks so much!
03-23-2013 04:21 PM
I don't think you can get what you want.
Even with the R-squared statistic in linear regression, the proportion of the variance in the dependent variable accounted for by a specific independent variable depends on the sample being used, on other independent variables in the model, and on how the model is specified. Several attempts have been made to develop measures of variable importance to compare variables. You should probably Google and read references to this concept of variable importance to see the subtleties and the potential pitfalls in trying to obtain what you are seeking.
03-27-2013 12:27 PM
Thanks so much for your reply.
However, I'm not sure I was totally clear in the question I asked.
You mentioned R-square change in your reply. An equivalent of R-square change would be ideal; is there any way to get a metric like that from PROC MIXED? If not, I am just trying to understand what kind of variance is associated with the numbers provided by the Solution for Fixed Effects. Let me see if I have this right. Let's say I had the following table for Solution for Fixed Effects including:
The estimate for predictor A is a beta value, denoting a standardized increase in A for every standard deviation increase of our dependent variable, holding all other predictors at zero. Does this tell us something about the amount of variance accounted for by that predictor? If I am understanding correctly, it does, but we cannot compare this to the total amount of variance available, the way that you could in multiple regression analysis. Is that correct?
Thanks so much!
03-27-2013 04:30 PM
The coefficients that PROC MIXED estimates are the usual unstandardized regression coefficients, not the standardized coefficients you describe above. A standardized coefficient is calculated by multiplying the unstandardized coefficient by the ratio of the standard deviation of that independent variable to the standard deviation of the dependent variable. Equivalently, the standardized coefficients are the coefficients you would obtain from a linear regression in which the dependent variable and all the independent variables have first been standardized. The largest reduction in R-squared does NOT necessarily occur with the removal of the independent variable with the largest standardized regression coefficient. Standardized regression coefficients have also been criticized for other reasons (see Bring J. How to standardize regression coefficients. The American Statistician 1994 Aug;48(3):209-213).
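To make the rescaling concrete, here is a minimal sketch in Python rather than SAS. It uses made-up data, a single predictor for simplicity, and a hypothetical helper named `slope`. None of this comes from PROC MIXED itself; it only illustrates that multiplying the unstandardized slope by sd(x)/sd(y) gives the same number as regressing standardized y on standardized x.

```python
from statistics import mean, stdev

def slope(x, y):
    """Ordinary least-squares slope of y on a single predictor x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

# Made-up illustrative data (not from the thread).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9]

b = slope(x, y)                    # unstandardized coefficient
b_std = b * stdev(x) / stdev(y)    # rescaled by sd(x) / sd(y)

# The same quantity, obtained by standardizing both variables first.
zx = [(xi - mean(x)) / stdev(x) for xi in x]
zy = [(yi - mean(y)) / stdev(y) for yi in y]
b_from_z = slope(zx, zy)

print(b, b_std, b_from_z)
```

With one predictor the standardized slope is just the correlation between x and y; with several predictors the same rescaling applies to each coefficient separately.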
PROC MIXED does not include an R-squared measure in its output because the procedure is usually used with random-effect or mixed models rather than with fixed-effect models, for which PROC GLM and PROC REG would be more appropriate. However, PROC MIXED can write raw residuals to an output data set using the OUTP= option of the MODEL statement. Squaring and summing these residuals gives one part of the R-squared statistic (the residual sum of squares). The sum of the squared differences between the observed dependent variable for each observation and the mean of the dependent variable across all observations forms the second part (the total sum of squares). Thus, the R-squared statistic, one minus the ratio of the first part to the second, can be calculated from the output of PROC MIXED.
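The arithmetic is short enough to sketch. The Python below is only an illustration, and it assumes you have already exported two columns from the OUTP= data set: the observed response and the raw residual. The function name `r_squared` and the numbers are made up. Keep in mind that OUTP= residuals are computed from predictions that include the estimated random effects, while OUTPM= gives residuals from the fixed effects alone, so the resulting value depends on which data set you use.

```python
def r_squared(observed, residuals):
    """R-squared as 1 - SSE/SST from observed values and raw residuals."""
    mean_y = sum(observed) / len(observed)
    sse = sum(r * r for r in residuals)               # residual sum of squares
    sst = sum((y - mean_y) ** 2 for y in observed)    # total sum of squares
    return 1.0 - sse / sst

# Made-up values standing in for the exported response and residual columns.
observed = [4.0, 6.0, 5.0, 9.0, 11.0]
residuals = [0.5, -0.5, -1.0, 0.5, 0.5]

print(r_squared(observed, residuals))
```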
The reference cited above describes how the t-statistic for a given independent variable (obtained by specifying the SOLUTION option in the PROC MIXED MODEL statement) is related to the increment in the R-squared statistic obtained by including that variable as the last independent variable in a model. Thus, comparing the t-statistics among independent variables in the same model is the same as comparing the decreases in the R-squared statistic obtained by excluding each of these variables from the model. Unfortunately, this comparison of independent variables yields an incremental R-squared, which is not exactly what you're interested in. It does emphasize, however, that an independent variable's impact on the variability of the dependent variable is NOT unique but depends on the other independent variables in the model.
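For ordinary least squares the link between the t-statistic and the incremental R-squared can be checked numerically. The sketch below is Python with made-up data and hypothetical helper names (`invert`, `ols`), and it fits a plain fixed-effects regression only. It does not model random effects, so treat it as an illustration of the OLS identity, not as a statement about what PROC MIXED computes. The identity checked is: drop in R-squared from removing a variable = t^2 * (1 - R-squared of full model) / (residual degrees of freedom).

```python
def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    k = len(m)
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(k)]
         for i, row in enumerate(m)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(a[r][c]))  # partial pivoting
        a[c], a[p] = a[p], a[c]
        piv = a[c][c]
        a[c] = [v / piv for v in a[c]]
        for r in range(k):
            if r != c:
                f = a[r][c]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[c])]
    return [row[k:] for row in a]

def ols(columns, y):
    """Fit y on an intercept plus the given predictor columns.
    Returns (coefficients, standard errors, residual sum of squares)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
           for i in range(k)]
    xty = [sum(X[r][i] * y[r] for r in range(n)) for i in range(k)]
    inv = invert(xtx)
    b = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    sse = sum((y[r] - sum(X[r][j] * b[j] for j in range(k))) ** 2
              for r in range(n))
    sigma2 = sse / (n - k)
    se = [(sigma2 * inv[i][i]) ** 0.5 for i in range(k)]
    return b, se, sse

# Made-up data with two correlated predictors.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 8.0]
y = [3.1, 3.9, 7.2, 7.1, 10.8, 11.2, 15.1, 15.8]

n = len(y)
ybar = sum(y) / n
sst = sum((v - ybar) ** 2 for v in y)

b, se, sse_full = ols([x1, x2], y)
_, _, sse_without_x1 = ols([x2], y)

r2_full = 1.0 - sse_full / sst
delta_r2 = r2_full - (1.0 - sse_without_x1 / sst)   # drop from removing x1
t_x1 = b[1] / se[1]                                 # t-statistic for x1
from_t = t_x1 ** 2 * (1.0 - r2_full) / (n - 3)      # 3 estimated parameters

print(delta_r2, from_t)
```

Because the factor (1 - R-squared) / (residual df) is the same for every variable in the model, ranking variables by squared t-statistic and ranking them by their incremental R-squared give the same order.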
03-27-2013 04:41 PM
If all of your model terms are orthogonal, then you can indeed determine the amount of variance accounted for by each model term. PROC GLM will do this.
If your model terms are NOT all orthogonal, then there is no unique decomposition of the variance into amounts accounted for by each model term. This can also be seen in PROC GLM by examining the Type I, Type II and Type III sums of squares (and of course, there are many other decompositions of the variance). In other words, in this situation, you cannot come up with an answer to the question.
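A small numeric illustration of this point, as a sketch in Python with made-up data and a hypothetical function `ss_for_a`: it computes the sum of squares credited to term A when A enters the model first, and again when A enters after B (the sequential, Type I style of accounting). With orthogonal codes the two agree; once A and B are correlated they differ, so there is no single "variance due to A".

```python
def centered(v):
    m = sum(v) / len(v)
    return [vi - m for vi in v]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def ss_for_a(a, b, y):
    """Return (SS for A entered first, SS for A entered after B)."""
    a, b, y = centered(a), centered(b), centered(y)
    saa, sbb, sab = dot(a, a), dot(b, b), dot(a, b)
    say, sby = dot(a, y), dot(b, y)
    ss_a_first = say ** 2 / saa
    # Regression sum of squares for the model containing both A and B.
    ss_model = ((sbb * say ** 2 - 2 * sab * say * sby + saa * sby ** 2)
                / (saa * sbb - sab ** 2))
    ss_a_last = ss_model - sby ** 2 / sbb
    return ss_a_first, ss_a_last

# Made-up response values.
y = [3.0, 5.0, 6.0, 9.0, 2.5, 5.5, 6.5, 8.0]

# Balanced, orthogonal -1/+1 codes for two factors.
a_orth = [-1, -1, 1, 1, -1, -1, 1, 1]
b_orth = [-1, 1, -1, 1, -1, 1, -1, 1]

# Same B, but A is now unbalanced and correlated with B.
a_corr = [-1, -1, 1, 1, -1, 1, 1, 1]

orth_first, orth_last = ss_for_a(a_orth, b_orth, y)
corr_first, corr_last = ss_for_a(a_corr, b_orth, y)

print(orth_first, orth_last)   # equal in the orthogonal case
print(corr_first, corr_last)   # differ in the correlated case
```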