New Contributor
Posts: 2

# Understanding R square in PROC GLM with weight variable

I used PROC GLM to fit a model with a weight variable and no intercept term. The SAS statements look like this: PROC GLM; MODEL _dependent variable_ = _list of independent variables_ / NOINT; WEIGHT _weight variable_; The fit output shows the usual statistics: SSE, MSE, and R square.

I then output the actual and fitted values and calculated R square myself, but my value differs from the one SAS reports. I used the simple formula R square = 1 - (residual sum of squares / total sum of squares). Because there is a weight variable, I weighted both squared terms for each observation before summing, i.e., weight*(actual - fitted)^2 and weight*(actual - average of actuals)^2. Is anything incorrect in this manual derivation of R square? Could anyone help clear it up? Thanks!

SAS Super FREQ
Posts: 3,839

## Re: Understanding R square in PROC GLM with weight variable

There is no need to guess. The SAS documentation includes a chapter that shows the basic statistics that are computed in regression procedures.

Your formulas for R-squared and SSE seem to match the formulas in the documentation. For the total sum of squares, did you use the weighted mean?
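To make the question about the weighted mean concrete, here is a sketch of the quantities involved for a model that includes an intercept. The symbols are my own notation, not taken from the thread: $w_i$ is the weight, $y_i$ the actual value, and $\hat{y}_i$ the fitted value for observation $i$.

$$
\bar{y}_w = \frac{\sum_i w_i\, y_i}{\sum_i w_i}, \qquad
SS_{\text{Total}} = \sum_i w_i \,(y_i - \bar{y}_w)^2, \qquad
SS_{\text{Error}} = \sum_i w_i \,(y_i - \hat{y}_i)^2, \qquad
R^2 = 1 - \frac{SS_{\text{Error}}}{SS_{\text{Total}}}
$$

Note that $\bar{y}_w$ is the weighted average of the actuals (weighted sum divided by the sum of the weights), which is not the same as the plain average of the products $w_i\, y_i$.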

New Contributor
Posts: 2

## Re: Understanding R square in PROC GLM with weight variable

I just tried replacing the average of the actuals with the average of the weighted actuals in the total sum of squares calculation. This time R square becomes much smaller and is even farther from the R square in the SAS output.

SAS Super FREQ
Posts: 3,839

## Re: Understanding R square in PROC GLM with weight variable

If you post your code (PROC GLM + DATA step), someone might be able to assist.

SAS Super FREQ
Posts: 3,839

## Re: Understanding R square in PROC GLM with weight variable

Here is how to reproduce the numbers. Since you didn't provide data, I will use the following model:

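/* Weighted regression of height on age (this model includes an intercept) */
proc glm data=sashelp.class plots=none;
   weight weight;
   model height = age;
   output out=Out Residual=Resid;   /* save residuals for the SS_Error check */
   ods select OverallAnova FitStatistics;
quit;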

As you say, the R-squared value should be computed from the values in the "Sum of Squares" column in the OverallANOVA table. The following DATA _NULL_ step verifies the calculation:

data _null_;
   /* values copied from the "Sum of Squares" column of the OverallANOVA table */
   SS_Total = 43699.97089;
   SS_Error = 16000.45958;
   RSquared = 1 - SS_Error / SS_Total;
   put RSquared=;
run;

OK, so we know that R-squared is correct. How can we verify the SS_Total and SS_Error calculation? Well, SS_Total doesn't even use the model; it is just the weighted corrected sum of squares for the response variable. Calling PROC MEANS with the same WEIGHT statement reproduces the SS_Total:

proc means data=Sashelp.class CSS;
weight weight;
var height;
run;

What about the SS_Error? Well, that's just the weighted sum of the squared residuals. I output the residuals into the OUT data set. The following PROC MEANS verifies the SS_Error as the (uncorrected) weighted SS of the residuals:

proc means data=Out USS;
weight weight;
var Resid;
run;
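The example above includes an intercept, whereas the original question used NOINT. As far as I know, when the intercept is suppressed the regression procedures base R-squared on the uncorrected total sum of squares, so subtracting any mean (weighted or not) would not reproduce the reported value. The following is a minimal sketch of how one might check this, assuming the same Sashelp.Class example; the data set name OutNoInt is just an illustrative choice.

```sas
/* Assumption: same example as above, but without an intercept */
proc glm data=sashelp.class plots=none;
   weight weight;
   model height = age / noint;
   output out=OutNoInt Residual=Resid;
   ods select OverallAnova FitStatistics;
quit;

/* Uncorrected weighted SS of the response (no mean subtracted);
   compare with the total row of the ANOVA table */
proc means data=sashelp.class USS;
   weight weight;
   var height;
run;

/* Weighted SS of the residuals; compare with the Error row */
proc means data=OutNoInt USS;
   weight weight;
   var Resid;
run;
```

If the two USS values match the total and error rows of the ANOVA table, then 1 - SS_Error / SS_Total with that uncorrected total should reproduce the R-squared in the FitStatistics table.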