essence_0
Calcite | Level 5

I tried to use PROC GLM to fit a model with a weight variable but without an intercept term. The SAS statements look like this: PROC GLM; MODEL dependent_variable = list_of_independent_variables / NOINT; WEIGHT weight_variable;

The model-fitting output shows the usual statistics: SSE, MSE, and R-square. I then output the actual and fitted values and calculated R-square myself, but got a different value from the one SAS reports.

To calculate R-square, I used the simple formula: R-square = 1 - (residual sum of squares / total sum of squares). Since there is a weight variable, I weighted both squared terms for each observation before summing, i.e., weight*(actual - fitted)^2 and weight*(actual - average of actuals)^2.

Is there anything incorrect in this manual derivation of R-square? Could anyone help clear it up? Thanks!

4 REPLIES
Rick_SAS
SAS Super FREQ

There is no need to guess. The SAS documentation includes a chapter that shows the basic statistics that are computed in regression procedures.

Your formulas for R-squared and SSE seem to match the formulas in the documentation. For the total sum of squares, did you use the weighted mean?
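
For reference, here is a sketch of what "weighted mean" means in this context. The notation is only for illustration and does not come from the thread: w_i are the weights and y_i the actuals. Note that the weighted mean divides by the sum of the weights rather than by the number of observations, so it generally differs from a plain average of the products w_i y_i:

$$
\bar{y}_w = \frac{\sum_i w_i \, y_i}{\sum_i w_i},
\qquad
SS_{\text{Total}} = \sum_i w_i \left( y_i - \bar{y}_w \right)^2
$$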

essence_0
Calcite | Level 5

I just tried replacing the average of the actuals with the average of the weighted actuals in the total sum of squares calculation. This time R-square became much smaller and moved even further from the R-square in the SAS output.

Rick_SAS
SAS Super FREQ

If you post your code (PROC GLM + DATA step), someone might be able to assist.

Rick_SAS
SAS Super FREQ

Here is how to reproduce the numbers. Since you didn't provide data, I will use the following model:

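/* Weighted regression of height on age; save the residuals for later checks */
proc glm data=sashelp.class plots=none;
   weight weight;                       /* the variable named WEIGHT is used as the weight */
   model height = age;
   output out=Out Residual=Resid;       /* write residuals to data set Out */
   ods select OverallAnova FitStatistics;
quit;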

As you say, the R-squared value should be computed from the values in the "Sum of Squares" column of the OverallANOVA table. The following DATA _NULL_ step verifies the calculation:

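data _null_;
   /* values copied from the "Sum of Squares" column of the OverallANOVA table */
   SS_Total = 43699.97089;
   SS_Error = 16000.45958;
   RSquared = 1 - SS_Error / SS_Total;
   put RSquared=;                       /* write the result to the log */
run;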

OK, so we know that R-squared is correct. How can we verify the SS_Total and SS_Error calculations? Well, SS_Total doesn't even use the model; it is just the weighted corrected sum of squares for the response variable. Calling PROC MEANS with the CSS option reproduces SS_Total:

proc means data=Sashelp.class CSS;
weight weight;
var height;
run;

What about SS_Error? Well, that's just the weighted sum of the squared residuals. I output the residuals to the Out data set. The following PROC MEANS call verifies SS_Error as the (uncorrected) weighted sum of squares of the residuals:

proc means data=Out USS;
weight weight;
var Resid;
run;
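
Putting the three checks together, the quantities can be written out as below. The notation is an assumption for illustration and is not taken from the SAS output: w_i are the weights, y_i the response, \hat{y}_i the fitted values, and \bar{y}_w = \sum_i w_i y_i / \sum_i w_i the weighted mean.

$$
\begin{aligned}
SS_{\text{Error}} &= \sum_i w_i \left( y_i - \hat{y}_i \right)^2 = 16000.45958 \\
SS_{\text{Total}} &= \sum_i w_i \left( y_i - \bar{y}_w \right)^2 = 43699.97089 \\
R^2 &= 1 - \frac{SS_{\text{Error}}}{SS_{\text{Total}}} \approx 0.634
\end{aligned}
$$

One caveat for the original question: this example includes an intercept. To my understanding, when NOINT is specified, SAS reports an uncorrected total sum of squares, \sum_i w_i y_i^2, and bases R-square on that. That would explain a mismatch against a manual calculation that subtracts a mean. It is worth checking against the ANOVA table of your own run, which should show an "Uncorrected Total" row if this applies.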
