- Understanding R square in PROC GLM with weight var...

12-19-2016 02:43 PM

I used PROC GLM to fit a model without an intercept term but with a weight variable. The SAS statements look like this: PROC GLM; Model _dependent variable_ = list of independent variables/noint; weight _weight variable_; The model-fitting output shows the usual statistics: SSE, MSE, and R square. I then calculated R square myself after outputting the actuals and fitted values, but I got a value different from the one in the SAS output. I used the simple formula R square = 1 - (residual sum of squares/total sum of squares). Because there is a weight variable, I weighted both squared terms for each observation before summing, i.e., weight*(actual - fitted)^2 and weight*(actual - average of actuals)^2. Is there anything incorrect in this manual derivation of R square? Could anyone help clear it up? Thanks!
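
Written out in symbols, the manual calculation described above looks roughly like this. The notation is an assumption added for illustration and is not from the post: $w_i$ is the weight, $y_i$ the actual, $\hat{y}_i$ the fitted value, and $\bar{y}$ the plain (unweighted) average of the $n$ actuals.

$$
R^2_{\text{manual}} = 1 - \frac{\sum_i w_i\,(y_i - \hat{y}_i)^2}{\sum_i w_i\,(y_i - \bar{y})^2},
\qquad
\bar{y} = \frac{1}{n}\sum_i y_i
$$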

12-19-2016 02:59 PM

There is no need to guess. The SAS documentation includes a chapter that shows the basic statistics that are computed in regression procedures.

Your formulas for R-squared and SSE seem to match the formulas in the documentation. For the total sum of squares, did you use the weighted mean?
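
For reference, the weighted mean asked about here, and the corrected total sum of squares built from it, are usually written as below. This is a sketch in notation added for illustration ($w_i$ for the weights, $y_i$ for the actuals), not a quotation from the SAS documentation.

$$
\bar{y}_w = \frac{\sum_i w_i\, y_i}{\sum_i w_i},
\qquad
SS_{\text{Total}} = \sum_i w_i\,(y_i - \bar{y}_w)^2
$$

Note that $\bar{y}_w$ divides by the sum of the weights, not by the number of observations, so it is not the same as taking the plain average of the products $w_i y_i$.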

12-19-2016 03:50 PM

I just tried replacing the average of the actuals with the average of the weighted actuals in the total sum of squares calculation. This time R square becomes much smaller and moves further away from the R square in the SAS output.

12-19-2016 04:02 PM

If you post your code (PROC GLM + DATA step), someone might be able to assist.

12-20-2016 08:57 AM

Here is how to reproduce the numbers. Since you didn't provide data, I will use the following model:

```
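proc glm data=sashelp.class plots=none;
   weight weight;                    /* WEIGHT statement; the weight variable is also named weight */
   model height = age;               /* the intercept is included by default */
   output out=Out Residual=Resid;    /* save the residuals for the checks below */
   ods select OverallAnova FitStatistics;
quit;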
```

As you say, the R-squared value is formed from the values in the "Sum of Squares" column of the OverallANOVA table. The following DATA _NULL_ step verifies the calculation:

```
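data _null_;
   /* values copied from the "Sum of Squares" column of the OverallANOVA table */
   SS_Total = 43699.97089;
   SS_Error = 16000.45958;
   RSquared = 1 - SS_Error / SS_Total;
   put RSquared=;    /* write the result to the log */
run;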
```

OK, so we know that R-squared is correct. How can we verify the SS_Total and SS_Error calculation? SS_Total doesn't even use the model; it is just the weighted corrected sum of squares for the response variable. Calling PROC MEANS reproduces the SS_Total:

```
/* CSS = corrected sum of squares; the WEIGHT statement makes it weighted */
proc means data=Sashelp.class CSS;
   weight weight;
   var height;
run;
```

What about the SS_Error? Well, that's just the weighted sum of the squared residuals. I output the residuals into the OUT data set. The following PROC MEANS call verifies the SS_Error as the (uncorrected) weighted SS of the residuals:

```
/* USS = uncorrected sum of squares; the WEIGHT statement makes it weighted */
proc means data=Out USS;
   weight weight;
   var Resid;
run;
```
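
The example above fits a model with an intercept, whereas the original question used the NOINT option. As far as I know, with NOINT PROC GLM reports an uncorrected total sum of squares and bases R-square on it (the log carries a note that R-square is not corrected for the mean). That would explain a mismatch with a manual calculation that subtracts a mean. Here is a minimal sketch for checking this, assuming the same sashelp.class model. The data set name OutNoInt and the DATA step variable names are made up for illustration, and the result should be compared against the FitStatistics table rather than taken on faith.

```
/* Fit the weighted model WITHOUT an intercept and keep the residuals */
proc glm data=sashelp.class plots=none;
   weight weight;
   model height = age / noint;
   output out=OutNoInt Residual=Resid;
   ods select OverallAnova FitStatistics;
quit;

/* Rebuild both candidate R-square values from the output data set */
data _null_;
   set OutNoInt end=last;
   SumW     + weight;               /* sum of the weights                   */
   SumWY    + weight*height;        /* weighted sum of the response         */
   SumWYY   + weight*height**2;     /* uncorrected weighted total SS        */
   SS_Error + weight*Resid**2;      /* weighted sum of squared residuals    */
   if last then do;
      /* corrected total SS = uncorrected SS minus the weighted-mean term */
      SS_Total_Corrected = SumWYY - SumWY**2 / SumW;
      RSq_Uncorrected = 1 - SS_Error / SumWYY;
      RSq_Corrected   = 1 - SS_Error / SS_Total_Corrected;
      put SS_Error= SumWYY= SS_Total_Corrected=;
      put RSq_Uncorrected= RSq_Corrected=;
   end;
run;
```

If the assumption holds, RSq_Uncorrected should agree with the R-Square shown in the FitStatistics table for the NOINT model, while RSq_Corrected corresponds to the mean-corrected formula from the original question.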