## Weighted linear regression


I ran a regression on my data, and the White test for constant variance gave a p-value of 0.0016, indicating heteroscedasticity. See the attached graph of residuals vs. predicted values. The data were normally distributed: see the attached distribution graph, for which the Shapiro-Wilk test gave p = 0.79. These results suggest that I need to do a weighted regression.

proc reg;              /* weighted linear regression */

model y = x;

weight w;

run;

In the literature I read: "If however we know the noise variance $\sigma_i^2$ at each measurement $i$, and set $w_i = 1/\sigma_i^2$, we get the heteroskedastic MLE, and recover efficiency."

My question is: how do we determine these weights, and based on my data, what would be an appropriate weight (for example, 1/y or something else)?

Accepted Solutions
‎08-19-2016 02:52 PM

## Re: Weighted linear regression

The optimal weights are unknown, because they depend on the true error variance at each observation, which you do not observe.

I can think of three options, listed from easiest to most difficult:

1) Transform the response variable by applying a variance-stabilizing transformation. A typical transformation is to define LogY = log(y) and then model LogY as a function of X.  This would require that Y > 0 for your response variable, but there are ways to handle negative values, too.

2) Use robust regression, especially M-estimation by using PROC ROBUSTREG, if you think that your response variable has been contaminated by outliers.

3) Implement iteratively reweighted least squares regression by using PROC NLIN.
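Options (2) and (3) might look roughly like the sketch below. This is only an illustration under stated assumptions: the data set name Have, the starting values for the parameters, and the variance function used for the weights (variance proportional to the square of the predicted value) are all hypothetical choices, not something your data or the original post confirms.

```sas
/* Assumption: a data set named Have with numeric variables x and y. */

/* Option (2): robust regression by M-estimation */
proc robustreg data=Have method=m;
   model y = x / diagnostics;     /* DIAGNOSTICS requests outlier diagnostics */
run;

/* Option (3): iteratively reweighted least squares in PROC NLIN */
proc nlin data=Have nohalve;      /* NOHALVE: allow the weighted SSE to increase, */
                                  /* because the weights change at each iteration */
   parms b0=0 b1=1;               /* hypothetical starting values; OLS estimates  */
                                  /* from PROC REG are a better choice            */
   model y = b0 + b1*x;
   pred = model.y;                /* predicted value at the current estimates     */
   /* Assumed variance function: Var(y) proportional to pred**2 */
   if abs(pred) > 1e-8 then _weight_ = 1 / pred**2;
   else _weight_ = 0;             /* zero weight where the prediction is near 0   */
run;
```

In the PROC NLIN step the weights are recomputed from the current fit at every iteration, which is what makes the procedure "iteratively reweighted." If a different variance function seems more plausible for your data (for example, variance proportional to the predicted value), only the _WEIGHT_ assignment needs to change.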

Since (1) is easy and commonly done in practice, I suggest that you start there. Other variance-stabilizing transformations include sqrt(Y) and 1/Y. Use the one that makes the most intuitive sense for your data.
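A minimal sketch of option (1), assuming a hypothetical data set named Have whose response y is strictly positive. The SPEC option on the MODEL statement requests the specification test (White's test) again, so you can check whether the transformation removed the heteroscedasticity.

```sas
/* Assumption: a data set named Have with variables x and y, where y > 0. */
data Want;
   set Have;
   LogY = log(y);                 /* natural log; missing if y <= 0 */
run;

ods graphics on;
proc reg data=Want plots=diagnostics;
   model LogY = x / spec;         /* SPEC: test of first and second moment specification */
run;
quit;
```

If the residual plot for LogY still fans out, try the same steps with sqrt(y) or 1/y in place of log(y). Keep in mind that the fitted model then describes the transformed response, so predictions need to be transformed back to the original scale.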

All Replies


## Re: Weighted linear regression

I will try each of the suggested options and see which works best.