Solved: Weighted linear regression

jacksonan123 · Posted 08-14-2016 12:22 PM

I have data for which I ran a regression. The White test for constant variance gave p = 0.0016, indicating heteroscedasticity. See the attached graph of residuals vs. predicted values. The data were normally distributed (see the attached distribution graph; the Shapiro-Wilk test gave p = 0.79). These results indicate that I need to do a weighted regression.

proc reg; /* weighted linear regression */
   model y = x;
   weight w;
run;
quit;

In the literature I read: "If however we know the noise variance σ_i^2 at each measurement i, and set w_i = 1/σ_i^2, we get the heteroskedastic MLE, and recover efficiency."

My question is: how do we determine these weights, and based on my data, what would be an appropriate weight, e.g., 1/y or something else?

Rick_SAS · Posted 08-15-2016 10:19 AM

The optimal weight values are unknown.

I can think of three options, listed from easiest to most difficult:

1) Transform the response variable by applying a variance-stabilizing transformation. A typical transformation is to define LogY = log(y) and then model LogY as a function of X. This would require that Y > 0 for your response variable, but there are ways to handle negative values, too.

2) Use robust regression, especially M-estimation by using PROC ROBUSTREG, if you think that your response variable has been contaminated by outliers.

3) Implement iteratively reweighted least squares regression by using PROC NLIN.
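
Options (2) and (3) might look roughly like the sketch below. It assumes a data set named mydata (a hypothetical name) with variables x and y. For illustration only, it also assumes that the error variance is proportional to the square of the mean response, so the weight is 1/(predicted value)^2. That variance model and the PARMS starting values are placeholders, not something established in this thread. Check them against your residual plot, and use your ordinary least squares estimates as starting values.

```sas
/* Option 2: M-estimation, which downweights outlying responses */
proc robustreg data=mydata method=m;
   model y = x;
run;

/* Option 3: iteratively reweighted least squares in PROC NLIN.
   Assumption (illustrative): Var(y) is proportional to (mean of y)**2,
   so the weight is recomputed from the current fit at each iteration. */
proc nlin data=mydata nohalve;
   parms b0=1 b1=1;               /* placeholder starting values */
   model y = b0 + b1*x;
   _weight_ = 1 / (model.y**2);   /* model.y is the current predicted value */
run;
```

NOHALVE is used because the weights change from one iteration to the next, so the weighted residual sum of squares is not guaranteed to decrease at every step.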

Since (1) is easy and is commonly done in practice, I would suggest that you start there. Other variance-stabilizing transformations include sqrt(Y) and 1/Y. You should use the one that makes the most intuitive sense for your data.
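
A minimal sketch of option (1), assuming a data set named mydata (a hypothetical name) in which every value of y is positive. The SPEC option on the MODEL statement requests the specification test based on White's work, so you can check whether the transformation removed the heteroscedasticity.

```sas
/* Log-transform the response; observations with y <= 0 would give missing LogY */
data logdata;
   set mydata;
   LogY = log(y);
run;

/* Refit on the log scale and re-run the test for constant variance */
proc reg data=logdata;
   model LogY = x / spec;
run;
quit;
```

To try sqrt(Y) or 1/Y instead, change only the assignment in the DATA step and compare the residual plots and test results.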

jacksonan123 · Posted 08-15-2016 11:01 AM

I will try each of the suggested options and see which works best.

Thanks for the advice.
