jacksonan123
Lapis Lazuli | Level 10

I have data for which I ran a regression, and the White test for constant variance had a p-value of 0.0016, indicating heteroscedasticity. See the attached graph of residuals vs. predicted values. The data was normally distributed; see the attached distribution graph, which had p = 0.79 for the Shapiro-Wilk test. These results indicate that I need to do a weighted regression.

 

proc reg;              /* weighted linear regression */
   model y = x;
   weight w;           /* w holds the weight for each observation */
run;
quit;

 

In the literature I read: "If however we know the noise variance σᵢ² at each measurement i, and set wᵢ = 1/σᵢ², we get the heteroskedastic MLE, and recover efficiency."

 

My question is: how do we know this weight value, and based on my data, what would be an appropriate weight, e.g., 1/y or something else?

1 ACCEPTED SOLUTION

Rick_SAS
SAS Super FREQ

The optimal weight values are unknown.

 

I can think of three options, listed from easiest to most difficult: 

1) Transform the response variable by applying a variance-stabilizing transformation. A typical transformation is to define LogY = log(y) and then model LogY as a function of X.  This would require that Y > 0 for your response variable, but there are ways to handle negative values, too.

2) Use robust regression, especially M-estimation by using PROC ROBUSTREG, if you think that your response variable has been contaminated by outliers.

3) Implement iteratively reweighted least squares regression by using PROC NLIN.
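
As a rough illustration of option 2, a minimal sketch might look like the following. It assumes a data set named WORK.HAVE (a hypothetical name, not from the original post) that contains the variables y and x from the question. METHOD=M requests M-estimation; the other settings are left at their defaults.

```sas
/* Sketch of option 2: robust regression by M-estimation.        */
/* WORK.HAVE is a hypothetical data set name; y and x are the    */
/* response and regressor from the question.                     */
proc robustreg data=work.have method=m;
   model y = x;
run;
```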

 

Since (1) is easy and is commonly done in practice, I would suggest that you start there. Other variance-stabilizing transformations include sqrt(Y) and 1/Y. You should use the one that makes the most intuitive sense for your data.
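
A minimal sketch of option 1 might look like this, assuming the data live in a data set named WORK.HAVE and writing the transformed copy to WORK.WANT (both names are hypothetical). The SPEC option on the MODEL statement asks PROC REG for its test of the first and second moment specification, so the constant-variance check can be repeated on the transformed response.

```sas
/* Sketch of option 1: model log(y) instead of y.                */
/* WORK.HAVE and WORK.WANT are hypothetical data set names.      */
data work.want;
   set work.have;
   /* log is undefined for y <= 0, so LogY stays missing there   */
   if y > 0 then LogY = log(y);
run;

proc reg data=work.want;
   /* SPEC re-runs the heteroscedasticity test on the new model  */
   model LogY = x / spec;
run;
quit;
```

To try one of the other transformations, replace log(y) with sqrt(y) or 1/y in the DATA step and compare the residual plots and the SPEC test result.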


jacksonan123
Lapis Lazuli | Level 10

I will try each of the suggested options and see which works best.

 

Thanks for the advice.
