CheerfulChu
Obsidian | Level 7

Dear Experts,

 

I read that a high-leverage point has lower variance. Why is that? I can see it from the formula Var(ei) = σ²(1 - hi), i.e. a standard error of s*sqrt(1 - hi). But I thought that points at the extreme ends tend to influence the regression line a lot, so intuitively the variance should be higher.

 

Thank you

L

ballardw
Super User

Please cite a source for the claim that a high-leverage point has low variance. My basic stats tells me that no single point has variance, as that is a statistic for a number of data points.

CheerfulChu
Obsidian | Level 7

https://en.wikipedia.org/wiki/Studentized_residual

>> ... the residuals, unlike the errors, do not all have the same variance: the variance decreases as the corresponding x-value gets farther from the average x-value ...

 

http://www.stat.cmu.edu/~cshalizi/mreg/15/lectures/20/lecture-20.pdf

>> pg 10. The bigger the leverage of i, the smaller the variance of the residual there.

 

So leverage reflects the distance of xi from the average of x, and the higher the leverage, the smaller the variance of the residual. As I asked earlier, why is that? Intuitively, points at the extreme ends will move the regression line a lot, so the variance should be larger than for points near the average of x.
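
For reference, the relationship quoted above can be written out for simple linear regression with an intercept. These are standard OLS results, assuming independent errors with constant variance $\sigma^2$; here $H = X(X^\top X)^{-1}X^\top$ is the hat matrix and $h_{ii}$ its i-th diagonal element (notation introduced for this note, not taken from the posts):

```latex
h_{ii} = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_{j=1}^{n} (x_j - \bar{x})^2},
\qquad
e = (I - H)\,y
\;\Longrightarrow\;
\mathrm{Var}(e) = \sigma^2 (I - H),
\qquad
\mathrm{Var}(e_i) = \sigma^2 (1 - h_{ii}),
\qquad
\mathrm{Var}(\hat{y}_i) = \sigma^2 h_{ii}.
```

Note that $\mathrm{Var}(\hat{y}_i) + \mathrm{Var}(e_i) = \sigma^2$: at a far-out x value it is the fitted value that varies more from sample to sample, and the residual that varies less.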

ballardw
Super User

Here's a brief excerpt of what they are talking about with that variance.

 

External studentization uses an estimate of $\mathrm{Var}[\widetilde{e}_i]$ that does not involve the ith observation. Externally studentized residuals are often preferred over internally studentized residuals because they have well-known distributional properties in standard linear models for independent data.

 

So the studentized variance for Xi is estimated from all of the other points EXCEPT the "high-leverage point", which is why it is smaller.
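
To make the internal/external distinction concrete, here is a minimal NumPy sketch, not SAS code. The function name `studentized_residuals` and the data are made up for illustration. It uses the usual closed-form leave-one-out variance estimate rather than refitting the model n times.

```python
import numpy as np

def studentized_residuals(X, y):
    """Return leverages, raw residuals, and internally/externally
    studentized residuals for an OLS fit of y on the columns of X."""
    n, p = X.shape
    H = X @ np.linalg.inv(X.T @ X) @ X.T       # hat matrix
    h = np.diag(H)                             # leverages h_ii
    e = y - H @ y                              # raw residuals
    s2 = e @ e / (n - p)                       # error variance from all points
    internal = e / np.sqrt(s2 * (1.0 - h))
    # Variance estimate with observation i left out (closed form, no refit).
    s2_loo = ((n - p) * s2 - e**2 / (1.0 - h)) / (n - p - 1)
    external = e / np.sqrt(s2_loo * (1.0 - h))
    return h, e, internal, external

# Hypothetical data: nine ordinary points plus one far-out x value.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 30.0])
y = np.array([1.4, 2.1, 2.4, 3.2, 3.4, 4.1, 4.4, 5.2, 5.3, 12.0])
X = np.column_stack([np.ones_like(x), x])      # intercept + slope

h, e, r_int, r_ext = studentized_residuals(X, y)
for i in range(len(x)):
    print(f"x={x[i]:5.1f}  h={h[i]:.3f}  e={e[i]:+.3f}  "
          f"internal={r_int[i]:+.3f}  external={r_ext[i]:+.3f}")
```

Both versions divide by sqrt(1 - h_ii); they differ only in whether the error variance estimate includes observation i.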

 

 

PaigeMiller
Diamond | Level 26

Variance of what? Of the predicted value? It would be nice if you made that clear.

 

Points at the extremes of the data don't necessarily influence the regression line a lot; they can, but they don't always do so.
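
A small NumPy sketch of that distinction, using made-up numbers: the same far-out x value is added once with a y that follows the trend of the other points and once with a y that does not, and the fitted slopes are compared. The helper `slope_with_extra_point` is hypothetical.

```python
import numpy as np

# Hypothetical base data: nine points scattered around y = 1 + 0.5 x.
x_base = np.arange(1.0, 10.0)
noise = np.array([0.2, -0.1, 0.1, -0.2, 0.0, 0.15, -0.15, 0.1, -0.1])
y_base = 1.0 + 0.5 * x_base + noise

def slope_with_extra_point(x_new, y_new):
    """Fit a straight line after appending one extra observation."""
    x = np.append(x_base, x_new)
    y = np.append(y_base, y_new)
    slope, _intercept = np.polyfit(x, y, 1)
    return slope

slope_base = np.polyfit(x_base, y_base, 1)[0]
# Same extreme x (high leverage) in both cases; only y differs.
slope_on_trend = slope_with_extra_point(30.0, 1.0 + 0.5 * 30.0)
slope_off_trend = slope_with_extra_point(30.0, 2.0)

print(f"slope without extra point : {slope_base:.3f}")
print(f"extra point on the trend  : {slope_on_trend:.3f}")
print(f"extra point off the trend : {slope_off_trend:.3f}")
```

The on-trend point has high leverage but hardly moves the line; the off-trend point has the same leverage and changes the slope substantially.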

--
Paige Miller
Rick_SAS
SAS Super FREQ

Thanks for the references. The confusion was that your post title indicates that the "high-leverage point has lower variance," but the point has zero variance (it is a data point, not a statistic). 

 

The residual for an observation does have variance, which you could estimate by using a bootstrap. I think the answer to your question is that a high-leverage point "pulls up" the OLS regression line towards the y value at that point. Therefore the predicted mean at the high-leverage point is biased to be closer to the observed response at that point. Consequently, that residual will be biased to be small.

 

In short, a high-leverage point (by definition) shrinks the residual value at that point.
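
One way to check this numerically is a small simulation. This is a Monte Carlo sketch with a known true model rather than a bootstrap, and the design points, coefficients, and error SD below are assumptions chosen for illustration. It compares the empirical variance of each residual with sigma^2 * (1 - h_ii).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical design: nine x values near the center plus one far-out point.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 30.0])
X = np.column_stack([np.ones_like(x), x])      # design matrix with intercept
H = X @ np.linalg.inv(X.T @ X) @ X.T           # hat matrix
h = np.diag(H)                                 # leverages h_ii

sigma = 2.0                                    # assumed true error SD
beta = np.array([1.0, 0.5])                    # assumed true coefficients

# Simulate many response vectors on the same design and refit each time.
n_sims = 20000
Y = X @ beta + sigma * rng.standard_normal((n_sims, len(x)))
resid = Y - Y @ H.T                            # each row is (I - H) y
empirical_var = resid.var(axis=0)
theoretical_var = sigma**2 * (1.0 - h)

for i in range(len(x)):
    print(f"x={x[i]:5.1f}  h={h[i]:.3f}  "
          f"empirical={empirical_var[i]:.3f}  theory={theoretical_var[i]:.3f}")
```

The residual at x = 30 should show by far the smallest variance, because the fitted line follows the observed y there in nearly every simulated sample.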

CheerfulChu
Obsidian | Level 7
You are good! Thanks
