high leverage point has lower variance


3 weeks ago

Dear Experts,

I read that a high-leverage point has lower variance. Why is that so? I can follow it from the formula Var(ei) = s^2 * (1 - hi), so the standard error of the residual is s*sqrt(1 - hi). But I thought points at the extreme ends tend to influence the regression line a lot, so intuitively the variance should be higher.
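For readers who want to see the formula in action, here is a minimal simulation sketch in Python with NumPy. The x values, the true line y = 1 + 0.5x, and sigma = 2 are made-up assumptions for illustration; only the relationship Var(ei) = sigma^2 * (1 - hi) comes from the thread. It simulates many response vectors for a fixed design and compares the empirical variance of each residual with the formula.

```python
import numpy as np

def hat_matrix(x):
    """Hat matrix H = X (X'X)^{-1} X' for a straight-line fit with intercept."""
    X = np.column_stack([np.ones_like(x), x])
    return X @ np.linalg.inv(X.T @ X) @ X.T

# Hypothetical design: five clustered x values and one far from the mean.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 20.0])
sigma = 2.0  # assumed error standard deviation

H = hat_matrix(x)
h = np.diag(H)  # leverages h_i

rng = np.random.default_rng(0)
n_sims = 20000
# Each row is one simulated response vector from the assumed true line.
Y = 1.0 + 0.5 * x + rng.normal(0.0, sigma, size=(n_sims, x.size))

# H is symmetric, so Y @ H gives the fitted values for every row at once.
residuals = Y - Y @ H

empirical_var = residuals.var(axis=0)
theoretical_var = sigma**2 * (1.0 - h)

for xi, hi, ev, tv in zip(x, h, empirical_var, theoretical_var):
    print(f"x={xi:5.1f}  leverage={hi:.3f}  empirical={ev:.3f}  formula={tv:.3f}")
```

The point at x = 20 has by far the largest leverage, and its residual has by far the smallest variance, even though the errors themselves all have the same variance.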

Thank you

L

Accepted Solutions

Solution

2 weeks ago


3 weeks ago

Thanks for the references. The confusion was that your post title indicates that the "high-leverage point has lower variance," but the point has zero variance (it is a data point, not a statistic).

The residual for an observation does have variance, which you could estimate by using a bootstrap. I think the answer to your question is that a high-leverage point pulls the OLS regression line toward the y value at that point. Therefore the predicted mean at the high-leverage point tends to be closer to the observed response at that point. Consequently, that residual tends to be small in magnitude.

In short, a high-leverage point (by definition) shrinks the residual value at that point.
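The pull described above can be sketched numerically. This is only an illustration under assumed data (the x and y values and the shift `delta` are hypothetical): it shifts the response at one observation and measures what share of the shift is absorbed by the fitted value at that same observation. For an OLS fit that share equals the leverage of the point, so the residual only sees the remaining share.

```python
import numpy as np

def fitted_values(x, y):
    """Fitted values from an OLS straight-line fit with intercept."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ beta

# Hypothetical data: the last observation is far from the other x values.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 20.0])
y = np.array([1.4, 2.1, 2.4, 3.2, 3.4, 11.0])
delta = 5.0  # assumed shift applied to one response at a time

before = fitted_values(x, y)
share_absorbed = {}
for i in (2, 5):  # a low-leverage point and the high-leverage point
    y_shifted = y.copy()
    y_shifted[i] += delta
    after = fitted_values(x, y_shifted)
    # Fraction of the shift in y[i] that shows up in the fitted value at i.
    share_absorbed[i] = (after[i] - before[i]) / delta
    print(f"x={x[i]:5.1f}: fitted value absorbs {share_absorbed[i]:.1%} of the shift, "
          f"residual absorbs {1 - share_absorbed[i]:.1%}")
```

With these made-up numbers the line follows the point at x = 20 almost completely, which is why its residual stays small whatever y value is observed there.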

All Replies


3 weeks ago

Please cite a source for the claim that a high-leverage point has low variance. My basic stats tell me that no single point has a variance, as that is a statistic for a number of data points.


3 weeks ago

https://en.wikipedia.org/wiki/Studentized_residual

>> ... the residuals, unlike the errors, *do not all have the same variance:* the variance decreases as the corresponding *x*-value gets farther from the average *x*-value ...

http://www.stat.cmu.edu/~cshalizi/mreg/15/lectures/20/lecture-20.pdf

>> pg 10. The bigger the leverage of i, the smaller the variance of the residual there.

So leverage measures how far xi is from the average of x, and the higher the leverage, the smaller the variance of the residual. As I asked earlier, why is that so? Intuitively, points at the extreme ends will move the regression line a lot, so the variance should be larger compared to points near the average of x.
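For reference, the two quoted statements can be connected with the standard results for a straight-line fit with an intercept. The notation below (H for the hat matrix, h_i for its diagonal entries, sigma^2 for the error variance) is the usual textbook notation and is assumed here rather than taken from the posts.

```latex
% Leverage in simple linear regression: grows with the squared
% distance of x_i from the mean of x.
h_i = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_{j=1}^{n} (x_j - \bar{x})^2}

% Residuals are a linear function of the responses.
e = (I - H)\,y, \qquad \operatorname{Var}(y) = \sigma^2 I

% Because I - H is symmetric and idempotent:
\operatorname{Var}(e) = \sigma^2 (I - H)
\quad\Longrightarrow\quad
\operatorname{Var}(e_i) = \sigma^2 (1 - h_i)
```

So leverage is not the distance itself but increases with the squared distance from the average x, and a larger h_i directly gives a smaller residual variance.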


3 weeks ago

CheerfulChu wrote:

https://en.wikipedia.org/wiki/Studentized_residual

>> ... the residuals, unlike the errors, *do not all have the same variance:* the variance decreases as the corresponding *x*-value gets farther from the average *x*-value ...

http://www.stat.cmu.edu/~cshalizi/mreg/15/lectures/20/lecture-20.pdf

>> pg 10. The bigger the leverage of i, the smaller the variance of the residual there.

So leverage measures how far xi is from the average of x, and the higher the leverage, the smaller the variance of the residual. As I asked earlier, why is that so? Intuitively, points at the extreme ends will move the regression line a lot, so the variance should be larger compared to points near the average of x.

Here's a brief summary of some of what they are talking about with that variance.

External studentization uses an estimate of the variance that **does not involve the ith** observation. Externally studentized residuals are often preferred over internally studentized residuals because they have well-known distributional properties in standard linear models for independent data.

So the studentized variance for Xi is computed from all of the other points EXCEPT the "high leverage point", which is why it is smaller.
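To make the two kinds of studentization concrete, here is a small sketch using the standard textbook formulas. The data are hypothetical and the function name `studentized_residuals` is made up for this example. Note that both versions divide by the same sqrt(1 - hi) factor; they differ only in whether the estimate of the error variance includes the ith observation.

```python
import numpy as np

def studentized_residuals(x, y):
    """Leverages, raw residuals, and internally/externally studentized
    residuals for an OLS straight-line fit with intercept."""
    X = np.column_stack([np.ones_like(x), x])
    n, p = X.shape
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    h = np.diag(H)
    e = y - H @ y

    # Internal: error variance estimated from all n observations.
    s2 = e @ e / (n - p)
    internal = e / np.sqrt(s2 * (1.0 - h))

    # External: error variance estimated without observation i, using
    # the leave-one-out identity SSE_(i) = SSE - e_i^2 / (1 - h_i).
    s2_loo = ((n - p) * s2 - e**2 / (1.0 - h)) / (n - p - 1)
    external = e / np.sqrt(s2_loo * (1.0 - h))
    return h, e, internal, external

# Hypothetical data with one observation far from the other x values.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 20.0])
y = np.array([1.4, 2.1, 2.4, 3.2, 3.4, 11.0])

h, e, internal, external = studentized_residuals(x, y)
for xi, hi, ri, ti in zip(x, h, internal, external):
    print(f"x={xi:5.1f}  leverage={hi:.3f}  internal={ri:+.3f}  external={ti:+.3f}")
```

The leave-one-out variance is obtained from the full fit through an algebraic identity, so no refitting is needed in this sketch.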


3 weeks ago

Variance of what? Of the predicted value? It would be nice if you made that clear.

Points at the extremes of the data don't necessarily influence the regression line a lot; they can, but they don't always do so.
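A quick way to see the difference between leverage and influence is to add the same extreme x value twice, once with a response that follows the trend of the other points and once with a response that does not. The data below are invented for illustration; leverage depends only on x, so it is identical in both cases, yet the effect on the slope is very different.

```python
import numpy as np

def slope(x, y):
    """Slope of the OLS straight-line fit."""
    return np.polyfit(x, y, 1)[0]

# Hypothetical base data with a clear upward trend.
x_base = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_base = np.array([1.4, 2.1, 2.4, 3.2, 3.4])
slope_base = slope(x_base, y_base)

# Same extreme x value, two different responses.
x_extreme = 20.0
x_all = np.append(x_base, x_extreme)
slope_on_trend = slope(x_all, np.append(y_base, 11.0))   # close to the trend
slope_off_trend = slope(x_all, np.append(y_base, 2.0))   # far from the trend

print(f"slope without the extreme point:      {slope_base:.3f}")
print(f"slope with an on-trend extreme point:  {slope_on_trend:.3f}")
print(f"slope with an off-trend extreme point: {slope_off_trend:.3f}")
```

An extreme point that agrees with the rest of the data barely moves the line, while one that disagrees can change the slope substantially.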



2 weeks ago

You are good! Thanks