Dear Experts,
I read that the residual at a high-leverage point has lower variance. Why is that? I can see it from the formula Var(e_i) = σ²(1 − h_i), with estimated standard error s·sqrt(1 − h_i). But I thought points at the extreme ends tend to influence the regression line a lot, so intuitively the variance should be higher.
Thank you
L
Accepted Solutions
Thanks for the references. The confusion was that your post title indicates that the "high-leverage point has lower variance," but the point has zero variance (it is a data point, not a statistic).
The residual for an observation does have variance, which you could estimate by using a bootstrap. I think the answer to your question is that a high-leverage point "pulls up" the OLS regression line towards the y value at that point. Therefore the predicted mean at the high-leverage point is biased to be closer to the observed response at that point. Consequently, that residual will be biased to be small.
In short, a high-leverage point (by definition) shrinks the residual value at that point.
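To make the shrinkage concrete, here is a minimal simulation sketch (not from the original posts). It assumes a simple linear regression with independent, equal-variance normal errors. The x values, the true line 2 + 0.5x, and the function names are all made up for illustration. It compares the simulated variance of each residual with the textbook value σ²(1 − h_i).

```python
import numpy as np

def leverages(x):
    """Diagonal of the hat matrix for a straight-line fit with intercept."""
    X = np.column_stack([np.ones_like(x), x])
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    return np.diag(H)

def simulate_residual_variance(x, sigma=1.0, n_sims=20000, seed=0):
    """Refit OLS on many simulated responses; return the variance of each residual."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([np.ones_like(x), x])
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    # Hypothetical true line 2 + 0.5 x; one simulated data set per row.
    y = 2.0 + 0.5 * x + rng.normal(0.0, sigma, size=(n_sims, x.size))
    resid = y - y @ H.T  # each row is (I - H) applied to one simulated response
    return resid.var(axis=0)

# Five clustered x values plus one far-out, high-leverage value (illustrative data).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 20.0])
h = leverages(x)
emp = simulate_residual_variance(x)
theory = 1.0**2 * (1.0 - h)

for xi, hi, ev, tv in zip(x, h, emp, theory):
    print(f"x={xi:5.1f}  leverage={hi:.3f}  simulated var={ev:.3f}  sigma^2(1-h)={tv:.3f}")
```

In this setup the last point has by far the largest leverage, and its residual variance should come out much smaller than the others, because the fitted line follows that point's response closely in every simulated data set.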
Please cite a source for the claim that a high-leverage point has low variance. My basic stats tell me that no single point has a variance, since variance is a statistic computed from a number of data points.
https://en.wikipedia.org/wiki/Studentized_residual
>> ... the residuals, unlike the errors, do not all have the same variance: the variance decreases as the corresponding x-value gets farther from the average x-value ...
http://www.stat.cmu.edu/~cshalizi/mreg/15/lectures/20/lecture-20.pdf
>> pg 10. The bigger the leverage of i, the smaller the variance of the residual there.
So leverage measures how far x_i is from the average of x, and the higher the leverage, the smaller the variance of the residual. As I mentioned earlier, why is that? Intuitively, points at the extreme ends will move the regression line a lot, so the variance should be larger than for points near the average of x.
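For reference, the two quoted statements can be tied together with a short derivation. This is a sketch under the usual OLS assumptions (independent errors with common variance σ², a design matrix X that includes an intercept). The last line is specific to simple linear regression with one x variable.

```latex
\begin{align*}
\hat{y} &= Hy, \qquad H = X(X^\top X)^{-1}X^\top, \qquad e = y - \hat{y} = (I - H)\,y \\
\operatorname{Var}(e) &= \sigma^2 (I - H)
  \quad\Rightarrow\quad \operatorname{Var}(e_i) = \sigma^2 (1 - h_i) \\
\operatorname{Var}(\hat{y}_i) &= \sigma^2 h_i \\
h_i &= \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_{j=1}^{n} (x_j - \bar{x})^2}
\end{align*}
```

Note that Var(ŷ_i) + Var(e_i) = σ². The intuition that the line "moves a lot" at the extremes shows up as the larger variance of the fitted value there. Because the fitted value moves together with the observed y_i, the residual, which is the difference between the two, varies less.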
Here's a brief excerpt of what they are talking about with that variance.
External studentization uses an estimate of the residual variance that does not involve the ith observation. Externally studentized residuals are often preferred over internally studentized residuals because they have well-known distributional properties in standard linear models for independent data.
So the studentized variance for x_i is computed from all of the other points EXCEPT the "high-leverage point", which is why it is smaller.
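As a rough illustration of the two kinds of studentization mentioned above, here is a small sketch using the standard textbook formulas for a straight-line fit. The data and the function name are hypothetical. Both versions divide the residual by sqrt(1 − h_i). They differ only in whether the estimate of σ² includes observation i.

```python
import numpy as np

def studentized_residuals(x, y):
    """Leverages plus internally and externally studentized OLS residuals."""
    X = np.column_stack([np.ones_like(x), x])
    n, p = X.shape
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    h = np.diag(H)
    e = y - H @ y
    # Internal: sigma^2 is estimated from all n observations.
    s2 = e @ e / (n - p)
    internal = e / np.sqrt(s2 * (1.0 - h))
    # External: sigma^2 is estimated with observation i left out,
    # using the closed-form leave-one-out identity (no refitting needed).
    s2_loo = ((n - p) * s2 - e**2 / (1.0 - h)) / (n - p - 1)
    external = e / np.sqrt(s2_loo * (1.0 - h))
    return h, internal, external

# Illustrative data: the last point sits far from the other x values.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 20.0])
y = np.array([2.1, 3.3, 3.2, 4.4, 4.3, 12.5])
h, r_int, r_ext = studentized_residuals(x, y)

for xi, hi, ri, ti in zip(x, h, r_int, r_ext):
    print(f"x={xi:5.1f}  leverage={hi:.3f}  internal={ri:7.3f}  external={ti:7.3f}")
```

Dividing by sqrt(1 − h_i) is what puts residuals with different leverages on a common scale, so the leverage term appears in both definitions.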
Variance of what? Of the predicted value? It would be nice if you made that clear.
Points at the extremes of the data don't necessarily influence the regression line a lot; they can, but they don't always do so.
Paige Miller