Statistical Procedures

CheerfulChu · Posted 07-25-2017 04:50 PM

Dear Experts,

I read that a high-leverage point has lower variance. Why is that so? From the formula Var(ei) = s^2*(1-hi), i.e., a standard error of s*sqrt(1-hi), I can see it. But I thought points at the extreme ends tend to influence the regression line a lot. Hence, intuitively, the variance should be higher.

Thank you

ballardw · Posted 07-25-2017 05:47 PM

Please cite a source for the claim that a high-leverage point has low variance. My basic stats tell me that no single point has variance, as that is a statistic for a number of data points.

CheerfulChu · Posted 07-26-2017 10:33 AM

https://en.wikipedia.org/wiki/Studentized_residual

>> ... the residuals, unlike the errors, do not all have the same variance: the variance decreases as the corresponding x-value gets farther from the average x-value ...

http://www.stat.cmu.edu/~cshalizi/mreg/15/lectures/20/lecture-20.pdf

>> pg 10. The bigger the leverage of i, the smaller the variance of the residual there.

So leverage measures how far xi is from the average of x, and the higher the leverage, the smaller the variance of the residual. As I mentioned earlier, why is that so? Intuitively, points at the extreme ends will move the regression line a lot, so the variance should be larger compared with points near the average of x.
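
For simple linear regression with an intercept, leverage has the closed form h_i = 1/n + (x_i - xbar)^2 / Sxx, which makes the link to distance from the mean explicit. The following is a minimal Python sketch, assuming made-up x values and an assumed error variance of 1 (neither comes from this thread), that tabulates leverage next to the theoretical residual variance sigma^2 * (1 - h_i):

```python
def leverages(x):
    """Leverage of each point in a simple linear regression with intercept."""
    n = len(x)
    mean_x = sum(x) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    # h_i = 1/n + (x_i - mean_x)^2 / Sxx grows with squared distance from the mean
    return [1.0 / n + (xi - mean_x) ** 2 / sxx for xi in x]


# Hypothetical design: five clustered x values and one far from the mean.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]
sigma2 = 1.0  # assumed error variance, for illustration only

h = leverages(x)
residual_variance = [sigma2 * (1.0 - hi) for hi in h]

for xi, hi, vi in zip(x, h, residual_variance):
    print(f"x = {xi:5.1f}   leverage = {hi:.3f}   Var(e_i) = {vi:.3f}")
```

The leverages sum to the number of fitted coefficients (2 here), and the point at x = 20 takes nearly half of that total, which leaves it the smallest residual variance.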

ballardw · Posted 07-26-2017 11:07 AM

Here's a brief excerpt on what they are talking about with that variance.

External studentization uses an estimate of $\mathrm{Var}[\widetilde{e}_i]$ that does not involve the ith observation. Externally studentized residuals are often preferred over internally studentized residuals because they have well-known distributional properties in standard linear models for independent data.

So with external studentization, the variance estimate used for the residual at xi is computed from all of the other points, EXCEPT the "high-leverage point" itself. That is one reason the estimate can come out smaller.
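
To make the distinction concrete, here is a rough Python sketch of both kinds of studentized residual for a simple linear regression (p = 2 coefficients). The data are hypothetical, and the leave-one-out variance uses the standard shortcut s_(i)^2 = (SSE - e_i^2 / (1 - h_i)) / (n - p - 1) rather than refitting the model n times:

```python
import math


def studentized_residuals(x, y):
    """Return (internal, external) studentized residuals for a straight-line fit."""
    n, p = len(x), 2
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x

    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    h = [1.0 / n + (xi - mean_x) ** 2 / sxx for xi in x]
    sse = sum(e * e for e in residuals)
    s2 = sse / (n - p)  # variance estimate that uses every observation

    internal, external = [], []
    for e, hi in zip(residuals, h):
        internal.append(e / math.sqrt(s2 * (1.0 - hi)))
        # Variance estimate with observation i left out (no refit needed).
        s2_deleted = (sse - e * e / (1.0 - hi)) / (n - p - 1)
        external.append(e / math.sqrt(s2_deleted * (1.0 - hi)))
    return internal, external


# Hypothetical data: the last point has high leverage and sits off the trend.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]
y = [1.1, 1.9, 3.2, 3.8, 5.1, 12.0]
internal, external = studentized_residuals(x, y)

for xi, r, t in zip(x, internal, external):
    print(f"x = {xi:5.1f}   internal = {r:7.3f}   external = {t:7.3f}")
```

Both versions divide by sqrt(1 - h_i); they differ only in whether the variance estimate includes the observation being studentized.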

PaigeMiller · Posted 07-25-2017 06:07 PM

Variance of what? Of the predicted value? It would be nice if you made that clear.

Points at the extremes of the data don't necessarily influence the regression line a lot; they can, but they don't always do so.
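
A small illustration of that distinction, as a sketch on invented data: the same extreme x value is added twice, once with a response that follows the existing trend and once with a response that departs from it. Only the second changes the fitted slope.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical base data with a roughly linear trend.
x_base = [1.0, 2.0, 3.0, 4.0, 5.0]
y_base = [1.1, 1.9, 3.2, 3.8, 5.1]
base_intercept, base_slope = fit_line(x_base, y_base)

# Same extreme x value, two different responses.
x_far = 20.0
y_on_trend = base_intercept + base_slope * x_far  # lies on the base line
y_off_trend = 12.0                                # departs from it

_, slope_on = fit_line(x_base + [x_far], y_base + [y_on_trend])
_, slope_off = fit_line(x_base + [x_far], y_base + [y_off_trend])

print(f"slope without the extreme point:    {base_slope:.3f}")
print(f"slope with on-trend extreme point:  {slope_on:.3f}")
print(f"slope with off-trend extreme point: {slope_off:.3f}")
```

In both cases the added point has the same leverage, because leverage depends only on the x values; whether it is influential depends on its response as well.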

--
Paige Miller

Rick_SAS · Posted 07-26-2017 10:46 AM

Thanks for the references. The confusion was that your post title indicates that the "high-leverage point has lower variance," but the point has zero variance (it is a data point, not a statistic).

The residual for an observation does have a variance, which you could estimate by using a bootstrap. I think the answer to your question is that a high-leverage point pulls the OLS regression line toward the y value at that point. Therefore the predicted mean at the high-leverage point tends to be closer to the observed response at that point. Consequently, the residual there tends to be small in magnitude: its mean is still zero, but its variance, sigma^2*(1-hi), shrinks as the leverage hi grows.

In short, a high-leverage point (by definition) shrinks the residual value at that point.
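
One way to check this numerically is a Monte Carlo simulation (a simpler stand-in for the bootstrap mentioned above). The sketch below assumes a true model y = 2 + 0.5x plus Normal(0, 1) noise and made-up x values, none of which come from this thread. It refits the line on many simulated samples and compares the empirical variance of each residual with sigma^2 * (1 - h_i).

```python
import random


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def simulate_residuals(x, sigma=1.0, reps=20000, seed=1):
    """Return per-point (means, variances) of OLS residuals over simulated samples."""
    rng = random.Random(seed)
    n = len(x)
    total = [0.0] * n
    total_sq = [0.0] * n
    for _ in range(reps):
        # Assumed true model: y = 2 + 0.5 x + Normal(0, sigma^2) noise
        y = [2.0 + 0.5 * xi + rng.gauss(0.0, sigma) for xi in x]
        intercept, slope = fit_line(x, y)
        for i in range(n):
            e = y[i] - (intercept + slope * x[i])
            total[i] += e
            total_sq[i] += e * e
    means = [t / reps for t in total]
    variances = [total_sq[i] / reps - means[i] ** 2 for i in range(n)]
    return means, variances


# Hypothetical design with one high-leverage point at x = 20.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]
sigma = 1.0
mean_x = sum(x) / len(x)
sxx = sum((xi - mean_x) ** 2 for xi in x)
h = [1.0 / len(x) + (xi - mean_x) ** 2 / sxx for xi in x]
theoretical = [sigma ** 2 * (1.0 - hi) for hi in h]

means, variances = simulate_residuals(x, sigma=sigma)
for xi, m, v, t in zip(x, means, variances, theoretical):
    print(f"x = {xi:5.1f}   mean = {m:7.4f}   simulated var = {v:.3f}   theory = {t:.3f}")
```

Under these assumptions the simulated residual means stay near zero at every point, while the simulated variance at the high-leverage point is the smallest of the six.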

CheerfulChu · Posted 07-31-2017 06:45 PM

You are good! Thanks
