Hi,
I have a simple question. The course content gave the following definitions of an outlier and an influential observation:
Influential observation: is far away and influences the slope of the line.
Outlier: has a large residual compared to other points.
But in the quiz question, the answer was that an influential observation is BOTH! (attached)
I am kind of confused.
Thanks
An influential observation is an outlier as well.
An outlier may not be an influential observation, since it may not influence the slope of the line.
Note that the first option says "can sometimes have a large residual".
FYI - please rotate your screenshots to make them easier for us to read. You can also load an image directly into the forum post rather than as an attachment, which will almost always get you a faster response.
@Reeza wrote:
An influential observation is an outlier as well.
I don't think this is right. A point can be right on the line (hence not an outlier) and still influence the slope, which you can see by comparing the fitted slope with the slope of the line that is re-fit without that data point. If these two slopes are vastly different, the point is influential even if it is not an outlier.
An influential observation is far away from the middle of the data IN THE X DIRECTION and influences the slope of the line. (If it is far away in the X direction and does not influence the slope of the line, it is not influential.)
An outlier is far away from the line IN THE Y DIRECTION.
Then why is the answer C, both?
Answer C is wrong, in my opinion.
Of course, this depends on how the word "unusual" is meant in part a, and that can be open to interpretation.
Some examples:
/* 100 points on the line y=x, plus one point far away at x=1000000 */
data test;
    do x=1 to 100, 1000000;
        y=x;
        output;
    end;
run;
The point at x=1,000,000, y=1,000,000 is far away from the rest of the data, so it is an outlier in that sense. However, it lies exactly on the line y=x, so it has no impact on the slope of the line.
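One way to check this, offered as a rough sketch rather than the course's method, is to fit the line twice with PROC REG: once on all the points and once with the far-away point excluded. The WHERE= data set option and the cutoff x <= 100 are my own choices for this illustration.

```sas
/* Recreate the first example so this block runs on its own */
data test;
    do x=1 to 100, 1000000;
        y=x;
        output;
    end;
run;

/* Fit using all 101 points, including the one at x=1000000 */
proc reg data=test;
    model y = x;
run;
quit;

/* Re-fit with the far-away point excluded */
proc reg data=test(where=(x <= 100));
    model y = x;
run;
quit;
```

Comparing the slope for x in the two Parameter Estimates tables shows whether the far-away point changes the fitted line.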
/* 100 points on the line y=x, plus one extra point at x=100, y=1 */
data test2;
    do x=1 to 100;
        y=x;
        output;
    end;
    x=100;
    y=1;
    output;
run;
Now the first 100 points have x=y, so a line fit to those points alone has zero residuals. The added last point (x=100, y=1) is not an outlier in the sense that both its x and y values are within the range of all the other x and y values. But it does change the fitted slope, pulling it away from 1.
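To see how much the added point matters, here is a minimal sketch, assuming PROC REG is available. It fits the line with all points and requests the R and INFLUENCE options on the MODEL statement, then re-fits without the added point. Using these diagnostics is my own suggestion and not something taken from the course.

```sas
/* Recreate the second example so this block runs on its own */
data test2;
    do x=1 to 100;
        y=x;
        output;
    end;
    x=100;
    y=1;
    output;
run;

/* Fit using all 101 points.
   R prints residual analysis (including Cook's D);
   INFLUENCE prints leverage, DFFITS and DFBETAS for each observation */
proc reg data=test2;
    model y = x / r influence;
run;
quit;

/* Re-fit without the added point (x=100, y=1) to compare slopes */
proc reg data=test2(where=(not (x=100 and y=1)));
    model y = x;
run;
quit;
```

The last row of the diagnostics output corresponds to the added point, and the two Parameter Estimates tables show how far the slope moves when that point is dropped.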