Khokhaz
Calcite | Level 5

Hi , 

 

I have a simple question. In the course content, the following definitions were given for outlier and influential observation:

Influential observation: is far away and influences the slope of the line.

Outlier: has a large residual compared to the other points.

But in the quiz question, the answer was that an influential observation is BOTH (attached)!

I am kind of confused.

 

Thanks 

 

6 REPLIES
Reeza
Super User

An influential observation is an outlier as well.

An outlier may not be an influential observation, since it may not influence the slope of the line.

 

Note that the first option says "can sometimes have a large residual". 

 

FYI - please rotate your screenshots to make them easier for us to read. You can also load an image directly into the forum rather than as an attachment, which will almost always get you a faster response.

 

 

delete_influential.JPG

PaigeMiller
Diamond | Level 26

@Reeza wrote:

An influential observation is an outlier as well.

 


I don't think this is right. The point can be right on the line (hence not an outlier) and still influence the slope, compared to the slope of the line that is re-fit without that data point. If these two slopes are vastly different, the point is influential even if it is not an outlier.
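
A minimal sketch of the comparison described above, fitting the line with and without one point. The data set name demo and its values are made up purely for illustration: ten points with essentially no trend, plus one point far away in x. By hand calculation, the slope should be roughly 0.06 without the far point and close to 1 with it, while the far point itself sits almost on the fitted line.

```sas
/* Hypothetical data: y alternates between 4 and 6 for x = 1..10 */
data demo;
   do x = 1 to 10;
      y = 4 + 2*(mod(x, 2) = 0);   /* 4 for odd x, 6 for even x */
      output;
   end;
   /* One extra point far away in the x direction */
   x = 100;
   y = 100;
   output;
run;

/* Fit using all 11 points */
proc reg data=demo;
   model y = x;
run;
quit;

/* Re-fit without the far point and compare the two slopes */
proc reg data=demo(where=(x < 100));
   model y = x;
run;
quit;
```

If the two slope estimates differ a lot, the far point is influential in the sense used here, even though its own residual is small.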

--
Paige Miller
PaigeMiller
Diamond | Level 26

An influential observation is far away from the middle of the data IN THE X DIRECTION and influences the slope of the line. (If it is far away in the X direction and does not influence the slope of the line, it is not influential.)

An outlier is far away from the line IN THE Y DIRECTION.

--
Paige Miller
Khokhaz
Calcite | Level 5

Then why is the answer C, both?

influential_observation_question.png

PaigeMiller
Diamond | Level 26

Answer c is wrong, in my opinion.

 

Of course, this depends on how the word "unusual" is meant in part a, and that can be open to interpretation.

--
Paige Miller
Astounding
PROC Star

Some examples:

 

data test;
   /* x = 1, 2, ..., 100, plus one extreme value */
   do x=1 to 100, 1000000;
      y=x;   /* every point lies exactly on the line y = x */
      output;
   end;
run;

 

The point at 1,000,000 for both x and y is an outlier: it is far from all the other points. However, because it falls exactly on the line y=x, it has no impact on the slope of the line.

 

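data test2;
   /* 100 points exactly on the line y = x */
   do x=1 to 100;
      y=x;
      output;
   end;
   /* one extra point, within the range of x and y but far off the line */
   x=100;
   y=1;
   output;
run;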

 

Now the first 100 points have x=y, so a line fit to those points alone leaves no residuals. The added last point (x=100, y=1) is not an outlier in the sense that both its x and its y are within the range of all the other x and y values. But it has tremendous impact on the slope.
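
One way to look at this numerically is to save per-observation diagnostics from PROC REG. The sketch below rebuilds the second example so it runs on its own; the output data set name diag and the variable names resid, leverage, and cooksd are arbitrary choices, not anything from the course. By hand calculation, the fitted slope with the extra point should come out near 0.94, versus exactly 1 for the first 100 points alone.

```sas
/* Rebuild the second example so this block is self-contained */
data test2;
   do x = 1 to 100;
      y = x;
      output;
   end;
   x = 100;
   y = 1;
   output;
run;

/* Fit y on x and save the residual, leverage, and Cook's D per observation */
proc reg data=test2;
   model y = x;
   output out=diag r=resid h=leverage cookd=cooksd;
run;
quit;

/* Show the observations with the largest Cook's D first */
proc sort data=diag;
   by descending cooksd;
run;

proc print data=diag(obs=5);
   var x y resid leverage cooksd;
run;
```

Here the residual measures distance from the line in the y direction, the leverage measures distance from the middle of the data in the x direction, and Cook's D combines the two into a single measure of influence on the fit.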
