jsconte18
Fluorite | Level 6
Can anyone please look at the attached picture of my graph and tell me whether a linear model seems appropriate? I do see an upward, left-to-right progression, but I also think the point at the top right may be an outlier and that all the other points may be forming a curved, U-like shape. Thank you!
ballardw
Super User

Suggestion: Make two linear models, one with and one without that possible outlier. See what difference that one may make.

Fifteen points is a bit small, perhaps, given the dispersion I see.

Have you done a correlation on the x y variables? With and without the "outlier"?
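
To make the with/without comparison concrete, here is a minimal sketch in plain Python. The `temps` and `chirps` values are invented for illustration (the original data were not posted), and `fit_line` is a hypothetical helper, not a SAS or library routine. It fits the least-squares line twice, once on all 15 points and once with the highest-temperature point dropped, so the change in slope, intercept, and correlation can be compared.

```python
import math

# Hypothetical data, NOT the poster's: 14 clustered points plus one far top-right point.
temps = [69, 70, 71, 72, 73, 75, 76, 77, 78, 79, 80, 81, 82, 83, 93]
chirps = [110, 105, 112, 118, 115, 125, 122, 130, 128, 135, 140, 138, 150, 145, 200]


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x; returns (slope, intercept, r)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation
    return slope, intercept, r


# Index of the suspect point: here, simply the observation with the largest temperature.
suspect = max(range(len(temps)), key=lambda i: temps[i])
temps_trim = [t for i, t in enumerate(temps) if i != suspect]
chirps_trim = [c for i, c in enumerate(chirps) if i != suspect]

slope_all, intercept_all, r_all = fit_line(temps, chirps)
slope_trim, intercept_trim, r_trim = fit_line(temps_trim, chirps_trim)

print(f"all 15 points : slope={slope_all:.3f} intercept={intercept_all:.2f} r={r_all:.4f}")
print(f"without point : slope={slope_trim:.3f} intercept={intercept_trim:.2f} r={r_trim:.4f}")
```

If the two fits give similar slopes and correlations, the extreme point is not driving the result; a large change suggests it is.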

jsconte18
Fluorite | Level 6
This is an assignment question that we answer solely based on the output provided to us. The question for this assignment does not include any correlation output, so we have to answer whether or not a linear model seems appropriate just from this graph and the PROC REG output, which included:

Overall model significant
R Square 0.8384
intercept p value 0.9148
temperature p value <.0001

She also attached the Cook's D plot which shows an influential point and also the QQ plot does not look good.

So, based off of this information, I have to say whether a linear model is appropriate.
Reeza
Super User
R Squared = correlation squared, therefore your correlation is 0.91564.
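
The arithmetic behind that number, as a quick sketch: with a single predictor, R-square is the square of the Pearson correlation, so the correlation is its square root. The positive sign is an assumption taken from the upward trend described in the question; R-square alone does not give the sign.

```python
import math

r_square = 0.8384  # R-Square reported in the PROC REG output quoted above

# One predictor only: |correlation| = sqrt(R-square).
# Sign assumed positive because the scatter is described as trending upward.
r = math.sqrt(r_square)

print(f"correlation = {r:.5f}")
```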

So you only provided us part of the information to answer the question but want us to answer it?
jsconte18
Fluorite | Level 6
I was mainly just seeking another opinion on the graph because I cannot tell if it is following a straight or curved fashion.
Reeza
Super User
Well, then my answer is that you have an outlier, so this graph alone is not sufficient to determine whether a linear model is appropriate, and I would require further information to come to a conclusion.
StatDave
SAS Super FREQ
You can use PROC ROBUSTREG to see if there is evidence of outliers and you can use PROC GAMPL to fit a flexible spline model to see if the fit deviates significantly from linear. See the examples and discussion in the documentation of those procedures.
SteveDenham
Jade | Level 19

I doubt that the point at the top right is an outlier, but I would say that it is a very high influence/leverage point that will have a big influence on the fit/correlation and the estimates of the slope and intercept.  Be sure you know the difference.  If I had to pick a potential outlier, it would be the point at about temp=82, chirps ~ 150.  But the point here is that the appropriateness of a linear model is better determined by the shape of the QQ plot, and the location of the high leverage point.

Consider a plot where many points are down in the lower left-hand corner and a single point is in the upper right-hand corner.  If you fit a regression line to data like that, it is likely to be linear with a large R squared value, all due to a single point. So, in your case, if the Cook's D value for that extreme temperature/chirp observation is greater than the F value with 1 and 14 df at a probability of 0.5 (= 0.479), then it is probably distorting the fit to be "more linear". I wish you had been provided the DFFITS value and the studentized residual value in addition to Cook's D.
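
For anyone who wants to see how leverage and Cook's D behave for a point like that, here is a rough sketch in plain Python. The data are invented for illustration (the real values were not posted), `influence_diagnostics` is a hypothetical helper, and the 0.479 cutoff is simply the figure quoted above rather than something computed here. It uses the standard simple-regression formulas: leverage h = 1/n + (x - mean)^2 / Sxx, and Cook's D = e^2 / (p * s^2) * h / (1 - h)^2 with p = 2 estimated parameters.

```python
# Hypothetical data, NOT the poster's: 14 clustered points plus one far top-right point.
temps = [69, 70, 71, 72, 73, 75, 76, 77, 78, 79, 80, 81, 82, 83, 93]
chirps = [110, 105, 112, 118, 115, 125, 122, 130, 128, 135, 140, 138, 150, 145, 200]


def influence_diagnostics(x, y):
    """Return a list of (leverage, cooks_d) pairs for a simple linear regression."""
    n = len(x)
    p = 2  # estimated parameters: intercept and slope
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in residuals) / (n - p)  # mean squared error
    results = []
    for xi, e in zip(x, residuals):
        h = 1 / n + (xi - mean_x) ** 2 / sxx  # leverage: depends on x only
        d = (e * e) / (p * s2) * h / (1 - h) ** 2  # Cook's D: residual and leverage combined
        results.append((h, d))
    return results


CUTOFF = 0.479  # value quoted in the post above, not derived in this sketch

diagnostics = influence_diagnostics(temps, chirps)
for t, c, (h, d) in zip(temps, chirps, diagnostics):
    flag = "  <-- exceeds cutoff" if d > CUTOFF else ""
    print(f"temp={t:3d} chirps={c:3d} leverage={h:.3f} cooks_d={d:.3f}{flag}")
```

Note that leverage depends only on how far the temperature is from the mean temperature, while Cook's D also needs a sizeable residual, which is the outlier-versus-leverage distinction made above.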

That still doesn't answer the question of whether a linear model is appropriate.  I would say that it is appropriate if that high point is biologically plausible, but may not be appropriate (can't tell without the data) if that point is removed from the analysis.

SteveDenham
