OneEyedKing
Fluorite | Level 6

I was taught that the normal plots (Q-Q plot and histogram), as well as the omnibus residual-vs.-predicted plot, are more informative if you use Studentized residuals instead of the raw residuals. The underlying logic (as I understood it at the time) is that most datasets are fairly small, and because the normal plots rely on z-scores, which assume a very large dataset, it would be better to use Studentized residuals, which are based on t-scores and incorporate standard-error information from the model.

Consequently, my question: when tweaking an OLS regression to get the best "fit", should I be guided by plots using residuals or Studentized residuals? Or are Studentized residual plots best used to find outliers, but should not be relied on to determine whether the specified model has a better "fit" to normal expectations?

I hope @Rick might have a nuanced opinion he can share with us all! 🙂 Thanks for reading!

PaigeMiller
Diamond | Level 26

@OneEyedKing wrote:

I was taught that the normal plots (Q-Q plot and histogram), as well as the omnibus residual-vs.-predicted plot, are more informative if you use Studentized residuals instead of the raw residuals. The underlying logic (as I understood it at the time) is that most datasets are fairly small, and because the normal plots rely on z-scores, which assume a very large dataset, it would be better to use Studentized residuals, which are based on t-scores and incorporate standard-error information from the model.


The data sets I work on are not small, so for my data sets the statement "most datasets are fairly small" is false. But maybe your data sets are small. Can you please tell us the total number of observations in your data set?

 

Consequently, my question: when tweaking an OLS regression to get the best "fit", should I be guided by plots using residuals or Studentized residuals? Or are Studentized residual plots best used to find outliers, but should not be relied on to determine whether the specified model has a better "fit" to normal expectations?


What do you mean by "tweaking"? 

 

Residual plots help you determine whether you have outliers and whether assumptions are violated (specifically, the assumption that the errors are i.i.d. normally distributed). They can also show that a linear model is not the correct choice, for example when the residuals follow a curved pattern instead of scattering randomly around zero. Which of these are you using the residuals for? I am not sure I understand the underlying basics of your question.
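To illustrate how a residual plot can reveal that a straight line is the wrong model, here is a small Python sketch using hypothetical, noise-free data generated from y = x². Fitting a line by least squares leaves residuals that are positive at both ends and negative in the middle, a U shape that a residual-vs.-predicted plot would show clearly. The data and the helper name `fit_line` are made up for the illustration.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1*x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

x = [float(i) for i in range(0, 11)]
y = [xi ** 2 for xi in x]  # hypothetical curved response, no noise
b0, b1 = fit_line(x, y)
pred = [b0 + b1 * xi for xi in x]
resid = [yi - pi for yi, pi in zip(y, pred)]

# Signs of the residuals, in order of increasing predicted value
signs = "".join("+" if e > 0 else "-" for e in resid)
print(signs)  # runs of + and - signal a systematic (nonlinear) pattern
```

With these numbers the fitted line is y = -15 + 10x and the sign pattern is `++-------++`; random scatter would not produce such long runs.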

 

I also question your subject line ... there is no such thing as "Model Normalcy" as far as I know. (The model errors should be iid normally distributed, as I said above.)

--
Paige Miller
OneEyedKing
Fluorite | Level 6

PaigeMiller--Thank you for your response. I apologize for the imprecision in my posting.

1) My comment that "most datasets are fairly small" goes back to graduate school, when most of the datasets we worked with were small. My most recent OLS estimates are based on a dataset of fewer than 550 observations, which I would still consider modest; hence my interest in using the most appropriate measure for the diagnostic plots.

2) By tweaking, I mean what the Cambridge Dictionary says: "to change something slightly, especially in order to make it more correct, effective, or suitable." Specifically, I use diagnostic plots to evaluate how well my model specification satisfies the assumption that the residuals are normally distributed, and then I change the specification to bring the model closer to that ideal. This is what I meant by "model normalcy," and it is consistent with our shared expectation that OLS basically requires residuals to “be iid normally distributed.”

Rick_SAS
SAS Super FREQ

>  it conforms with our shared expectation that OLS basically requires residuals to “be iid normally distributed.”

Just to clarify, OLS does not depend on iid normal residuals for prediction. That assumption is used for inferential statistics such as confidence limits for the parameters. So "iid normal" is sometimes considered an optional assumption that is useful for some analyses.
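One way to see this point is a small simulation sketch in Python (an illustration under arbitrary assumptions, not taken from the linked article). The closed-form least-squares formulas never reference a distribution, and when the errors are strongly skewed but have mean zero, the average slope estimate over many replications still lands on the true slope. The true coefficients (1 and 2), the sample size, and the shifted exponential error distribution are all hypothetical choices; what would change with non-normal errors is the validity of small-sample t-based confidence limits, which this sketch does not examine.

```python
import random

def ols_slope_intercept(x, y):
    """Closed-form least-squares fit for y = b0 + b1*x (no distributional assumption)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

def mean_slope_with_skewed_errors(reps=2000, n=30, b0=1.0, b1=2.0, seed=1):
    """Average OLS slope over many data sets whose errors are NOT normal."""
    rng = random.Random(seed)
    x = [float(i) for i in range(1, n + 1)]
    total = 0.0
    for _ in range(reps):
        # Exponential(1) minus 1: mean zero, but strongly right-skewed
        y = [b0 + b1 * xi + (rng.expovariate(1.0) - 1.0) for xi in x]
        total += ols_slope_intercept(x, y)[1]
    return total / reps

print(mean_slope_with_skewed_errors())  # close to the true slope 2.0
```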

For more on this topic, see "On the assumptions (and misconceptions) of linear regression" on The DO Loop.

Rick_SAS
SAS Super FREQ

SAS regression procedures provide many diagnostic plots that you can use to assess the fit of the model and to identify potential outliers and influential observations. See "An overview of regression diagnostic plots in SAS" on The DO Loop.

In the article, I discuss two diagnostic plots that use studentized residuals. The most important feature of a studentized residual is that it is standardized. Therefore, you can use rules such as "If the magnitude of the studentized residual exceeds 2, you should examine the observation as a possible outlier."
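The "magnitude exceeds 2" rule can be sketched in Python for a one-predictor model. This computes internally studentized residuals, r_i = e_i / (s·sqrt(1 - h_i)), where h_i is the leverage and s is the root mean squared error with n - 2 degrees of freedom, and flags observations with |r_i| > 2. As I understand it, this matches what PROC REG calls STUDENT (the deleted version is RSTUDENT), but treat that mapping as an assumption and check the SAS documentation. The data are hypothetical, with the point at x = 7 deliberately pushed off the line.

```python
import math

def studentized_residuals(x, y):
    """Internally studentized residuals for simple linear regression y = b0 + b1*x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(e * e for e in resid) / (n - 2))  # RMSE, n - 2 df
    leverage = [1.0 / n + (xi - xbar) ** 2 / sxx for xi in x]  # hat values h_i
    return [e / (s * math.sqrt(1.0 - h)) for e, h in zip(resid, leverage)]

x = [float(i) for i in range(1, 11)]
y = [3.1, 4.9, 7.2, 8.8, 11.1, 13.0, 20.0, 17.1, 18.9, 21.0]  # hypothetical data
r = studentized_residuals(x, y)
flagged = [i for i, ri in enumerate(r) if abs(ri) > 2]
print(flagged)  # zero-based indices worth examining as possible outliers
```

With these made-up numbers only index 6 (the observation at x = 7) is flagged; the other studentized residuals are all smaller than 1 in magnitude.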

 

> are Studentized residual plots best used to find outliers, but should not be relied on to determine whether the specified model has a better "fit" to normal expectations?

My opinion is that studentized residual plots are best used to identify outliers. I usually use the raw residuals for Q-Q plots and histograms of residuals, but that is primarily because I like to see the residuals on the scale of the response variable, not on some standardized scale. Because the studentized residual for x[i] excludes x[i] from the computation, I have a hard time visualizing studentized residuals, whereas the raw residuals (y[i] - predicted_y[i]) are very easy to visualize and understand.
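To make the contrast concrete, here is a Python sketch for a one-predictor model that computes both kinds of residual. The raw residual is y[i] - predicted_y[i] on the response scale. The studentized deleted residual refits the line without observation i, uses that fit's error scale s_(i), and divides: t_i = e_i / (s_(i)·sqrt(1 - h_i)). The helper names and data are hypothetical; I believe this corresponds to PROC REG's RSTUDENT, but that mapping is an assumption.

```python
import math

def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1*x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - b1 * xbar, b1

def raw_and_deleted_studentized(x, y):
    """Return (raw residuals, studentized deleted residuals) for y = b0 + b1*x."""
    n = len(x)
    b0, b1 = fit_line(x, y)
    raw = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]  # y[i] - predicted_y[i]
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    deleted = []
    for i in range(n):
        # Refit WITHOUT observation i so that it cannot inflate its own error scale
        xs, ys = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
        c0, c1 = fit_line(xs, ys)
        sse_i = sum((yj - (c0 + c1 * xj)) ** 2 for xj, yj in zip(xs, ys))
        s_i = math.sqrt(sse_i / (n - 1 - 2))  # n - 1 points, 2 parameters
        h_i = 1.0 / n + (x[i] - xbar) ** 2 / sxx  # leverage from the full fit
        deleted.append(raw[i] / (s_i * math.sqrt(1.0 - h_i)))
    return raw, deleted

x = [float(i) for i in range(1, 9)]
y = [2.3, 3.9, 6.4, 7.6, 10.5, 11.8, 14.9, 15.6]  # hypothetical data
raw, deleted = raw_and_deleted_studentized(x, y)
for xi, e, t in zip(x, raw, deleted):
    print(f"x={xi:4.1f}  raw={e:+.3f}  deleted studentized={t:+.3f}")
```

Refitting n times is only for clarity; the same values follow from the full fit via t_i = r_i·sqrt((n - p - 1)/(n - p - r_i²)), where r_i is the internally studentized residual and p = 2 here.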

 
