
The Why, How, and Cautions of Regression Without an Intercept

Started ‎09-16-2023 by
Modified ‎09-16-2023 by

 

Sometimes when using a linear regression model, you might wish to remove the intercept from the model. This situation commonly occurs when you theoretically know that the model should go through the origin. Another way of stating that the model goes through the origin is that the intercept of the model should be zero. Forcing the intercept to be zero has some consequences, which is what this blog is about.

 

Removing the intercept from a model in SAS is straightforward and can be done in two common ways with PROC REG: the NOINT option on the MODEL statement or the RESTRICT statement.

 

To see this in action, here is some sample code that creates a small dataset. We are measuring the height and weight of items of various sizes. We will start by fitting the typical regression model with an intercept and examining those results.

 

data demo;
    input height weight;
    datalines;
24 170
31 197
29 194
42 242
58 305
53 289
;
title1 'Regression WITH the Intercept';
proc reg data=demo plots=(fit(nolimits));
    model weight=height;
run;

 

Looking at the results of the model with an intercept, notice that the Analysis of Variance table has 1 degree of freedom for the model, 4 degrees of freedom for the error, and 5 degrees of freedom for the Corrected Total. This makes sense because we have six observations and two estimated parameters (an intercept and a slope). These parameters can be seen in the Parameter Estimates table.
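The degrees of freedom in that table follow from simple counting. As a quick check, write n for the number of observations and p for the number of estimated parameters (both symbols are notation assumed here, not SAS output labels); the demo data set has n = 6 and the model has p = 2:

```latex
\[
\begin{aligned}
\mathrm{df}_{\text{Corrected Total}} &= n - 1 = 6 - 1 = 5,\\
\mathrm{df}_{\text{Model}} &= p - 1 = 2 - 1 = 1,\\
\mathrm{df}_{\text{Error}} &= n - p = 6 - 2 = 4.
\end{aligned}
\]
```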

 

do_1_daober-Reg-Statment-Report.png


 

The F test in the Analysis of Variance table has the null hypothesis that the model does no better than predicting the mean of the response variable for every observation. Because Pr > F is small, we conclude that the model is significant: it differs from simply using the overall mean of the response.

 

do_2_daober-Obs-vs-Pred-Plot.png

 

The plot of the model shows a pretty good fit to the data.

 

Now let’s fit a no-intercept model. We will fit this model in two ways. The first uses the NOINT option of PROC REG. The second, which uses the RESTRICT statement in PROC REG, is shown later. The code using NOINT looks like this.

 

title1 'Regression Using the NOINT Option';
proc reg data=demo plots=(fit(nolimits));
model weight=height / noint;
run;

 

Let’s explore the no intercept model created with the NOINT option first.

 

do_3_daober-No-Int-Option-Report.png

 

There is a note in the output that the model R-Square is redefined because this is a no-intercept model. R-Square is usually calculated as SS(Model) divided by SS(Total), where SS(Total) is the corrected total sum of squares. But the ANOVA table shows that an uncorrected total sum of squares is being used, so R-Square is redefined as SS(Model) divided by SS(Uncorrected Total). As a result, R-Square no longer represents the proportion of variance explained and does not have a clear interpretation.
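The two definitions can be written side by side. In this sketch, SSE denotes the error sum of squares of whichever model is being fit, and the subscript "noint" is a label assumed here for illustration rather than SAS notation:

```latex
\[
R^2 \;=\; \frac{\mathrm{SS(Model)}}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
     \;=\; 1 - \frac{\mathrm{SSE}}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
\qquad \text{(model with an intercept)}
\]
\[
R^2_{\text{noint}} \;=\; \frac{\mathrm{SS(Model)}}{\sum_{i=1}^{n} y_i^2}
     \;=\; 1 - \frac{\mathrm{SSE}}{\sum_{i=1}^{n} y_i^2}
\qquad \text{(no-intercept model)}
\]
```

The first form measures improvement over predicting the mean of the response, while the second measures improvement over predicting zero for every observation. Because the baselines differ, the two values should not be compared directly.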

 

One must also be careful in interpreting the residuals from a no-intercept model. Because the residuals may not sum to zero, the residuals may not be evenly distributed around the zero line. Residual plots may look different than expected.
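One way to check this for the demo data is to save the residuals and summarize them. The following is a minimal sketch, assuming the demo data set created earlier; the output data set name resid_noint and the variable names resid and pred are arbitrary names chosen for illustration.

```sas
/* Save residuals and predicted values from the no-intercept fit */
proc reg data=demo noprint;
    model weight=height / noint;
    output out=resid_noint r=resid p=pred;
run;
quit;

/* Sum and mean of the residuals. For a model with an intercept,   */
/* both would be zero up to rounding; without one, they need not be */
proc means data=resid_noint n sum mean;
    var resid;
run;
```

Removing the NOINT option and rerunning both steps gives the comparison for the model with an intercept.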

 

Finally, the plot of the model shows that this model does not fit the data very well. Why not? Clearly, the model should go through the origin as you cannot have weight without a height.

 

do_4_daober-No-Int-Obs-vs-Pred-Plot.png

 

There may be a few good reasons that this model does not fit the data very well. The most likely cause is that the model may not actually be linear. The lowest height value that we have is 24. We have no data to tell us what the relationship between height and weight is from 0 to 24. We probably should not assume the linear relationship seen with this data holds all the way to the origin. That is an extrapolation. Another possible cause is that the variance of the error terms may not be constant. Perhaps the error term grows as you get closer to the origin. That could lead to a large intercept that would still be within the bounds of the error term.

 

Now that we have fully explored the no-intercept regression model using the NOINT option, what do we see if we use the RESTRICT statement? Let’s submit this code.

 

title1 "Regression Using the RESTRICT Statement";
proc reg data=demo plots=(fit(nolimits));
model weight=height;
restrict intercept=0;
run;

 

The results are almost the same as with the NOINT option. However, the RESTRICT statement adds one additional line, the RESTRICT line, to the Parameter Estimates table. The estimate on this line is the Lagrange multiplier for the restriction, and it measures the sensitivity of the sum of squared errors to the restriction. If this estimate is zero, the restricted estimates are the same as the unrestricted estimates. The small p-value reported here indicates that imposing the restriction changes the fit significantly, so the restricted model differs from the unrestricted one. This might lead one to question whether the no-intercept model is really appropriate in this situation.
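The same question can also be approached from the unrestricted side with the TEST statement in PROC REG, which tests a linear hypothesis about the parameters of the fitted model. This is a minimal sketch, assuming the demo data set created earlier; the label InterceptZero is an arbitrary name chosen for illustration, and this is a complementary check rather than a replacement for the RESTRICT output.

```sas
title1 'Testing Whether the Intercept Is Zero';
proc reg data=demo plots=none;
    /* Fit the usual model with an intercept */
    model weight=height;
    /* F test of the null hypothesis that the intercept equals zero */
    InterceptZero: test intercept=0;
run;
quit;
```

For a single hypothesis like this one, the F statistic is the square of the t value reported on the Intercept line of the Parameter Estimates table, so the two p-values agree.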

 

With all these cautions, why would you even fit a no-intercept model? There are still situations where you may need this capability. Perhaps you want to build your own custom design matrix that does not include the intercept column. Or maybe you are trying to compare two different measurement systems. You treat measurement system one as the independent variable and measurement system two as the response. If the two systems agree, the fitted regression line should have a slope of 1 and an intercept of 0, so comparing the fit against that line provides an appropriate comparison of the two systems. One last possibility is fitting a Scheffé mixture model, which is not exactly a no-intercept model, but we will see more about this situation in my next blog post. These are just a few situations where you might want to fit a no-intercept regression model.

 

Ultimately, one should be very careful when using a no-intercept model as the typical regression interpretations no longer apply. In general, the safest course of action is to not use the no-intercept model. Interpretations with an intercept model will be more understandable. Even if the model does go through the origin, an intercept model should only have a small intercept due to sampling error and should be of no consequence. There is no drawback to using the intercept model.
