Statistical Procedures

Programming the statistical procedures from SAS
🔒 This topic is solved and locked.
NKormanik
Barite | Level 11

I see that interpretation of variable coefficients is problematic with Logistic Regression.  No simple matter, by far.

 

Wondering if it's advisable to hold off on including the intercept, whose interpretation is also an issue.

 

Any thoughts greatly appreciated.

 

Nicholas Kormanik

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26

There are only rare cases in modeling where leaving the intercept out is a good idea. The general advice is to include the intercept (the model will fit better) and to leave it out only with rock-solid justification.

 

An example:

 

You put a certain amount of liquid soap in a dish of water, agitate the water, and measure the suds created. At first it sounds like, if you ran a regression of amount of suds against amount of liquid soap (keeping the agitation constant), no intercept would be needed, since zero soap produces zero suds. WRONG! In the region of the data, the fitted line probably has a different slope and does not pass through the origin if you project it backwards. That fit, in the region of the data, is better than a fit with no intercept, and it really shouldn't matter that extrapolating back to zero soap doesn't give the expected zero suds; extrapolation shouldn't force the fit. Furthermore, if you really want a good-fitting line through the origin, it probably shouldn't be linear, and it may not fit well elsewhere.
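The soap-suds point is easy to check numerically. Here is a hypothetical Python/numpy sketch (the numbers are made up for illustration; this is not SAS code): data observed well away from the origin, fit once with an intercept and once forced through the origin.

```python
import numpy as np

# Hypothetical suds data: observations far from the origin
soap = np.array([5.0, 6.0, 7.0, 8.0, 9.0, 10.0])       # ml of soap
suds = np.array([12.1, 13.0, 13.8, 14.9, 15.7, 16.8])  # suds height, cm

# Fit with an intercept (ordinary least squares)
A = np.column_stack([np.ones_like(soap), soap])
(b0, b1), *_ = np.linalg.lstsq(A, suds, rcond=None)

# Fit forced through the origin (no intercept column)
(b_origin,), *_ = np.linalg.lstsq(soap[:, None], suds, rcond=None)

# Residual sum of squares for each fit, in the region of the data
rss_int = np.sum((suds - (b0 + b1 * soap)) ** 2)
rss_noint = np.sum((suds - b_origin * soap) ** 2)

print(b0)                   # nonzero: the projected line misses the origin
print(rss_int < rss_noint)  # with-intercept fit is better where the data are
```

The no-intercept line pivots toward the origin and fits the observed region worse, which is exactly the point above: a nonzero fitted intercept doesn't mean the model claims suds appear with zero soap, only that the line is not being bent by an extrapolation no one observed.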

 

Also, please note: there is a difference between EMPIRICAL modeling, which all regressions are and which strives to fit the data well, and first-principles modeling, which is based upon scientific or other prior knowledge. In my opinion, it is very difficult to combine empirical and first-principles modeling and achieve a model that is good by both standards; even in the soap-suds example, it's hard to achieve both goals. Logistic regression, like all regressions, only tries to fit the existing data well; it has no other goal.
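Since the original question was about logistic regression specifically, the same include-the-intercept point can be sketched there too. This is a hypothetical Python/numpy illustration, not SAS (in SAS you would compare PROC LOGISTIC fits with and without the NOINT option on the MODEL statement); the simulated data and the small Newton-Raphson fitter are made up for the example:

```python
import numpy as np

# Hypothetical data: one continuous predictor, binary outcome,
# generated with a genuinely nonzero intercept: logit(p) = -1 + 0.8*x
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
p_true = 1 / (1 + np.exp(-(-1 + 0.8 * x)))
y = rng.binomial(1, p_true)

def fit_logistic(X, y, iters=50):
    """Maximum-likelihood logistic fit via Newton-Raphson (no penalty)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-X @ beta))
        w = p * (1 - p)
        grad = X.T @ (y - p)
        hess = (X * w[:, None]).T @ X
        beta += np.linalg.solve(hess, grad)
    return beta

def loglik(X, y, beta):
    eta = X @ beta
    return np.sum(y * eta - np.log1p(np.exp(eta)))

X_int = np.column_stack([np.ones(n), x])   # with intercept
X_noint = x[:, None]                       # intercept forced to zero

b_int = fit_logistic(X_int, y)
b_noint = fit_logistic(X_noint, y)

# Dropping the intercept can only lower the maximized log-likelihood
print(loglik(X_int, y, b_int) > loglik(X_noint, y, b_noint))
```

Because the no-intercept model is the full model with one coefficient pinned to zero, its maximized log-likelihood can never exceed the full model's; when the data really have a nonzero baseline log-odds, the gap is substantial.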

--
Paige Miller


4 REPLIES
NKormanik
Barite | Level 11

So nicely explained, @PaigeMiller.   Really appreciate it. 

 

SteveDenham
Jade | Level 19

If you are fitting only a single factor and using GLM parameterization, removing the intercept will give values for each level of the factor in the solution vector. If there are two or more factors, this 'trick' doesn't help a bit. As @PaigeMiller says, you are far better off including the intercept. You can use LSMEANS to get response-level values; SAS does all of the necessary combining of parameters to get the LSMEANS.
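The single-factor observation has a simple linear-model analogue that can be verified outside SAS: with one indicator column per factor level and no intercept column, the least-squares coefficients are exactly the per-level means. A hypothetical Python/numpy sketch (group labels and means are made up):

```python
import numpy as np

# Hypothetical single-factor data: three treatment groups, 20 obs each
rng = np.random.default_rng(1)
groups = np.repeat([0, 1, 2], 20)
true_means = np.array([10.0, 12.5, 9.0])
y = true_means[groups] + rng.normal(scale=0.5, size=groups.size)

# One indicator column per level, NO intercept column
X = np.eye(3)[groups]

# OLS without an intercept: each coefficient is that level's sample mean
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

for level in range(3):
    print(np.isclose(beta[level], y[groups == level].mean()))
```

With a second factor in the model the columns are no longer a clean partition of the observations, so the no-intercept coefficients stop being level means, which is why the trick breaks down and LSMEANS-style combinations of parameters are the better route.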

 

SteveDenham

PaigeMiller
Diamond | Level 26

@SteveDenham wrote:

If you are fitting only a single factor and using GLM parameterization, removing the intercept will give values for each level of the factor in the solution vector. 


I assume this refers to class variables, in which case I agree. The original post did not indicate if the x-variables are class or continuous.

--
Paige Miller
