I see that interpreting variable coefficients in logistic regression is problematic. It is no simple matter.
I'm wondering whether it's advisable to hold off on including the intercept, whose interpretation is also an issue.
Any thoughts greatly appreciated.
Nicholas Kormanik
Accepted Solutions
There are only rare cases in modeling where leaving the intercept out is a good idea. Generally, the advice is to include the intercept (the model will fit better) and to leave it out only with rock-solid justification.
An example:
You put a certain amount of liquid soap in a dish of water, agitate the water, and measure the suds created. At first it sounds like, if you were to run a regression of amount of suds against amount of liquid soap (keeping the agitation constant), no intercept would be needed, since zero soap produces zero suds. WRONG! In the region of the data, the fitted line is not sloping toward the origin; it probably has a different slope and, projected backwards, does not pass through the origin. That fit in the region of the data is a better fit than a fit with no intercept, and it shouldn't matter that extrapolating to zero soap doesn't give the expected value; extrapolation shouldn't force a fit. Furthermore, if you really want a good-fitting curve through the origin, it probably shouldn't be linear and may not fit well elsewhere.
Also, please note: there is a difference between EMPIRICAL modeling, which all regression is, and which strives to fit the data well; and first-principles modeling, based upon scientific or other knowledge. In my opinion, it is very difficult to combine the two and achieve a good-fitting model that satisfies both. Even in the soap-suds example, it's hard to achieve both goals. Logistic regression, like all regressions, only tries to fit the existing data well; it has no other goal.
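To make the point concrete, here is a minimal SAS sketch; the suds data below are invented purely for illustration, and NOINT is the option that forces the fit through the origin:

```sas
/* Hypothetical suds data, made up only to illustrate the point */
data suds;
   input soap suds;
   datalines;
1 3.1
2 4.8
3 6.2
4 7.9
5 9.5
;
run;

/* Compare the default fit (intercept estimated) with a fit
   forced through the origin via the NOINT option */
proc reg data=suds;
   with_int: model suds = soap;          /* intercept included */
   no_int:   model suds = soap / noint;  /* line forced through (0,0) */
run;
quit;
```

In the region of the data, the with-intercept fit will typically show the smaller error sum of squares, even if its projection back to soap = 0 misses the origin.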
Paige Miller
So nicely explained, @PaigeMiller. Really appreciate it.
If you are fitting only a single factor and using GLM parameterization, removing the intercept will give values for each level of the factor in the solution vector. If there are two or more factors, this 'trick' doesn't help a bit. As @PaigeMiller says, you are far better off including the intercept. You can use LSMEANS to get response-level values; SAS does all of the necessary combining of parameters to get the LSMEANS.
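A sketch of both situations in PROC LOGISTIC (the data set mydata and the variables y, trt, and sex are hypothetical):

```sas
/* One CLASS factor with GLM parameterization: dropping the intercept
   makes each level's parameter that level's own log-odds */
proc logistic data=mydata;
   class trt / param=glm;
   model y(event='1') = trt / noint;
run;

/* Two or more factors: keep the intercept and use LSMEANS;
   SAS combines the parameters to give per-level estimates */
proc logistic data=mydata;
   class trt sex / param=glm;
   model y(event='1') = trt sex;
   lsmeans trt / ilink;   /* ILINK adds estimates on the probability scale */
run;
```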
SteveDenham
@SteveDenham wrote:
If you are fitting only a single factor and using GLM parameterization, removing the intercept will give values for each level of the factor in the solution vector.
I assume this refers to class variables, in which case I agree. The original post did not indicate whether the x-variables are class or continuous.
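For a continuous x the 'trick' has no meaning: removing the intercept just pins the log-odds to zero at x = 0, i.e. a predicted probability of 0.5 there. A hypothetical sketch:

```sas
/* Hypothetical: with a continuous predictor there are no levels,
   so NOINT forces logit(p) = 0 (p = 0.5) at x = 0, which is
   rarely justified */
proc logistic data=mydata;
   model y(event='1') = x / noint;
run;
```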
Paige Miller