Re: Predictive Modeling Using Logistic Regression
Would it be possible to clarify why the presence of redundant inputs may increase the risk of overfitting (see page 3-34 of the course text)?
Hi @pvareschi
The presence of redundant variables makes the model more complex than it needs to be, because it increases the number of predictors. Complex models typically suffer from overfitting: the more inputs there are, the greater the risk that the model "learns" errors (redundant information can be noise that does not reflect a real-world setting).
Best,
👍 Thank you!
Redundant variables can also cause the regression coefficients to swing wildly, in some cases to the extent that they end up with the wrong sign. This leads to unstable models and to coefficients that are not interpretable.
Or, in somewhat more statistical terms, high correlation between the predictor variables inflates the variance of the coefficient estimates, meaning the estimates can fall far from the true values.
The above holds true for most modeling techniques. It does not hold true for Partial Least Squares, which can be used in the presence of redundant variables and is much less susceptible to the above issues.
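To make the variance inflation concrete, here is a small simulation sketch. It is not from the course text: it uses ordinary least squares on synthetic data as a stand-in for logistic regression, and the function name `slope_spread` and all parameter values are made up for illustration. It refits the same model on many fresh samples and measures how much the fitted coefficient of x1 moves around when x2 is nearly a copy of x1, compared with when x2 carries its own information.

```python
import numpy as np

rng = np.random.default_rng(0)

def slope_spread(noise_sd, n=200, trials=300):
    """Std. dev. of the fitted x1 coefficient across repeated samples.

    x2 = x1 + noise_sd * noise, so a small noise_sd makes x2 redundant.
    (Hypothetical helper; OLS is used here only as a simple stand-in.)
    """
    slopes = []
    for _ in range(trials):
        x1 = rng.normal(size=n)
        x2 = x1 + noise_sd * rng.normal(size=n)
        # True coefficients are 1 for x1 and 1 for x2.
        y = 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)
        design = np.column_stack([np.ones(n), x1, x2])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        slopes.append(coef[1])  # fitted coefficient of x1
    return float(np.std(slopes))

spread_redundant = slope_spread(noise_sd=0.05)  # x2 is nearly a copy of x1
spread_distinct = slope_spread(noise_sd=1.0)    # x2 carries its own information

print(f"spread with redundant x2: {spread_redundant:.2f}")
print(f"spread with distinct x2:  {spread_distinct:.2f}")
```

Under these assumptions the spread in the redundant case should come out many times larger, and large enough relative to the true value of 1 that some fitted coefficients turn negative, which is the wrong-sign behaviour described above.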
I highly recommend that you reduce redundancy among your predictor variables first, before you deal with predictors that are irrelevant to the target variable. Including redundant variables increases the risk of overfitting because the model becomes overly complex and may be too sensitive to the peculiarities of the sample, so it will not generalize well to new data. The performance of variable selection methods such as stepwise and backward selection will also be compromised if there is a high degree of multicollinearity among your predictor variables.
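As a rough illustration of reducing redundancy before selection, the sketch below applies a simple greedy correlation filter: it keeps a predictor only if its absolute correlation with every predictor already kept is below a threshold. This is just one simple approach and not necessarily the method the course text uses; the function `drop_redundant`, the 0.9 threshold, and the variable names are all hypothetical, and the data is synthetic.

```python
import numpy as np

def drop_redundant(X, names, threshold=0.9):
    """Greedily keep columns that are not highly correlated with any kept column.

    X: 2-D array, rows are observations and columns are predictors.
    names: one label per column. threshold: illustrative cut-off on |correlation|.
    """
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < threshold for k in kept):
            kept.append(j)
    return [names[j] for j in kept]

# Synthetic example: income_rescaled is almost a linear copy of income.
rng = np.random.default_rng(1)
n = 300
income = rng.normal(size=n)
income_rescaled = 2.0 * income + 0.1 * rng.normal(size=n)
age = rng.normal(size=n)

X = np.column_stack([income, income_rescaled, age])
selected = drop_redundant(X, ["income", "income_rescaled", "age"])
print(selected)
```

Note that a filter like this only looks at pairs of variables and keeps whichever column comes first, so the column order matters. It will also miss redundancy that involves three or more variables jointly.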