Hi. I am using logistic regression to predict the probability of being a bad customer. I need to transform each independent variable so that it has a strong linear relationship with the log-odds of my target. One variable shows an upside-down U-shape when I plot its value against the log-odds. How should I transform it so that I don't over-predict the lower group and under-predict the upper group?
I tried squaring it, but that did not give me a more linear relationship.
Accepted Solutions
Applying a monotonic transformation (such as the square or square root) to this variable will not eliminate the hump.
Applying a non-monotonic transformation (for example, a third-order polynomial) can eliminate the hump, but may be questionable on subject-matter grounds.
Nevertheless, you might try fitting a cubic equation (a third-order polynomial), since it seems that whatever curve fits well will do a good job of predicting log(p/(1-p)).
You might also consider PROC TRANSREG, where the MODEL statement would be LOGIT(p) = SPLINE(x). However, I admit I have never done this, so I have no experience with using PROC TRANSREG in this manner.
Paige Miller
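To illustrate the point about monotonic transformations, here is a minimal Python sketch (not SAS, and not from the original posts). The hump-shaped log-odds curve, its coefficients, and the grid of x values are all made up for illustration. A monotonic transformation only stretches or squeezes the x-axis without reordering the points, so the log-odds still rise and then fall.

```python
import math

# Hypothetical hump-shaped log-odds curve; the coefficients are invented
# purely for illustration. It peaks at x = 4.
def true_log_odds(x):
    return -2.0 + 0.8 * x - 0.1 * x * x

xs = [0.5 * i for i in range(1, 17)]          # 0.5, 1.0, ..., 8.0
log_odds = [true_log_odds(x) for x in xs]

def is_monotone(values):
    """Return True if the sequence never changes direction."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    return all(d >= 0 for d in diffs) or all(d <= 0 for d in diffs)

# Candidate transformations, all monotonic on positive x.
transforms = {
    "identity": lambda x: x,
    "sqrt": math.sqrt,
    "square": lambda x: x * x,
    "log": math.log,
}

def hump_survives(transform):
    # Order the points by transformed x, then check whether the log-odds
    # still change direction (that is, whether the hump is still there).
    pairs = sorted((transform(x), lo) for x, lo in zip(xs, log_odds))
    return not is_monotone([lo for _, lo in pairs])

results = {name: hump_survives(f) for name, f in transforms.items()}
for name, survives in results.items():
    print(name, "-> hump still present:", survives)
```

A straight line in the transformed variable would require the log-odds to be monotone in it, which none of these transformations achieves for a curve of this shape.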
Out-of-the-box logistic regression assumes a linear relationship between X and logit(Y). I like @PaigeMiller's suggestion of a polynomial; like him, I think the cubic might be a bit excessive, but a quadratic might work if you transformed X using sqrt or log. EDIT: by which I mean
model y = x x2;
where x2 is x*x, and x is centered and possibly transformed.
Logistic regression makes no assumptions about the distribution of X, so you are free to apply whatever transformation brings the relationship closer to linear (remember that a polynomial model is a curvilinear model). A sqrt or log transformation also reduces the leverage of the larger X values, which is convenient.
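As a rough sketch of how the two model terms could be prepared, the Python snippet below (not SAS) applies a square-root transformation, centers the result, and squares it. The raw values are invented, and the names `x_raw`, `x_c`, and `x2` are only illustrative. It also shows one common reason for centering: the linear and squared terms end up far less correlated with each other.

```python
import math
from statistics import mean

def pearson(a, b):
    """Plain Pearson correlation between two equal-length lists."""
    ma, mb = mean(a), mean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical right-skewed predictor values.
x_raw = [1, 2, 4, 7, 11, 16, 22, 29, 37, 46]

# Step 1: transform (sqrt here; log would be handled the same way).
x_t = [math.sqrt(v) for v in x_raw]

# Step 2: center the transformed variable.
center = mean(x_t)
x_c = [v - center for v in x_t]

# Step 3: build the squared term from the centered variable.
x2 = [v * v for v in x_c]

# For comparison: the squared term without centering.
x2_uncentered = [v * v for v in x_t]

corr_uncentered = pearson(x_t, x2_uncentered)
corr_centered = pearson(x_c, x2)
print("corr(x, x^2) without centering:", round(corr_uncentered, 3))
print("corr(x, x^2) with centering:   ", round(corr_centered, 3))
```

The columns `x_c` and `x2` play the roles of `x` and `x2` in the MODEL statement above.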
Nice illustrative graph! Very useful. But I definitely would not turn X into two categorical groups; you lose too much information. When you think about reality, not everything increases or decreases monotonically. Remember the three bears: some porridge is too cold, some is too hot, and some is just right. Some values are "optimal".
A quadratic doesn't really capture the relatively flat part of the curve at the right of the diagram. A cubic might, but I am squeamish about recommending cubics in general. That is why I feel a spline may be the best method here (or maybe it isn't; you can't really know until you try).
Paige Miller
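To make the spline idea concrete, here is a small Python sketch (not SAS, and not how PROC TRANSREG builds its basis). It uses the simplest kind of spline, a piecewise-linear one built from "hinge" terms. The knots at 3 and 6 and the coefficients are hypothetical, chosen only to show that such a basis can rise, fall, and then stay flat, which a quadratic cannot do.

```python
def hinge(x, knot):
    """Linear spline basis term: zero up to the knot, then rises with slope 1."""
    return max(0.0, x - knot)

# Hypothetical knot locations and coefficients, for illustration only.
KNOTS = (3.0, 6.0)
COEFS = (1.0, -2.0, 1.0)   # slope of x, then the slope change at each knot

def spline_basis(x):
    # Columns that would enter the model in place of a single x term.
    return [x] + [hinge(x, k) for k in KNOTS]

def log_odds_shape(x):
    # Linear combination of the basis columns (intercept omitted).
    return sum(c * b for c, b in zip(COEFS, spline_basis(x)))

shape = {x: log_odds_shape(float(x)) for x in range(0, 10)}
for x, value in shape.items():
    print(x, value)
```

In a real fit the coefficients would be estimated from the data, and spline routines typically use smoother (for example cubic) pieces, but the principle is the same: the model stays linear in the basis columns even though the fitted curve in x is not.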
Thank you! Yes, maybe I will explore PROC TRANSREG with the MODEL statement LOGIT(p) = SPLINE(x).
Are you trying to build a credit scorecard?
If so, you could bin the variable so that it has a linear relationship with Y.
https://www.sas.com/content/dam/SAS/support/en/sas-global-forum-proceedings/2019/3099-2019.pdf
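One common way to do this in scorecard work is weight-of-evidence (WoE) coding. That is an assumption about what is meant here, not something stated in this thread. In the Python sketch below (not SAS), the bin boundaries and the good/bad counts are invented. It shows that once each bin is replaced by its WoE, the bin-level log-odds are exactly linear in the new variable, even though the bad rate is hump-shaped across the original bins.

```python
import math

# Hypothetical bins with (good, bad) counts; the bad rate rises and then falls.
bins = {
    "x < 10":       (400, 20),
    "10 <= x < 20": (300, 45),
    "20 <= x < 30": (200, 40),
    "x >= 30":      (300, 15),
}

total_good = sum(good for good, _ in bins.values())
total_bad = sum(bad for _, bad in bins.values())

def woe(good, bad):
    """Weight of evidence, using the ln(%good / %bad) convention."""
    return math.log((good / total_good) / (bad / total_bad))

# Overall log-odds of being bad; acts as the intercept.
intercept = math.log(total_bad / total_good)

rows = []
for label, (good, bad) in bins.items():
    w = woe(good, bad)
    bin_log_odds = math.log(bad / good)   # observed log-odds of bad in the bin
    rows.append((label, w, bin_log_odds))
    print(f"{label:14s} WoE={w:+.3f}  log-odds(bad)={bin_log_odds:+.3f}")

# Under this convention, log-odds(bad) = intercept - WoE in every bin.
max_gap = max(abs(lo - (intercept - w)) for _, w, lo in rows)
print("largest deviation from the straight line:", max_gap)
```

Some texts define WoE with the ratio inverted, which only flips the sign of the slope.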