Hi. I am using logistic regression to predict the probability of being a bad customer. I need to transform each independent variable to make sure it has a strong linear relationship with the log-odds of my target. One variable has an inverted U-shape (a hump) when I plot the variable value against the log-odds. How should I transform it so I won't over-predict the lower group and under-predict the upper group?
I tried squaring it, but that did not make the relationship any more linear.
Applying a monotonic transformation (such as the square, or square root) to this variable will not eliminate the hump.
Applying a non-monotonic transformation (for example, a third order polynomial) can eliminate the hump, but may be questionable on subject matter grounds.
Nevertheless, you might try fitting a cubic equation (a third-order polynomial), since it seems that whatever curve fits well will do a good job of predicting log(p/(1-p)).
You might also consider PROC TRANSREG, where you use LOGIT(p) = SPLINE(x) in the MODEL statement. However, I admit I have never done this, so I have no experience with using PROC TRANSREG in this manner.
Out-of-the-box logistic regression assumes a linear relationship between X and logit(Y). I like @PaigeMiller's suggestion of a polynomial; like him, I think the cubic might be a bit excessive, but a quadratic might work if you transformed X using sqrt or log. EDIT: by which I mean
model y = x x2;
where x2 is x*x, and x is centered and possibly transformed.
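To make the EDIT above concrete, here is a minimal sketch of the centered quadratic fit. The dataset WORK.CUSTOMERS, the 0/1 target BAD, and the predictor X are hypothetical names, not taken from the original post, and the optional sqrt/log step assumes X is strictly positive.

```sas
/* Hypothetical names: WORK.CUSTOMERS, BAD (1 = bad customer), X. */
data work.step1;
   set work.customers;
   xt = x;                /* optionally sqrt(x) or log(x), if x > 0 */
run;

/* Mean of the (possibly transformed) predictor, used for centering */
proc means data=work.step1 noprint;
   var xt;
   output out=work.xt_mean(keep=xt_mean) mean=xt_mean;
run;

data work.model_input;
   if _n_ = 1 then set work.xt_mean;   /* carry the mean onto every row */
   set work.step1;
   xc  = xt - xt_mean;    /* centered predictor */
   xc2 = xc * xc;         /* quadratic term     */
run;

proc logistic data=work.model_input;
   model bad(event='1') = xc xc2;
run;
```

Centering before squaring reduces the correlation between the linear and quadratic terms, which usually makes the two coefficients easier to estimate and interpret.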
There are no assumptions about the distribution of X, so you can apply transformations that better match a linear model (remember that a polynomial model is a curvilinear model). Plus, a sqrt or log transformation will reduce the leverage of the larger X values, which is convenient.
Nice illustrative graph! Very useful. But I definitely would not turn X into two categorical groups; you lose too much information. When you think about reality, not everything increases or decreases monotonically. Remember the three bears: some porridge is too cold, some is too hot, and some is just right. Some values are "optimal".
A quadratic doesn't really capture the relatively flat part of the curve at the right of the diagram. A cubic might, but I am squeamish about recommending cubics in general. Which is why I feel that maybe a spline is the best method here (or maybe it isn't; you can't really know until you try).
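One way to try a spline without leaving PROC LOGISTIC is the EFFECT statement. This is only a sketch, and it is a different route from the PROC TRANSREG idea mentioned earlier in the thread. The names WORK.CUSTOMERS, BAD, and X are hypothetical, and five knots at percentiles is an arbitrary starting choice rather than a recommendation.

```sas
/* Hypothetical names: WORK.CUSTOMERS, BAD (1 = bad customer), X. */
proc logistic data=work.customers;
   /* Natural (restricted) cubic spline basis for X, knots at percentiles */
   effect spl = spline(x / naturalcubic basis=tpf(noint)
                           knotmethod=percentiles(5));
   model bad(event='1') = spl;
   /* Save the fitted log-odds and probability for inspection */
   output out=work.scored xbeta=logit_hat p=p_hat;
run;

/* Plot the fitted log-odds against X to check the hump and the flat part */
proc sort data=work.scored;
   by x;
run;

proc sgplot data=work.scored;
   series x=x y=logit_hat;
run;
```

Comparing this curve with the empirical log-odds plot shows whether the spline follows both the hump and the flat region on the right; the number of knots can then be adjusted.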
Thank you! Yeah, maybe I will explore PROC TRANSREG, with the MODEL statement LOGIT(p) = SPLINE(x).
Are you trying to build a credit scorecard?
Bin the variable to make a linear relation with Y?
https://www.sas.com/content/dam/SAS/support/en/sas-global-forum-proceedings/2019/3099-2019.pdf
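As a rough illustration of the binning idea (not necessarily the method used in the linked paper), the sketch below splits X into ten equal-frequency bins and computes the empirical log-odds of BAD in each one. WORK.CUSTOMERS, BAD (numeric 0/1), and X are hypothetical names; the 0.5 added to each count is an assumption made here to avoid taking the log of zero in sparse bins.

```sas
/* Hypothetical names: WORK.CUSTOMERS, BAD (numeric 0/1), X. */
proc rank data=work.customers groups=10 out=work.ranked;
   var x;
   ranks x_bin;           /* bin index 0-9, roughly equal frequency */
run;

proc sql;
   create table work.bin_logit as
   select x_bin,
          min(x)   as x_min,
          max(x)   as x_max,
          count(*) as n,
          sum(bad) as n_bad,
          /* empirical log-odds per bin, with a 0.5 continuity adjustment */
          log((sum(bad) + 0.5) / (count(*) - sum(bad) + 0.5)) as emp_logit
   from work.ranked
   group by x_bin
   order by x_bin;
quit;
```

The resulting table shows where the log-odds rises, peaks, and flattens, which can guide either the choice of bin boundaries or the choice of a continuous transformation.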