newboy1218
Quartz | Level 8

Hi. I am using logistic regression to predict the probability of being a bad customer. I need to transform each independent variable so that it has a strong linear relationship with the log-odds of my target. One variable shows an inverted U-shape when I plot the variable's value against the log-odds. How should I transform it so that I won't over-predict the lower group and under-predict the upper group?

 

I tried squaring the variable, but that did not give me a more linear relationship.

 

 

[Attached image: Capture.PNG, the plot of the variable's value against the log-odds]

Accepted Solution
PaigeMiller
Diamond | Level 26

Applying a monotonic transformation (such as the square, or square root) to this variable will not eliminate the hump.
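One way to see why, sketched in notation that is not in the original post: write the log-odds as a function f of x, and let g be a strictly increasing, differentiable transformation (both assumed smooth for the sake of the argument).

```latex
\[
  u = g(x), \qquad
  \operatorname{logit}(p) = f(x) = f\bigl(g^{-1}(u)\bigr) = h(u)
\]
\[
  h'(u) = \frac{f'\bigl(g^{-1}(u)\bigr)}{g'\bigl(g^{-1}(u)\bigr)},
  \qquad g' > 0
  \;\Longrightarrow\;
  \operatorname{sign} h'(u) = \operatorname{sign} f'(x)
\]
```

So h rises and falls exactly where f does. The peak moves from x* to u* = g(x*), but it is still a peak. Note that the square is only monotonic when x takes a single sign, for example when x is non-negative.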

 

Applying a non-monotonic transformation (for example, a third order polynomial) can eliminate the hump, but may be questionable on subject matter grounds.

 

Nevertheless, you might try fitting a cubic equation (or third order polynomial), as it seems as if whatever curve fits well will do a good job of predicting log(p/(1-p)).

 

You might also consider PROC TRANSREG, where you use LOGIT(p) = SPLINE(x) in the MODEL statement. However, I will admit to never having done this, so I have no experience whatsoever with using PROC TRANSREG in this manner.
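Since the PROC TRANSREG route is described above as untried, here is a sketch of a different way to fit a spline on the log-odds scale: the EFFECT statement in PROC LOGISTIC. This is a swapped-in technique, not the TRANSREG syntax from the post. The dataset work.loans, the variables bad and x, and the simulated hump are all assumptions made up for illustration.

```sas
/* Simulated stand-in data (assumption, not the poster's data):
   bad = 0/1 target, x = predictor whose true log-odds rise, peak, then fall. */
data work.loans;
   call streaminit(1218);
   do i = 1 to 5000;
      x   = 100 * rand('uniform');
      eta = -2 + 0.08*x - 0.0007*x*x;
      bad = rand('bernoulli', 1 / (1 + exp(-eta)));
      output;
   end;
   drop i eta;
run;

/* Natural cubic spline in x, with 5 knots placed at percentiles of x. */
proc logistic data=work.loans;
   effect x_spl = spline(x / naturalcubic basis=tpf(noint)
                             knotmethod=percentiles(5));
   model bad(event='1') = x_spl;
   effectplot fit(x=x) / link;   /* fitted curve on the log-odds scale */
run;
```

The number and placement of knots is a judgment call; five percentile-based knots is only a starting point, and the EFFECTPLOT output needs ODS Graphics to be enabled.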

--
Paige Miller

sld
Rhodochrosite | Level 12

Out-of-the-box logistic regression assumes a linear relationship between X and logit(Y). I like @PaigeMiller's suggestion of a polynomial; like him, I think the cubic might be a bit excessive, but a quadratic might work if you transformed X using sqrt or log. EDIT: by which I mean

 

model y = x x2;

 

where x2 is x*x, and x is centered and possibly transformed.
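A minimal sketch of that setup, assuming a square-root transform and mean-centering. The dataset work.loans, the variables bad, x, xt, and xt2, and the simulated data are hypothetical names and values chosen for illustration.

```sas
/* Simulated stand-in data (assumption): bad = 0/1 target, x = predictor. */
data work.loans;
   call streaminit(1218);
   do i = 1 to 5000;
      x   = 100 * rand('uniform');
      eta = -2 + 0.08*x - 0.0007*x*x;
      bad = rand('bernoulli', 1 / (1 + exp(-eta)));
      output;
   end;
   drop i eta;
run;

/* Step 1: optional transform. log(x) would also work, but needs x > 0. */
data work.step1;
   set work.loans;
   xt = sqrt(x);
run;

/* Step 2: center. METHOD=MEAN subtracts the mean and leaves the scale alone. */
proc stdize data=work.step1 out=work.step2 method=mean;
   var xt;
run;

/* Step 3: squared term built from the centered, transformed variable. */
data work.step3;
   set work.step2;
   xt2 = xt * xt;
run;

proc logistic data=work.step3;
   model bad(event='1') = xt xt2;
run;
```

Centering before squaring reduces the correlation between the linear and squared terms, which tends to make the estimates more stable. A cubic term could be added the same way if the quadratic does not follow the curve well.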

 

There are no assumptions about the distribution of X, so you can apply transformations that better match a linear model (remember that a polynomial model is a curvilinear model). In addition, a sqrt or log transformation will reduce the leverage of the larger X values, which is convenient.

 

Nice illustrative graph! Very useful. But I definitely would not turn X into two categorical groups; you lose too much information. When you think about reality, not everything increases or decreases monotonically. Remember the three bears: some porridge is too cold, some is too hot, and some is just right. Some values are "optimal".

 

PaigeMiller
Diamond | Level 26

A quadratic doesn't really capture the relatively flat part of the curve at the right of the diagram. A cubic might, but I am squeamish about recommending cubics in general. That is why I feel that a spline may be the best method here (or maybe it isn't; you can't really know until you try).

--
Paige Miller
newboy1218
Quartz | Level 8

Thank you! Yes, maybe I will explore PROC TRANSREG, with LOGIT(p) = SPLINE(x) in the MODEL statement.



 

Ksharp
Super User

Are you trying to build a credit scorecard?

Have you considered binning the variable so that it has a linear relationship with Y?
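One common way to do this in scorecard work is to replace each bin by its weight of evidence (WOE). That choice, the five equal-sized bins, and every dataset and variable name below are assumptions for illustration; the post itself only suggests binning.

```sas
/* Simulated stand-in data (assumption): bad = 0/1 target, x = predictor. */
data work.loans;
   call streaminit(1218);
   do i = 1 to 5000;
      x   = 100 * rand('uniform');
      eta = -2 + 0.08*x - 0.0007*x*x;
      bad = rand('bernoulli', 1 / (1 + exp(-eta)));
      output;
   end;
   drop i eta;
run;

/* Five roughly equal-sized bins; x_bin takes the values 0 to 4. */
proc rank data=work.loans groups=5 out=work.binned;
   var x;
   ranks x_bin;
run;

proc sql noprint;
   /* Overall counts of bads and goods. */
   select sum(bad=1), sum(bad=0) into :tot_bad, :tot_good
   from work.binned;

   /* WOE per bin, using the convention log(share of goods / share of bads). */
   create table work.woe as
   select x_bin,
          log( (sum(bad=0) / &tot_good) / (sum(bad=1) / &tot_bad) ) as woe
   from work.binned
   group by x_bin;

   /* Attach each observation's bin WOE. */
   create table work.scored as
   select a.*, b.woe
   from work.binned as a inner join work.woe as b
   on a.x_bin = b.x_bin;
quit;

proc logistic data=work.scored;
   model bad(event='1') = woe;
run;
```

Because WOE is itself a log-odds quantity computed per bin, the relationship between woe and the log-odds of bad is linear by construction. A bin with no goods or no bads would give an undefined WOE, so sparse bins would need to be merged first.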

 

 

https://www.sas.com/content/dam/SAS/support/en/sas-global-forum-proceedings/2019/3099-2019.pdf
