Transform a U-shape variable in Logistic Regression


Posted 09-30-2019 10:21 AM

Hi, I am using logistic regression to predict the probability of being a bad customer. I need to transform each independent variable so that it has a strong linear relationship with the log-odds of my target. One variable has an upper U-shape (a hump) when I plot the variable value against the log-odds. How should I transform it so I won't over-predict the lower group and under-predict the upper group?

I tried squaring the variable, but that did not give me a better (more linear) line.

1 ACCEPTED SOLUTION

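Applying a monotonic transformation (such as the square or the square root of a non-negative variable) will not eliminate the hump. It stretches or compresses the x-axis but keeps the values in the same order, so the log-odds still rise and then fall.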
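To illustrate the point about monotonic transformations, here is a minimal Python sketch (Python rather than SAS, only so that it is self-contained). The bin midpoints and log-odds values are made up for illustration and are not the poster's data; `is_interior_peak` is a hypothetical helper.

```python
import math

# Hypothetical binned data: midpoints of x and the observed log-odds of "bad".
x_mid = [1, 4, 9, 16, 25, 36, 49]
log_odds = [-2.0, -1.2, -0.6, -0.4, -0.7, -1.3, -2.1]  # rises, then falls: a hump

def is_interior_peak(xs, ys):
    """True if, after sorting by xs, the largest y is not at either end."""
    order = sorted(range(len(xs)), key=xs.__getitem__)
    ys_sorted = [ys[i] for i in order]
    k = max(range(len(ys_sorted)), key=ys_sorted.__getitem__)
    return 0 < k < len(ys_sorted) - 1

# Each transform below is increasing on positive x, so the bin order is unchanged.
transforms = [
    ("identity", lambda v: v),
    ("sqrt", math.sqrt),
    ("log", math.log),
    ("square", lambda v: v * v),
]

for name, f in transforms:
    tx = [f(v) for v in x_mid]
    # The hump survives every monotonic transform of x.
    print(name, is_interior_peak(tx, log_odds))
```

Because the transformed values keep the same ordering, the largest log-odds stays in the middle of the range no matter which of these transforms is applied.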

Applying a non-monotonic transformation (for example, a third order polynomial) can eliminate the hump, but may be questionable on subject matter grounds.

Nevertheless, you might try fitting a cubic equation (a third-order polynomial), since it seems that whatever curve fits well will do a good job of predicting log(p/(1-p)).

You might also consider PROC TRANSREG, where the MODEL statement would use LOGIT(p) = SPLINE(x). However, I admit that I have never done this, so I have no experience with using PROC TRANSREG in this manner.

--

Paige Miller


5 REPLIES


Out-of-the-box logistic regression assumes a linear relationship between X and the logit of the event probability. I like @PaigeMiller's suggestion of a polynomial; like him, I think the cubic might be a bit excessive, but a quadratic might work if you transformed X using sqrt or log. EDIT: by which I mean

model y = x x2;

where x2 is x*x, and x is centered and possibly transformed.
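One reason to center x before squaring is that it reduces the correlation between the linear and quadratic terms. A minimal Python sketch of that effect follows (Python rather than SAS, only so that it is self-contained). The values of `x` are hypothetical, and `pearson` is a small helper written for this example.

```python
def pearson(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5

# Hypothetical predictor values (possibly already sqrt- or log-transformed).
x = [float(v) for v in range(1, 10)]

# Uncentered: x and x*x move together almost perfectly.
x2_raw = [v * v for v in x]
corr_raw = pearson(x, x2_raw)

# Centered: subtract the mean first, then square.
x_mean = sum(x) / len(x)
x_c = [v - x_mean for v in x]
x2 = [v * v for v in x_c]
corr_centered = pearson(x_c, x2)

print("corr(x, x^2) uncentered:", round(corr_raw, 3))
print("corr(x, x^2) centered:  ", round(corr_centered, 3))
```

The centered correlation is essentially zero here only because these illustrative values are symmetric about their mean; with skewed data, centering usually lowers the correlation without removing it. With centered terms and fitted coefficients b1 (linear) and b2 (quadratic), the fitted curve turns at x = mean(x) - b1/(2*b2), and that point is a peak when b2 is negative.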

There are no assumptions about the distribution of X, so you can apply transformations that better match a linear model (remember that a polynomial model is a curvilinear model that is still linear in its coefficients). Plus, a sqrt or log transformation will reduce the leverage of the larger X values, which is convenient.

Nice illustrative graph! Very useful. But I definitely would not turn X into two categorical groups; you lose too much information. When you think about reality, not everything increases or decreases monotonically--remember the three bears: some porridge is too cold, some is too hot, and some is just right. Some values are "optimal".


A quadratic doesn't really capture the relatively flat line part of the curve at the right of the diagram. A cubic might, but I am squeamish about recommending cubics in general. Which is why I feel that maybe a spline is the best method here (or maybe it isn't, you can't really know until you try).

--

Paige Miller
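As a rough illustration of why a spline can follow a hump and then a flat stretch where a quadratic cannot, here is a minimal Python sketch of a piecewise-linear spline (Python rather than SAS, only so that it is self-contained). The knots and coefficients are hypothetical; in practice the coefficients would be estimated. The linear "hinge" basis is chosen here for simplicity, and it is not a claim about which basis PROC TRANSREG or any other procedure uses.

```python
def hinge_basis(x, knots):
    """Basis columns of a linear spline: [x, max(0, x - k1), max(0, x - k2), ...]."""
    return [x] + [max(0.0, x - k) for k in knots]

knots = [3.0, 6.0]        # hypothetical knot locations
coefs = [1.0, -2.0, 1.0]  # hypothetical coefficients, one per basis column

def spline_value(x):
    """Linear predictor contribution of x: a weighted sum of the basis columns."""
    return sum(c * b for c, b in zip(coefs, hinge_basis(x, knots)))

# Slope is +1 below 3, -1 between 3 and 6, and 0 above 6:
# the curve rises, falls, then stays flat.
for x in [0.0, 3.0, 6.0, 9.0, 12.0]:
    print(x, spline_value(x))
```

The model stays linear in its coefficients; only the columns fed to it change. Each knot lets the slope change, which is what allows a flat right-hand tail after the hump.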

Are you trying to build a credit scorecard?

Have you tried binning the variable so that it has a linear relationship with Y?

https://www.sas.com/content/dam/SAS/support/en/sas-global-forum-proceedings/2019/3099-2019.pdf
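One common way scorecards turn bins into a linear predictor is weight of evidence (WOE). A minimal Python sketch is below (Python rather than SAS, only so that it is self-contained). It is an assumption that this is the kind of binning meant above, the counts are made up, and the sign convention (goods over bads) is one of two in use.

```python
import math

# Hypothetical counts per bin of x: (label, number of goods, number of bads).
bins = [("low", 400, 20), ("mid", 300, 60), ("high", 350, 35), ("top", 450, 18)]

total_good = sum(g for _, g, _ in bins)
total_bad = sum(b for _, _, b in bins)

def woe(good, bad):
    """Weight of evidence: ln(share of all goods in bin / share of all bads in bin)."""
    return math.log((good / total_good) / (bad / total_bad))

# Overall log-odds of "bad" across all bins.
overall_log_odds_bad = math.log(total_bad / total_good)

for label, good, bad in bins:
    w = woe(good, bad)
    log_odds_bad = math.log(bad / good)
    # In every bin: log_odds_bad == overall_log_odds_bad - w
    print(label, round(w, 3), round(log_odds_bad, 3))
```

Replacing x with the WOE of its bin makes the observed log-odds of "bad" exactly linear in the new variable (slope -1 under this sign convention), whatever shape the original relationship had. The cost is that all variation within a bin is discarded.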
