chemicalab
Fluorite | Level 6

Hi all,

My question, as in the topic header, is which form of variable is best to use in logistic regression, and why.

Should the variables be continuous, or should I focus on binning them into categorical variables?

What are the pros and cons for each situation?

Thanks in advance

3 REPLIES
jwexler
SAS Employee

Hi Chemicalab. It depends on the purpose of your model. If this is for a credit scorecard, bins are often used because they make it easier to explain why an applicant was rejected. Binning also lets you keep a model that is linear in the log-odds even when the continuous variable has a non-linear relationship with the outcome, because each bin gets its own coefficient. The cost is that you may lose the ability to differentiate values that fall in the same bin (is 21 different from 24 years old?), so you need to test your assumptions. I advise you to try both ways! Thanks, Jonathan
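To make the trade-off concrete, here is a minimal Python sketch. The data, the bin edges, and the function names are all made up for illustration. It computes the observed log-odds per age bin, which is what a logistic regression with one indicator per bin (and no other predictors) would reproduce.

```python
import math

# Hypothetical toy data: (age, defaulted) pairs with a U-shaped risk pattern.
records = [
    (19, 1), (21, 1), (23, 0), (24, 1),
    (31, 0), (35, 0), (38, 1), (42, 0),
    (55, 0), (61, 1), (66, 1), (70, 1),
]

# Hypothetical bin edges: [18, 30), [30, 50), [50, 120).
BIN_EDGES = [18, 30, 50, 120]


def bin_index(age, edges=BIN_EDGES):
    """Return the index of the half-open bin [edges[i], edges[i+1]) containing age."""
    for i in range(len(edges) - 1):
        if edges[i] <= age < edges[i + 1]:
            return i
    raise ValueError(f"age {age} is outside the bin edges")


def bin_log_odds(records, edges=BIN_EDGES):
    """Observed log-odds of the event in each bin.

    Assumes every bin contains at least one event and one non-event;
    otherwise the log-odds is undefined.
    """
    counts = [[0, 0] for _ in range(len(edges) - 1)]  # [non-events, events]
    for age, y in records:
        counts[bin_index(age, edges)][y] += 1
    result = []
    for non_events, events in counts:
        rate = events / (events + non_events)
        result.append(math.log(rate / (1 - rate)))
    return result


log_odds = bin_log_odds(records)
for i, value in enumerate(log_odds):
    print(f"bin [{BIN_EDGES[i]}, {BIN_EDGES[i + 1]}): log-odds = {value:.3f}")
```

In this toy data the middle bin has lower log-odds than both outer bins, a pattern that a single slope on raw age cannot represent. At the same time, ages 21 and 24 land in the same bin and receive the same score, which is the loss of resolution mentioned above.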

chemicalab
Fluorite | Level 6

Sounds like what I had in mind. Thank you for the clarification, Jonathan.

fri0
Quartz | Level 8

I know that categorical variables can help because they may carry important information for your model, but regression is preferably a technique for continuous variables. If your project needs a key categorical variable, you can use it, but keep the number of categorical variables to a minimum. If all your data are categorical, you could consider other techniques designed specifically for this kind of variable, such as ordinal regression.
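One way to see the cost of categorical predictors is to count coefficients. Under reference (dummy) coding, a predictor with k levels contributes k - 1 indicator columns, each with its own coefficient, whereas a continuous predictor contributes one. A minimal Python sketch, with a hypothetical `dummy_code` helper and made-up data:

```python
def dummy_code(values, reference):
    """Reference-code a categorical column: one 0/1 indicator per non-reference level."""
    levels = sorted(set(values) - {reference})
    return {level: [1 if v == level else 0 for v in values] for level in levels}


# Hypothetical predictor with 4 levels.
housing = ["rent", "own", "mortgage", "other", "rent", "own"]
columns = dummy_code(housing, reference="rent")

# A 4-level variable needs 3 coefficients; the reference level is absorbed by the intercept.
n_coefficients = len(columns)
print(n_coefficients, sorted(columns))
```

Each extra level therefore adds a parameter to estimate, so many high-cardinality categorical variables can demand far more data than the same number of continuous ones.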
