09-11-2013 05:18 AM
My question, as in the topic header, is which form of variable is best to use in logistic regression, and why.
Should the variables be continuous, or should I be focusing on binning and categorical variables?
What are the pros and cons of each approach?
Thanks in advance
09-12-2013 10:03 AM
Hi Chemicalab. It depends on the purpose of your model. If this is for a credit scorecard, bins are often used because they make it easier to explain why an applicant was rejected. Binning also lets you keep working under the model's linearity assumption even when the continuous variable has a non-linear relationship with the outcome, since each bin is allowed its own effect. The cost is that you lose the ability to distinguish values that fall in the same bin (is a 21-year-old different from a 24-year-old?), so you need to test your assumptions. I advise you to try both ways! Thanks, Jonathan
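To make the binning trade-off concrete, here is a minimal sketch of how a continuous age could be turned into bin indicators before fitting a logistic regression. The bin edges, labels, and function names are assumptions made up for illustration; nothing in this thread specifies them.

```python
from bisect import bisect_right

# Hypothetical bin edges and labels for age (assumed, not from the thread).
AGE_EDGES = [25, 35, 50]
AGE_LABELS = ["<25", "25-34", "35-49", "50+"]


def age_bin(age):
    """Return the index of the bin that `age` falls into."""
    # bisect_right counts how many edges are <= age, which is the bin index.
    return bisect_right(AGE_EDGES, age)


def dummy_code(age):
    """Indicator variables for every bin except the first (the reference level)."""
    idx = age_bin(age)
    return [1 if idx == k else 0 for k in range(1, len(AGE_LABELS))]


for age in (21, 24, 25, 40, 62):
    print(age, AGE_LABELS[age_bin(age)], dummy_code(age))
```

With these assumed edges, ages 21 and 24 both land in the "<25" bin and get identical indicators, so a model fitted on the binned variable cannot tell them apart. In exchange, each bin can carry its own effect, which is what lets a linear model follow a non-linear pattern.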
10-04-2013 05:39 PM
I know that categorical variables can help because they may contain important information for your model, but regression is a technique that works most naturally with continuous variables. If your project needs a key categorical variable, you can include it, but keep in mind that the number of categorical variables should be kept to a minimum. If all your data is categorical, you could consider other techniques designed specifically for this kind of variable, such as ordinal regression.
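One way to see why it helps to keep categorical variables to a minimum: under the usual dummy coding, a variable with k levels adds k - 1 columns to the model, while a continuous variable adds one. The sketch below counts those columns; the variable names and level counts are invented for illustration.

```python
def dummy_columns(levels_per_variable):
    """Count the indicator columns needed when one level per variable is the reference."""
    return sum(max(n_levels - 1, 0) for n_levels in levels_per_variable.values())


# Hypothetical categorical predictors and their number of levels (assumed values).
categorical = {"region": 12, "occupation": 30, "marital_status": 4}

print(dummy_columns(categorical))  # 11 + 29 + 3 = 43 columns
print(len(categorical))            # 3 columns if each predictor were a single continuous variable
```

Each of those columns needs its own coefficient, so sparse levels with few observations can make the estimates unstable.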