I would pick “4 categories” if they have the same IV,
because more groups keep more detail about the variable, i.e. you retain more information (or lose less information) about it.
P.S. Fewer groups lose more information about the variable, which is why statisticians discourage binning variables, as @Rick_SAS said before. But for a scorecard, binning makes the model easier to explain.
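To illustrate the "more groups keep more information" point, here is a small Python sketch (Python is used only for illustration; the thread itself is about SAS). It computes the usual WOE-based information value, IV = Σ (g_i − b_i) · ln(g_i / b_i), where g_i and b_i are a bin's share of all goods and all bads, and then merges two adjacent categories. The counts are made up. Because IV is a divergence between the good and bad distributions, merging categories can never increase it, though it may leave it nearly unchanged, which is the "similar IV" case in the question.

```python
import math

def information_value(goods, bads):
    """IV of a binned variable: sum of (g_i - b_i) * ln(g_i / b_i) over bins.

    g_i and b_i are each bin's share of all goods and all bads.
    Assumes no bin has zero goods or zero bads (ln would be undefined).
    """
    total_good, total_bad = sum(goods), sum(bads)
    iv = 0.0
    for g, b in zip(goods, bads):
        g_share, b_share = g / total_good, b / total_bad
        iv += (g_share - b_share) * math.log(g_share / b_share)
    return iv

def merge_bins(counts, i):
    """Collapse bin i and bin i+1 into one bin (e.g. 4 categories -> 3)."""
    return counts[:i] + [counts[i] + counts[i + 1]] + counts[i + 2:]

# Hypothetical good/bad counts for a 4-category grouping.
goods = [400, 300, 200, 100]
bads = [20, 30, 40, 60]

iv4 = information_value(goods, bads)
# Merge the 2nd and 3rd categories to get a 3-category grouping.
iv3 = information_value(merge_bins(goods, 1), merge_bins(bads, 1))
print(f"IV with 4 categories: {iv4:.4f}")
print(f"IV with 3 categories: {iv3:.4f}")
```

Whether the extra category is worth keeping is still a judgment call: a small IV gain may not hold up on new data, and fewer categories are easier to explain in a scorecard.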
There's no universally accepted method here, and there's no such thing as "best". While there might theoretically be a model that has the highest "Gini", you may never find it, because there are too many possibilities to try them all.
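Each step of the way offers more choices and options than you can realistically try. For example, each step requires decisions: whether and how to bin, how to handle outliers, how to handle missing values, how to select variables, and which modeling method to use.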
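To get a feel for how quickly these choices multiply, here is a minimal Python sketch that counts candidate pipelines. The step names follow this thread; the option lists are hypothetical and not tied to any SAS node or Python library.

```python
from itertools import product

# Hypothetical options per step; the labels are illustrative only.
pipeline_choices = {
    "binning": ["none", "equal_width", "quantile", "optimal"],
    "outliers": ["keep", "clip_percentiles", "remove"],
    "missing": ["median", "mean", "separate_category"],
    "selection": ["none", "iv_filter", "stepwise"],
    "model": ["logistic", "decision_tree", "gradient_boosting", "neural_net"],
}

# Every combination of one option per step is a distinct pipeline to fit.
combinations = list(product(*pipeline_choices.values()))
print(len(combinations), "pipelines before tuning any options inside a method")
```

With only three or four options per step this already gives 4 × 3 × 3 × 3 × 4 = 432 pipelines, before varying anything within a method (number of bins, stepwise entry/stay levels, tree depth, and so on).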
Recently, I was able to fit 12 different models, because in SAS Enterprise Miner or SAS Viya Model Studio you can do this relatively quickly. Once you learn the interface, selecting different modeling methods and options goes quickly. It took me about 2.5 hours, including creating the diagram, detecting and handling outliers, imputing missing values, running all the models and then comparing them. I also added two models using Logistic Partial Least Squares (which is not available in SAS). But ... although I fit 14 models, perhaps the 15th one that I didn't try would have been better. I will never know. It is impossible to know.
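Comparing the fitted models usually comes down to a statistic such as the Gini mentioned above. A minimal Python sketch, assuming the credit-scoring convention Gini = 2 × AUC − 1, where AUC is the probability that a randomly chosen bad scores higher than a randomly chosen good (ties count one half). The labels and scores are made up, and the pairwise loop is for clarity, not speed.

```python
def auc(scores, labels):
    """P(random bad scores higher than random good); ties count 1/2.

    labels: 1 = bad (event), 0 = good. O(n_bad * n_good), illustration only.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def gini(scores, labels):
    """Gini coefficient (accuracy ratio) as used in credit scoring: 2*AUC - 1."""
    return 2.0 * auc(scores, labels) - 1.0

# Hypothetical holdout labels and two models' predicted probabilities of "bad".
labels = [0, 0, 0, 1, 0, 1, 1, 0]
model_a = [0.1, 0.2, 0.3, 0.8, 0.4, 0.7, 0.6, 0.2]
model_b = [0.3, 0.1, 0.5, 0.6, 0.2, 0.4, 0.7, 0.6]

for name, scores in [("model_a", model_a), ("model_b", model_b)]:
    print(name, round(gini(scores, labels), 3))
```

Here model_a separates the classes perfectly (Gini 1.0) and model_b reaches about 0.667. On real data the comparison should be made on a holdout set, not on the data used to fit the models.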
I wound up choosing simple outlier strategies and simple missing value strategies (I didn't do binning, but if you are going to, make a choice and go with it). For all of these decisions, select one or two methods and go with them. Don't try to model every possible choice of binning, outlier handling, missing value handling, stepwise or other options.
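As one example of "simple" strategies, here is a minimal Python sketch that caps outliers at percentiles and then imputes missing values with the median. Both choices, and the percentile cutoffs, are assumptions for illustration; the post doesn't say which methods were actually used.

```python
import statistics

def clip_to_percentiles(values, lower=0.01, upper=0.99):
    """Cap observed values at the given percentiles; None (missing) is left alone."""
    present = sorted(v for v in values if v is not None)
    # Simple index-based percentile: floor(p * (n - 1)) in the sorted data.
    lo = present[int(lower * (len(present) - 1))]
    hi = present[int(upper * (len(present) - 1))]
    return [None if v is None else min(max(v, lo), hi) for v in values]

def impute_median(values):
    """Replace None with the median of the observed values."""
    med = statistics.median([v for v in values if v is not None])
    return [med if v is None else v for v in values]

# Hypothetical income values (in thousands): one extreme value, two missings.
income = [30, 35, None, 40, 42, 38, 1000, None, 45, 33]
# Wide cutoffs only because this toy sample is tiny; the defaults are 1%/99%.
cleaned = impute_median(clip_to_percentiles(income, lower=0.1, upper=0.9))
print(cleaned)
```

Clipping runs first so the extreme value is already capped when the median is computed; the median itself is robust to outliers, so the order matters more if you swap in mean imputation.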
To do the binning, you can try PROC HPBIN (or if you have Enterprise Miner or Model Studio, there is an equivalent node), but you have to select the proper method of binning and the proper options within that method.
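To show why the choice of binning method matters, here is a minimal Python sketch of two common approaches: equal-width ("bucket") bins and quantile-style bins. It does not reproduce PROC HPBIN's own method names or options (check its documentation for those); the data are made up.

```python
import bisect

def equal_width_bins(values, n_bins):
    """Bin index 0..n_bins-1 from equally wide intervals between min and max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0] * len(values)
    # min() puts the maximum value into the last bin rather than an extra one.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def quantile_bins(values, n_bins):
    """Bin index from cut points at roughly equal-count positions.

    Tied values always share a bin; heavy ties can leave some bins empty.
    """
    ordered = sorted(values)
    n = len(ordered)
    cuts = [ordered[(k * n) // n_bins] for k in range(1, n_bins)]
    return [bisect.bisect_right(cuts, v) for v in values]

# Hypothetical skewed variable: nine ordinary values and one extreme one.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(equal_width_bins(data, 4))
print(quantile_bins(data, 4))
```

On this data, equal-width binning puts the nine ordinary values in bin 0 and the extreme value alone in bin 3 (bins 1 and 2 stay empty), while the quantile version spreads the values over all four bins. Which behavior is "proper" depends on the variable.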
Unlike the choice of modeling method, where there is no universal agreement, I think there is almost universal agreement that you should NOT put all variables into the model. There needs to be some variable selection/reduction step, unless you use something like Stepwise or Logistic Partial Least Squares, in which case a separate variable selection step is not needed. Stepwise, however, has its own set of issues; if you search for "problems with Stepwise Regression", you will see what I mean.
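One common reduction step in scorecard work is to screen binned variables by information value and drop the weak ones. A minimal Python sketch, assuming the widely quoted rule of thumb that IV below 0.02 is not predictive (an assumption, not a rule from this thread). The variable names and counts are hypothetical. Univariate screening does not remove predictors that are redundant with each other, so it is at most one part of a selection step.

```python
import math

def information_value(goods, bads):
    """IV = sum of (g_i - b_i) * ln(g_i / b_i); g_i, b_i are bin shares of goods/bads."""
    tg, tb = sum(goods), sum(bads)
    return sum((g / tg - b / tb) * math.log((g / tg) / (b / tb))
               for g, b in zip(goods, bads))

def screen_by_iv(binned_counts, min_iv=0.02):
    """Return variables with IV >= min_iv (strongest first) and all IV scores."""
    scored = {name: information_value(g, b) for name, (g, b) in binned_counts.items()}
    ranked = sorted(scored.items(), key=lambda kv: kv[1], reverse=True)
    kept = [name for name, iv in ranked if iv >= min_iv]
    return kept, scored

# Hypothetical (goods per bin, bads per bin) for three binned variables.
binned_counts = {
    "age_group": ([300, 400, 300], [60, 30, 10]),
    "region": ([250, 250, 250, 250], [25, 26, 24, 25]),
    "utilization": ([500, 300, 200], [10, 30, 60]),
}

kept, scores = screen_by_iv(binned_counts)
for name, iv in scores.items():
    print(f"{name}: IV = {iv:.4f}")
print("kept:", kept)
```

Here "region" has nearly identical good and bad distributions across its bins, so its IV is close to zero and it is dropped, while "utilization" and "age_group" are kept.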
I mentioned Enterprise Miner and Viya Model Studio. If you are going to program all of this yourself ... I don't recommend it. You might as well block off the next three months to get it all programmed, plan to work through lunch, and pull your hair out.
In a recent thread, someone else asked why SAS is still used instead of Python or R. This is an example (not mentioned in that thread) where SAS has major advantages over programming languages such as Python and R.
@Ronein wrote:
Is it better to have more categories or fewer categories (if both provide similar IV)? For example, grouping into 3 categories or 4 categories provides similar IV. Which option is better to use as an explanatory variable in the model?
I don't think there is a general answer to this question. I think the answer depends on the data, so you should try it both ways.