09-27-2013 09:06 AM
Is there any source or idea on how to perform optimal binning in Base SAS, other than the complicated algorithms that usually don't work?
I am looking for something straightforward, or a simpler version that is easy to understand.
Thank you in advance.
09-27-2013 11:08 AM
Well, let's say I have two continuous variables against a binary response. The number of bins should be whatever is optimal for the data I have, although I would prefer a maximum of 4 or 5, so that I am able to control the signs of the estimates in the regression later on.
09-27-2013 11:26 AM
If you know the ranges of interest in the continuous variables I would recommend a custom format for each variable. Most of the analysis procedures will use the formatted value either by default or can be set to use the format. If the ranges don't work quite the way you want then you just change the format definition and don't have to create new variables.
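A minimal sketch of the custom-format approach, for illustration only. The dataset WORK.TRAIN, the predictor INCOME, the 0/1 target BAD, the format name INCFMT and the cut-off values are all hypothetical; substitute your own.

```sas
/* Hypothetical names and cut-offs, chosen only for illustration */
proc format;
   value incfmt
      .              = '0: Missing'
      low   -< 20000 = '1: under 20k'
      20000 -< 40000 = '2: 20k to under 40k'
      40000 -< 70000 = '3: 40k to under 70k'
      70000 -  high  = '4: 70k and over';
run;

/* Cross-tabulate the binned predictor against the target.
   PROC FREQ groups by the formatted value, so no new variable is needed;
   MISSING keeps the missing bin as a level of its own. */
proc freq data=work.train;
   tables income*bad / missing nocol nopercent;
   format income incfmt.;
run;
```

To try different cut-offs, edit the VALUE statement and rerun; the data set itself stays unchanged.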
09-27-2013 11:29 AM
There's also the question of why bin at all, rather than use the continuous variable?
If you're in eMiner, consider running a tree and seeing where the cutoffs occur for those variables.
09-27-2013 11:47 AM
Unfortunately I am not in EM; otherwise I would use interactive binning to define the bins myself based on event rate and Gini. In general, binning is preferable for handling rare levels and missing values as well as outliers. Besides, the relationship between a continuous variable and the target is not always linear, so using the variable as-is could lead to an unstable model because the effects wouldn't be captured.
09-27-2013 12:02 PM
Yes, I guess that's what optimal binning would do, so yes, I am looking for the cut-off values for creating bins that would give me the highest Gini ratio.
I was just wondering if there was any tested way to do this in code that I could try out, instead of depending on some descriptives and IF-THEN-ELSE clauses.
Basically that's what I need, because I can calculate the Gini myself once I have the groups formed.
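One simple starting point for finding candidate cut-offs, sketched here under assumptions rather than offered as a tested optimal-binning routine: split the variable into fine quantile bins with PROC RANK, then look at the event rate per bin and merge neighbours with similar rates down to 4 or 5 groups. The names WORK.TRAIN, INCOME and BAD (coded 0/1) are hypothetical, and 20 groups is an arbitrary choice.

```sas
/* Fine quantile bins: GROUPS=20 numbers the bins 0 to 19;
   observations with a missing INCOME get a missing rank */
proc rank data=work.train out=work.ranked groups=20;
   var income;
   ranks income_grp;
run;

/* Boundaries, size and event rate of each fine bin.
   Assumes BAD is coded 0/1, so its mean is the event rate. */
proc sql;
   create table work.bin_summary as
   select income_grp,
          min(income) as lo,
          max(income) as hi,
          count(*)    as n,
          sum(bad)    as events,
          mean(bad)   as event_rate format=percent8.2
   from work.ranked
   group by income_grp
   order by income_grp;
quit;
```

The LO and HI columns of WORK.BIN_SUMMARY give the values at which neighbouring bins could be merged, which is where candidate cut-offs would come from.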
09-27-2013 01:22 PM
Read about dynamic formats and informats here: http://www.lexjansen.com/pharmasug/2005/posters/po06.pdf
Just one of many papers on the subject.
What are the complicated algorithms that are not working for you?
09-27-2013 02:38 PM
A lot of what you find on the net supposedly creates optimal bins, and the macros in data mining preparation tools also do not work as they should. Anyway, I will look into the paper you sent me. Thank you for the reply.
09-27-2013 02:53 PM
I know advanced constraints have to be tweaked sometimes to get the most out of Interactive Binning or Enterprise Miner. If you are not getting the results you expect, contact SAS Tech Support: http://support.sas.com/techsup/contact/. They usually reply very fast.
09-27-2013 02:58 PM
Yes, you are right. In EM with the Interactive Binning node I have spent a lot of time creating my own splits, because the ones you get are not always so good; in most cases it doesn't create bins even when there are opportunities for it. I guess I will create something in macro form so I can change the cut-offs each time, because I am afraid that anything automated usually means more bins than can be supported later on in the regression, since it is just looking to maximize the Gini ratio.
Thank you for your time
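A rough sketch of what such a macro could look like, assuming SAS/STAT is available and that each candidate set of cut-offs is stored as a user-defined format created beforehand with PROC FORMAT. The macro name BIN_GINI, its parameters, the work data sets and the example names in the call (WORK.TRAIN, INCOME, BAD, INCFMT.) are all hypothetical. It relies on Somers' D from PROC LOGISTIC being equal to the Gini coefficient (2*AUC - 1).

```sas
/* Hypothetical macro: report the Gini of one variable under one candidate format */
%macro bin_gini(data=, var=, target=, fmt=);

   /* Capture the association statistics as a data set */
   ods output Association=work._assoc;

   /* CLASS levels are taken from the formatted value, so the format defines the bins.
      MISSING keeps observations with a missing predictor as their own level. */
   proc logistic data=&data;
      class &var / param=glm missing;
      model &target(event='1') = &var;
      format &var &fmt;
   run;

   /* Keep only the Somers D row and label it with the variable and format */
   data work._gini;
      set work._assoc;
      where Label2 = "Somers' D";
      length variable $32 fmt_name $32;
      variable = "&var";
      fmt_name = "&fmt";
      gini     = nValue2;
      keep variable fmt_name gini;
   run;

   proc print data=work._gini noobs;
   run;

%mend bin_gini;

/* Example call with hypothetical names; INCFMT. must already exist */
%bin_gini(data=work.train, var=income, target=bad, fmt=incfmt.);
```

Because the number of bins is fixed by the format you pass in, you stay in control of how many groups reach the regression; the macro only scores a grouping, it does not search for one.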