Hi all,
Is there any source or idea on how to perform optimal binning in Base SAS, other than complicated algorithms that usually don't work?
I am looking for something straightforward, or a simpler version that is easy to understand.
Thank you in advance.
What do you have to bin, and how many bins do you think you'll need? A brief description of the input data and desired output is helpful.
Well, let's say I have two continuous variables and a binary response. The number of bins should be optimal for the data I have, although I would prefer a maximum of 4 or 5 so that I am able to control the signs of the estimates in the regression later on.
If you know the ranges of interest in the continuous variables, I would recommend a custom format for each variable. Most of the analysis procedures will use the formatted value, either by default or when set to use the format. If the ranges don't work quite the way you want, you just change the format definition and don't have to create new variables.
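The custom-format idea boils down to a lookup from a value to a range label through a sorted list of cut points. In Base SAS that is what a PROC FORMAT VALUE statement with ranges such as `low-<25` and `25-<40` does. The sketch below is only a Python illustration of the same lookup logic, not SAS syntax; `make_format`, `cuts` and `labels` are hypothetical names, and missing values are assumed to get their own label.

```python
import math
from bisect import bisect_right


def make_format(cuts, labels, missing_label="Missing"):
    """Return a function mapping a value to a bin label, like a range format.

    cuts must be sorted and len(labels) == len(cuts) + 1.
    Bin i covers [cuts[i-1], cuts[i]), i.e. lower bound inclusive,
    upper bound exclusive, mirroring SAS ranges such as 25-<40.
    """
    if len(labels) != len(cuts) + 1:
        raise ValueError("need exactly one more label than cut points")
    if list(cuts) != sorted(cuts):
        raise ValueError("cut points must be sorted")

    def fmt(value):
        # Missing values (None or NaN) get a dedicated bin.
        if value is None or (isinstance(value, float) and math.isnan(value)):
            return missing_label
        # bisect_right counts how many cut points are <= value.
        return labels[bisect_right(cuts, value)]

    return fmt
```

For example, `make_format([25, 40, 60], ["<25", "25-<40", "40-<60", "60+"])` maps 25 to `"25-<40"` and 60 to `"60+"`. As with a format, moving a bin boundary only means editing `cuts`; the underlying data are never recoded.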
There's also the question of why bin at all, rather than use the continuous variable?
If you're in Enterprise Miner (eMiner), consider running a tree and seeing where the cutoffs occur for those variables.
Unfortunately I am not in EM; otherwise I would use Interactive Binning and define the bins myself based on event rate and Gini. In general, binning is preferable because it handles rare levels and missing values as well as outliers. Besides, the relationship between a continuous variable and the target is not always linear, so using the variable as-is could lead to an unstable model because those effects wouldn't be captured.
Yes, I guess that's what optimal binning would do, so yes, I am looking for the cutoff values for creating bins that give the highest Gini ratio.
I was just wondering if there was any tested way in code that I could try, instead of depending on some descriptive statistics and IF-THEN-ELSE clauses.
Basically that's what I need, because I can calculate the Gini myself once I have the groups formed.
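Once the groups are formed, the Gini of the binned variable can be computed from the per-bin event and non-event counts alone. The sketch below assumes "Gini ratio" means the Gini coefficient 2 * AUC - 1 commonly used for scorecards, with each bin scored by its own event rate; it is a Python illustration, and `gini_from_bins` is a hypothetical name rather than a SAS routine.

```python
def gini_from_bins(events, nonevents):
    """Gini coefficient (2 * AUC - 1) of a binned variable.

    events[i] and nonevents[i] are the counts of responders and
    non-responders in bin i. Each bin is scored by its own event rate,
    so this is the Gini you get when the binned variable is the only
    predictor. Pairs within the same bin count as ties (half credit).
    """
    if len(events) != len(nonevents):
        raise ValueError("events and nonevents must have the same length")
    total_e, total_n = sum(events), sum(nonevents)
    if total_e == 0 or total_n == 0:
        raise ValueError("need at least one event and one non-event")
    # Order non-empty bins from lowest to highest event rate.
    bins = sorted(
        ((e / (e + n), e, n) for e, n in zip(events, nonevents) if e + n > 0),
        key=lambda t: t[0],
    )
    concordant = 0.0
    lower_nonevents = 0  # non-events in bins with a lower event rate
    for _, e, n in bins:
        # Each event outranks every non-event in lower bins and ties
        # with the non-events in its own bin.
        concordant += e * (lower_nonevents + 0.5 * n)
        lower_nonevents += n
    auc = concordant / (total_e * total_n)
    return 2 * auc - 1
```

For two bins with 10/90 and 30/70 events/non-events, `gini_from_bins([10, 30], [90, 70])` returns 0.3125, regardless of the order in which the bins are listed.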
Any suggestions or an example of the format? I'm not sure I follow.
Thanks
Read about dynamic formats and informats here: http://www.lexjansen.com/pharmasug/2005/posters/po06.pdf
Just one of many papers on the subject.
What are the complicated algorithms that are not working for you?
A lot of the ones you find on the net that are supposedly creating optimal bins, and also the macros in data mining preparation tools, do not work as they should. Anyway, I will look into the paper you sent me. Thank you for the reply.
I know advanced constraints sometimes have to be tweaked to get the most out of Interactive Binning or Enterprise Miner. If you are not getting the results you expect, contact SAS Technical Support at http://support.sas.com/techsup/contact/. They usually reply very fast.
Good luck!
Yes, you are right. In EM and the Interactive Binning node I have spent a lot of time creating my own splits, because the ones you get are not always good; in most cases it doesn't create bins even when there are opportunities for it. I guess I will write something in macro form so I can change the cutoffs each time, because I am afraid that the automated approaches you usually encounter produce more bins than the regression can support later on, since they just try to maximize the Gini ratio.
Thank you for your time
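One simple way to cap the number of bins and keep the event rate moving in one direction (so the sign of the regression estimate is predictable) is to start from many fine bins, for example deciles, and merge adjacent ones. The sketch below is a common merging heuristic written in Python for illustration, not the Interactive Binning algorithm; `merge_bins`, `max_bins` and `increasing` are hypothetical names. In Base SAS the same loop could live in a macro that takes the cutoffs as parameters.

```python
def merge_bins(events, nonevents, max_bins=5, increasing=True):
    """Merge adjacent bins until event rates are monotone and at most
    max_bins remain.

    events/nonevents: per-bin counts, ordered by the variable's value.
    increasing: required direction of the event-rate trend.
    Returns (first_bin, last_bin, events, nonevents) for each merged bin.
    """
    if len(events) != len(nonevents):
        raise ValueError("events and nonevents must have the same length")
    if any(e + n == 0 for e, n in zip(events, nonevents)):
        raise ValueError("every starting bin needs at least one observation")
    if max_bins < 1:
        raise ValueError("max_bins must be at least 1")
    groups = [[i, i, e, n] for i, (e, n) in enumerate(zip(events, nonevents))]

    def rate(g):
        return g[2] / (g[2] + g[3])

    def merge(k):
        # Replace groups k and k + 1 with their union.
        a, b = groups[k], groups[k + 1]
        groups[k:k + 2] = [[a[0], b[1], a[2] + b[2], a[3] + b[3]]]

    def first_violation():
        # Index k where group k + 1 breaks the required trend, else None.
        for k in range(len(groups) - 1):
            step = rate(groups[k + 1]) - rate(groups[k])
            if (step < 0) if increasing else (step > 0):
                return k
        return None

    k = first_violation()
    while k is not None:
        merge(k)
        k = first_violation()

    while len(groups) > max_bins:
        # Merging neighbours keeps the trend monotone, so join the
        # adjacent pair whose event rates are closest.
        k = min(range(len(groups) - 1),
                key=lambda j: abs(rate(groups[j + 1]) - rate(groups[j])))
        merge(k)
    return [tuple(g) for g in groups]
```

With six starting bins whose event rates are 5%, 8%, 6%, 15%, 20% and 30%, the 8%/6% dip is merged first, and then the two closest neighbours are joined to reach `max_bins=4`. The resulting bin boundaries are what you would feed back into a format or IF-THEN logic.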
That sounds like a CART or CHAID process on a single variable, perhaps? Could you use the CHAID macro out there? You may need to contact
http://listserv.uga.edu/cgi-bin/wa?A2=ind1309C&L=sas-l&D=0&P=5029
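A CART-style tree on a single variable, as suggested here and in the earlier eMiner tip, can be sketched in a few lines. This is a Python illustration, not the CHAID macro linked above and not SAS code. It repeatedly makes the binary cut that most reduces Gini impurity (CART's splitting criterion, which is different from the Gini ratio used to judge the final bins), stopping at `max_bins` bins or when no cut leaves at least `min_size` observations on each side. Function names and defaults are hypothetical, and missing values are assumed to be set aside as their own bin beforehand.

```python
def gini_impurity(events, total):
    """CART's Gini impurity 2p(1 - p) for a node with `events` out of `total`."""
    if total == 0:
        return 0.0
    p = events / total
    return 2 * p * (1 - p)


def best_split(pairs, min_size):
    """Best binary cut of x-sorted (x, y) pairs; returns (gain, cut) or None."""
    total = len(pairs)
    total_e = sum(y for _, y in pairs)
    parent = total * gini_impurity(total_e, total)
    best = None
    left_e = 0
    for i in range(1, total):
        left_e += pairs[i - 1][1]  # events among the first i observations
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot cut between equal x values
        if i < min_size or total - i < min_size:
            continue  # each side must keep at least min_size observations
        child = (i * gini_impurity(left_e, i)
                 + (total - i) * gini_impurity(total_e - left_e, total - i))
        gain = parent - child
        if best is None or gain > best[0]:
            best = (gain, (pairs[i - 1][0] + pairs[i][0]) / 2)
    return best


def tree_cuts(x, y, max_bins=4, min_size=30):
    """Greedy best-first splitting of one variable into at most max_bins bins.

    x: non-missing numeric values; y: 0/1 response. Returns sorted cut points;
    a value v falls in the bin to the right of cut c when v >= c.
    """
    leaves = [sorted(zip(x, y))]
    cuts = []
    while len(leaves) < max_bins:
        candidates = []
        for idx, leaf in enumerate(leaves):
            split = best_split(leaf, min_size)
            if split is not None and split[0] > 1e-12:
                candidates.append((split[0], idx, split[1]))
        if not candidates:
            break  # no remaining cut improves impurity
        _, idx, cut = max(candidates)
        leaf = leaves.pop(idx)
        # Keep leaves in x order so each leaf is a contiguous range.
        leaves[idx:idx] = [[p for p in leaf if p[0] < cut],
                           [p for p in leaf if p[0] >= cut]]
        cuts.append(cut)
    return sorted(cuts)
```

Because `max_bins` and `min_size` are explicit, the tree cannot produce more bins than the regression is meant to support. The event rates of the resulting bins are not guaranteed to be monotone, though, so they are still worth checking before modelling.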