Hi all,
Is there any resource or idea on how to perform optimal binning in Base SAS, other than the complicated algorithms that usually don't work?
I am looking for something straightforward, or a simpler version that is easy to understand.
Thank you in advance
What do you have to bin, and how many bins do you think you'll need? A brief description of the input data and the desired output would help.
Well, I have, let's say, two continuous variables and a binary response. The number of bins should be whatever is optimal for my data, although I prefer a maximum of 4 or 5 so that I am able to control the signs of the estimates in the regression later on.
If you know the ranges of interest in the continuous variables I would recommend a custom format for each variable. Most of the analysis procedures will use the formatted value either by default or can be set to use the format. If the ranges don't work quite the way you want then you just change the format definition and don't have to create new variables.
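To make the format idea concrete, here is a minimal Python sketch of the same lookup: a table of cutoffs and labels is consulted at analysis time, so changing a cutoff changes the grouping without creating a new variable. The cutoffs, labels, and the function name `apply_bin_format` are hypothetical placeholders; in SAS the equivalent table would be the ranges in a PROC FORMAT VALUE statement.

```python
import math
from bisect import bisect_right

# Hypothetical cutoffs and labels for one continuous variable.
# Bins are [low, 10), [10, 25), [25, 50), [50, high].
CUTOFFS = [10.0, 25.0, 50.0]
LABELS = ["01: <10", "02: 10-<25", "03: 25-<50", "04: 50+"]


def apply_bin_format(x, cutoffs=CUTOFFS, labels=LABELS, missing_label="00: Missing"):
    """Return the bin label for x, the way a format is applied at analysis time."""
    # Missing values get their own label instead of being dropped.
    if x is None or (isinstance(x, float) and math.isnan(x)):
        return missing_label
    # bisect_right counts the cutoffs <= x, which is exactly the bin index.
    return labels[bisect_right(cutoffs, x)]
```

For example, `[apply_bin_format(v) for v in [3, 10, 49.9, 50, None]]` gives `['01: <10', '02: 10-<25', '03: 25-<50', '04: 50+', '00: Missing']`. Giving missing values their own label matches the point later in the thread about using bins to handle missing values.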
There's also the question of why bin entirely, rather than use the continuous variable?
If you're in Enterprise Miner (EM), consider running a decision tree and seeing where the cutoffs occur for those variables.
Unfortunately I am not in EM; otherwise I would use Interactive Binning to define the bins myself based on event rate and Gini. In general, binning is preferable for handling rare levels, missing values, and outliers. Besides, the relationship between a continuous variable and the target is not always linear, so using it as-is could lead to an unstable model because those effects wouldn't be captured.
Yes, I guess that's what optimal binning would do. So yes, I am looking for the cutoff values for creating bins that give the highest Gini ratio.
I was just wondering if there was any tested way in code that I could try out, instead of depending on some descriptive statistics and IF-THEN/ELSE clauses.
Basically that's what I need, because I can calculate the Gini myself once I have the groups formed.
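Once the groups are formed, the per-bin counts are all that is needed for the Gini. The sketch below assumes one common definition, Gini = 2*AUC - 1 (also called the accuracy ratio), where every observation is scored by its bin's event rate and tied scores count half; tools such as EM may define it slightly differently. `bin_summary` and `gini_ratio` are hypothetical names, and the code assumes there are no missing `x` values (put those in their own bin first).

```python
from bisect import bisect_right


def bin_summary(x, y, cutoffs):
    """Count rows and events per bin; bin i holds values in [cutoffs[i-1], cutoffs[i])."""
    k = len(cutoffs) + 1
    n = [0] * k
    events = [0] * k
    for xi, yi in zip(x, y):
        b = bisect_right(cutoffs, xi)
        n[b] += 1
        events[b] += yi  # y is coded 0/1
    return n, events


def gini_ratio(n, events):
    """Gini = 2*AUC - 1, scoring each row by its bin's event rate (ties count half)."""
    nonevents = [ni - ei for ni, ei in zip(n, events)]
    total_e, total_ne = sum(events), sum(nonevents)
    if total_e == 0 or total_ne == 0:
        raise ValueError("need both events and non-events")
    rate = [e / ni if ni else 0.0 for e, ni in zip(events, n)]
    concordant = 0.0
    # Compare every (event, non-event) pair through their bins' event rates.
    for i in range(len(n)):
        for j in range(len(n)):
            if rate[i] > rate[j]:
                concordant += events[i] * nonevents[j]
            elif rate[i] == rate[j]:
                concordant += 0.5 * events[i] * nonevents[j]
    auc = concordant / (total_e * total_ne)
    return 2 * auc - 1
```

With `x = [1, 2, 3, 4, 5, 6, 7, 8]`, `y = [0, 0, 0, 1, 0, 1, 1, 1]` and a single cutoff at 4.5, `bin_summary` returns `([4, 4], [1, 3])` and `gini_ratio` returns 0.5; a single bin always gives 0.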
Any suggestions, or an example of the format? I am not sure I follow.
Thanks
Read about dynamic formats and informats here: http://www.lexjansen.com/pharmasug/2005/posters/po06.pdf
Just one of many papers on the subject.
What are the complicated algorithms that are not working for you?
A lot of the ones you find on the net that supposedly create optimal bins; also, the macros in data mining preparation tools do not work as they should. Anyway, I will look into the paper you sent me. Thank you for the reply.
I know the advanced constraints sometimes have to be tweaked to get the most out of Interactive Binning or Enterprise Miner. If you are not getting the results you expect, contact SAS Technical Support (http://support.sas.com/techsup/contact/). They usually reply very fast.
Good luck!
Yes, you are right. In EM and the Interactive Binning node I have spent a lot of time creating my own splits, because the ones you get are not always good; in most cases it doesn't create bins even when there are opportunities for it. I guess I will write something as a macro so I can change the cutoffs each time, because I am afraid that the automated approaches you usually encounter produce more bins than the regression can support later on, since they only look to maximize the Gini ratio.
Thank you for your time
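The worry about automated search producing too many bins can be handled by putting the constraints into the search itself. Below is a minimal brute-force sketch in Python, not a tested SAS macro: it tries every set of at most `max_bins - 1` cutoffs from a short candidate list (for example, deciles or vigintiles of the variable), discards binnings that have a bin smaller than `min_share` of the rows or a non-monotone event rate (one way to keep the regression signs under control), and keeps the highest Gini, assumed here to be 2*AUC - 1. All names and defaults are hypothetical. Since a solution is only replaced by a strictly better Gini, the fewest-bins solution wins ties.

```python
from bisect import bisect_right
from itertools import combinations


def _gini(n, e):
    # Gini = 2*AUC - 1, each row scored by its bin's event rate (ties count half).
    ne = [a - b for a, b in zip(n, e)]
    rate = [b / a for a, b in zip(n, e)]
    conc = sum(
        e[i] * ne[j] * (1.0 if rate[i] > rate[j] else 0.5 if rate[i] == rate[j] else 0.0)
        for i in range(len(n))
        for j in range(len(n))
    )
    return 2 * conc / (sum(e) * sum(ne)) - 1


def _is_monotone(rates):
    up = all(a <= b for a, b in zip(rates, rates[1:]))
    down = all(a >= b for a, b in zip(rates, rates[1:]))
    return up or down


def best_cutoffs(x, y, candidates, max_bins=4, min_share=0.05):
    """Return (gini, cutoffs) for the best constrained binning; y is coded 0/1."""
    total = len(x)
    if sum(y) in (0, total):
        raise ValueError("need both events and non-events")
    best = (0.0, [])  # a single bin has Gini 0
    cands = sorted(set(candidates))
    for k in range(1, max_bins):
        for cuts in combinations(cands, k):  # cuts come out sorted
            n = [0] * (k + 1)
            e = [0] * (k + 1)
            for xi, yi in zip(x, y):
                b = bisect_right(cuts, xi)
                n[b] += 1
                e[b] += yi
            # Constraint 1: no empty or too-small bins.
            if min(n) == 0 or min(n) < min_share * total:
                continue
            # Constraint 2: event rate must move in one direction across bins.
            if not _is_monotone([ei / ni for ei, ni in zip(e, n)]):
                continue
            g = _gini(n, e)
            if g > best[0] + 1e-12:
                best = (g, list(cuts))
    return best
```

The number of binnings grows combinatorially, so this is only practical with a few dozen candidates and 4 or 5 bins; with 20 candidates and `max_bins=4` there are 20 + 190 + 1140 = 1350 binnings to evaluate.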
That sounds like a CART or CHAID process on a single variable, perhaps? Could you use the CHAID macro that's out there? You may need to contact
http://listserv.uga.edu/cgi-bin/wa?A2=ind1309C&L=sas-l&D=0&P=5029
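For the single-variable tree idea, here is a minimal CART-style sketch in Python: it repeatedly splits one variable at the cutoff that most reduces Gini impurity, down to `max_depth` levels (so at most `2**max_depth` bins, e.g. 4 for depth 2), and refuses splits that leave fewer than `min_leaf` rows on either side. This is not CHAID (which uses chi-square tests) and not the macro at the link above; the names and defaults are hypothetical. Gini impurity, the split criterion here, is a different quantity from the Gini ratio used to judge the final binning.

```python
def gini_impurity(n, e):
    # Impurity of a node with n rows and e events: 2p(1 - p).
    if n == 0:
        return 0.0
    p = e / n
    return 2 * p * (1 - p)


def best_split(pairs, min_leaf):
    """pairs: (x, y) tuples sorted by x. Return the best cutoff, or None."""
    n = len(pairs)
    total_e = sum(y for _, y in pairs)
    parent = gini_impurity(n, total_e)
    best_gain, best_cut = 0.0, None
    left_e = 0
    for i in range(1, n):
        left_e += pairs[i - 1][1]  # events in pairs[0:i]
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot cut between equal x values
        if i < min_leaf or n - i < min_leaf:
            continue
        child = (i * gini_impurity(i, left_e)
                 + (n - i) * gini_impurity(n - i, total_e - left_e)) / n
        gain = parent - child
        if gain > best_gain + 1e-12:
            best_gain = gain
            best_cut = (pairs[i - 1][0] + pairs[i][0]) / 2
    return best_cut


def tree_cutoffs(x, y, max_depth=2, min_leaf=50):
    """Recursively split one variable, CART-style; return sorted cutoffs."""
    cuts = []

    def grow(segment, depth):
        if depth == max_depth:
            return
        cut = best_split(segment, min_leaf)
        if cut is None:
            return
        cuts.append(cut)
        grow([p for p in segment if p[0] < cut], depth + 1)
        grow([p for p in segment if p[0] >= cut], depth + 1)

    grow(sorted(zip(x, y)), 0)
    return sorted(cuts)
```

On toy data `x = list(range(1, 9))`, `y = [0, 0, 0, 0, 1, 1, 1, 1]` with `min_leaf=2`, it returns `[4.5]`: both children are pure, so no further split reduces impurity.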