Building models with SAS Enterprise Miner, SAS Factory Miner, SAS Visual Data Mining and Machine Learning or just with programming

Optimal binning

Frequent Contributor
Posts: 126

Optimal binning

Hi all,

Is there any source or idea on how to perform optimal binning in SAS Base, other than complicated algorithms that usually don't work?

I am looking for something straightforward, or a simpler version that is easy to understand.

Thank you in advance 



All Replies
Super User
Posts: 10,516

Re: Optimal binning

What do you have to bin, and how many bins do you think you'll need? A brief description of the input data and the desired output would be helpful.

Frequent Contributor
Posts: 126

Re: Optimal binning

Well, I have, let's say, two continuous variables against a binary response. The number of bins should be whatever is optimal for the data, although I would prefer a maximum of 4 or 5 so that I am able to control the signs of the estimates in the regression later on.

Super User
Posts: 10,516

Re: Optimal binning

If you know the ranges of interest in the continuous variables I would recommend a custom format for each variable. Most of the analysis procedures will use the formatted value either by default or can be set to use the format. If the ranges don't work quite the way you want then you just change the format definition and don't have to create new variables.
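
As a rough sketch of the format idea, a custom format plus PROC FREQ could look like the following. The data set `work.loans`, the variables `income` and `bad`, the format name `incfmt`, and all cut-off values are made-up placeholders, not anything taken from this thread.

```sas
/* Placeholder cut-offs: edit these ranges to move the bin boundaries. */
proc format;
   value incfmt
      .               = '0: missing'
      low   -<  20000 = '1: under 20k'
      20000 -<  40000 = '2: 20k to under 40k'
      40000 -<  70000 = '3: 40k to under 70k'
      70000 -   high  = '4: 70k and over';
run;

/* Cross-tabulate bin by target. The format groups the raw values on   */
/* the fly, so no new variable is created. work.loans, income and bad  */
/* are hypothetical names.                                             */
proc freq data=work.loans;
   format income incfmt.;
   tables income * bad / missing nocol nopercent;
run;
```

The row percentages in the PROC FREQ output give the event rate per bin. Moving a boundary only means editing and rerunning the PROC FORMAT step.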


Super User
Posts: 17,863

Re: Optimal binning

There's also the question of why you would bin at all, rather than use the continuous variable.


If you're in eMiner, consider running a tree and seeing where the cutoffs occur for those variables.
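
Outside Enterprise Miner, one possible way to get tree-based cutoffs is PROC HPSPLIT. This is only a sketch under assumptions: it needs a SAS/STAT release that ships HPSPLIT (so it is not pure Base SAS), and `work.loans`, `income`, `bad`, and the output file name are hypothetical.

```sas
/* Shallow single-variable tree. With the default binary splits,       */
/* maxdepth=2 allows at most 4 leaves, i.e. at most 4 bins.            */
proc hpsplit data=work.loans maxdepth=2;
   class bad;                      /* binary target (hypothetical name) */
   model bad = income;             /* one continuous input at a time    */
   grow gini;                      /* split on Gini impurity reduction  */
   rules file='income_rules.txt';  /* leaf rules with the split values  */
run;
```

The split values reported in the rules file are candidate bin boundaries. Note that the `gini` growth criterion is the Gini impurity index used by trees, which is not the same quantity as the Gini ratio of a scorecard variable.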

Frequent Contributor
Posts: 126

Re: Optimal binning

Unfortunately I am not in EM; otherwise I would use interactive binning to define the bins myself based on event rate and Gini. In general, binning is preferable for handling rare levels and missing values as well as outliers. Besides, the relationship between the continuous variable and the target is not always linear, so using the raw variable would lead to an unstable model because the effects wouldn't be captured.

Super User
Posts: 17,863

Re: Optimal binning

OK, are you looking for how to determine the cutoffs for the bins, or how to implement said cutoffs?

If it's how to implement them, the suggestion of formats is the one I'd also recommend.

Frequent Contributor
Posts: 126

Re: Optimal binning

Yes, I guess that's what optimal binning would do, so yes, I am looking for the cut-off values for creating bins that would give me the highest Gini ratio.

I was just wondering if there was any tested way in code that I could try out, instead of depending on some descriptive statistics and if-then-else clauses.

Basically that's what I need, because I can calculate the Gini myself once I have the groups formed.
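
For the Gini step, one possible shortcut (a sketch, not a tested recipe from this thread) is to let PROC LOGISTIC report it. With the bin as the only predictor, the Somers' D statistic in the association table equals 2c - 1, which is the Gini ratio of the binned variable. The data set `work.binned` and the variables `bad` (0/1 target) and `income_bin` (bin identifier) are hypothetical, and PROC LOGISTIC requires SAS/STAT.

```sas
/* Fit the target on the bin alone and capture the association table. */
proc logistic data=work.binned;
   class income_bin / param=glm missing;  /* keep a missing bin, if any */
   model bad(event='1') = income_bin;
   ods output Association=work.assoc;
run;

/* Pull out the Somers' D row; nValue2 holds its numeric value. */
data work.gini;
   set work.assoc;
   if Label2 =: 'Somers';
   gini = nValue2;
   keep gini;
run;

proc print data=work.gini noobs;
run;
```

Rerunning this after each change of cut-offs gives a quick way to compare candidate groupings on the same measure.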

Frequent Contributor
Posts: 126

Re: Optimal binning

Any suggestions or an example of the format? I am not sure I follow.

Thanks

Super Contributor
Posts: 336

Re: Optimal binning

Read about dynamic formats and informats here: http://www.lexjansen.com/pharmasug/2005/posters/po06.pdf

Just one of many papers on the subject.

What are the complicated algorithms that are not working for you?

Frequent Contributor
Posts: 126

Re: Optimal binning

Many of the ones you find on the net that supposedly create optimal bins; also, the macros in data mining preparation tools do not work as they should. Anyway, I will look into the paper you sent me. Thank you for the reply.

Super Contributor
Posts: 336

Re: Optimal binning

I know advanced constraints have to be tweaked sometimes to get the most out of Interactive Binning or Enterprise Miner. If you are not getting the results you expect, contact SAS Technical Support: http://support.sas.com/techsup/contact/. They usually reply very fast.

Good luck!

Frequent Contributor
Posts: 126

Re: Optimal binning

Yes, you are right. In EM, with the Interactive Binning node, I have spent a lot of time creating my own splits because the ones you get are not always very good; in most cases it doesn't create bins even when there are opportunities for it. I guess I will create something in macro form so I can change the cut-offs each time, because I am afraid that the automated approaches you usually encounter produce more bins than the regression can support later on, since they just look to maximize the Gini ratio.

Thank you for your time
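
A simple starting point for choosing cut-offs by hand, sketched here under assumptions, is to pre-bin into many equal-frequency groups and look at the event rate in each. The names `work.loans`, `income`, and `bad` (0/1 target) are hypothetical, and 20 groups is an arbitrary choice.

```sas
/* Step 1: assign each observation to one of 20 quantile groups (0-19). */
/* Missing values of income get a missing group of their own.           */
proc rank data=work.loans groups=20 out=work.ranked;
   var income;
   ranks income_grp;
run;

/* Step 2: summarize each fine group: value range, size, event rate. */
proc sql;
   create table work.fine_bins as
   select income_grp,
          min(income) as min_income,
          max(income) as max_income,
          count(*)    as n_obs,
          mean(bad)   as event_rate
   from work.ranked
   group by income_grp
   order by income_grp;
quit;

proc print data=work.fine_bins noobs;
run;
```

Adjacent groups with similar event rates are candidates for merging into the 4 or 5 final bins, and `max_income` of the last group in each merged block becomes the cut-off.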

Super User
Posts: 17,863

Re: Optimal binning

That sounds like a CART or CHAID process with a single variable, perhaps? Could you use the CHAID macro out there? You may need to contact

http://listserv.uga.edu/cgi-bin/wa?A2=ind1309C&L=sas-l&D=0&P=5029

Solution
‎09-27-2013 03:33 PM
Super Contributor
Posts: 336

Re: Optimal binning

CART and CHAID are just decision tree algorithms. Yes, you will find some groupings using decision trees, but I think the original poster wants an optimal grouping solution, not just a grouping. Tree-based grouping is a good start, though, and I think EM binning offers that too.

☑ This topic is SOLVED.
