I have a continuous variable such as INCOME or DURATION.
I need to split it into several groups by using several cutpoints.
Two cutpoints would yield three groups, three cutpoints would yield four groups, and so on.
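To make the cutpoint-to-group mapping concrete, here is a minimal Python sketch (not SAS, and not part of the original post). The function name `assign_group` is hypothetical, and it assumes that a value exactly equal to a cutpoint falls into the lower group, which the post does not specify.

```python
from bisect import bisect_left

def assign_group(value, cutpoints):
    """Return the 1-based group index of `value` for sorted `cutpoints`.

    Assumption: a value equal to a cutpoint goes into the lower group.
    """
    return bisect_left(cutpoints, value) + 1

# DURATION values from the test data in the post
durations = [2, 4, 5, 6, 8, 10, 18, 28, 30, 32]

# One cutpoint gives two groups; two cutpoints give three groups.
one_cut = [assign_group(d, [12]) for d in durations]
two_cuts = [assign_group(d, [5, 12]) for d in durations]
```

With k sorted cutpoints the function can only return values 1 through k+1, which is the "k cutpoints yield k+1 groups" rule.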
Attached is a test data set (Excel file).
The data look like this, and I want to generate a GROUP variable. Here a single cutpoint, DURATION=12, splits DURATION (a continuous variable) into TWO groups. TWO cutpoints would yield THREE groups, THREE cutpoints would yield FOUR groups, and so on.

good_bad  group  duration
good      1      2
bad       1      4
good      1      5
good      1      6
bad       1      8
good      1      10
good      2      18
good      2      28
bad       2      30
bad       2      32

total_n_bad=4
total_n_good=6

group=1
--------
n_bad=2
n_good=4
bad_dist=n_bad/total_n_bad=2/4=0.5
good_dist=n_good/total_n_good=4/6=0.667
woe=(Bad_Dist-Good_Dist)*log(Bad_Dist/Good_Dist)=(0.5-0.667)*log(0.5/0.667)=0.048

group=2
--------
n_bad=2
n_good=2
bad_dist=n_bad/total_n_bad=2/4=0.5
good_dist=n_good/total_n_good=2/6=0.333
woe=(Bad_Dist-Good_Dist)*log(Bad_Dist/Good_Dist)=(0.5-0.333)*log(0.5/0.333)=0.068

iv=0.048 + 0.068 = 0.116 <----- I want to maximize this iv.

I also have THREE constraints:

group=1
--------
Bad_Dist>0.05 and Good_Dist>0.05

group=2
--------
Bad_Dist>0.05 and Good_Dist>0.05

These two avoid the problem "If n_good[g] = 0, then good_dist[g] = 0, yielding a division by zero".

woe[1]<woe[2]<woe[3]<woe[4]... or woe[1]>woe[2]>woe[3]>woe[4]...
In other words, woe must be monotonic.

P.S. The number of groups could be 3, 4, 5, 6, 7, 8, 9, 10, ..., and I want to pick the number of groups that gives the max IV. For example, group=8 has the max IV when group in (2 3 4 5 6 7 8 9 10).
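The hand calculation and the constraints above can be checked with a short Python sketch (not SAS). It is neither the GA code nor the OPTBINNING method. It uses plain exhaustive search over cutpoint combinations, which is only practical for tiny data like this test set. The names `evaluate` and `best_binning` are hypothetical, `woe` follows the formula exactly as written in the post (elsewhere that quantity is often called the per-group IV contribution), `log` is taken to be the natural log, and values equal to a cutpoint are assumed to fall into the lower group.

```python
from bisect import bisect_left
from itertools import combinations
from math import log

# Test data from the post: (good_bad, duration)
DATA = [("good", 2), ("bad", 4), ("good", 5), ("good", 6), ("bad", 8),
        ("good", 10), ("good", 18), ("good", 28), ("bad", 30), ("bad", 32)]

def evaluate(data, cutpoints, min_dist=0.05):
    """Return (iv, woe_list) for sorted cutpoints, or None if a constraint fails."""
    k = len(cutpoints) + 1
    n_bad, n_good = [0] * k, [0] * k
    for label, duration in data:
        g = bisect_left(cutpoints, duration)  # 0-based group index
        if label == "bad":
            n_bad[g] += 1
        else:
            n_good[g] += 1
    total_bad, total_good = sum(n_bad), sum(n_good)
    woe = []
    for b, g in zip(n_bad, n_good):
        bad_dist, good_dist = b / total_bad, g / total_good
        # Constraint: both distributions must exceed min_dist (also avoids log(0))
        if bad_dist <= min_dist or good_dist <= min_dist:
            return None
        woe.append((bad_dist - good_dist) * log(bad_dist / good_dist))
    # Constraint: woe strictly increasing or strictly decreasing across groups
    increasing = all(x < y for x, y in zip(woe, woe[1:]))
    decreasing = all(x > y for x, y in zip(woe, woe[1:]))
    if not (increasing or decreasing):
        return None
    return sum(woe), woe

def best_binning(data, max_groups=10):
    """Try every cutpoint combination for 2..max_groups groups; keep the max IV."""
    candidates = sorted({d for _, d in data})[:-1]  # cutting at the max adds nothing
    best = None
    for n_cuts in range(1, max_groups):
        for cuts in combinations(candidates, n_cuts):
            result = evaluate(data, list(cuts))
            if result is not None and (best is None or result[0] > best[0]):
                best = (result[0], list(cuts))
    return best

iv, woe = evaluate(DATA, [12])
best = best_binning(DATA)
```

Here `evaluate(DATA, [12])` reproduces the figures above: woe of about 0.048 and 0.068, and iv of about 0.116. `best_binning(DATA)` returns the feasible cutpoint set with the largest IV as an (iv, cutpoints) pair. The number of combinations grows very quickly with the number of distinct values, so real data would need a smarter search.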
Here is an example used by my GA code for the test data (attachment):
The OPTBINNING procedure solves this problem:
https://go.documentation.sas.com/doc/en/pgmsascdc/v_060/casml/casml_optbinning_toc.htm
As the Overview section mentions, a shortest path approach is used under the hood, but I cannot share the details.
Here are a couple of related posts, each of which has links to others:
RobPratt,
Unfortunately, the OPTBINNING procedure runs under SAS Viya, which I don't have. My SAS is a stand-alone PC version.
It also looks like PROC OPTBINNING needs some parameters to be set up first (cutpoints, WOE, lower bound, upper bound, and WOE trend <-- what does that mean?).
I need code like what you posted before for binning a category variable, with that kind of smoothing and flexibility.