I have a continuous variable such as INCOME or DURATION.
I need to split it into several groups using several cutpoints.
Two cutpoints yield three groups, three cutpoints yield four groups, and so on.
The attachment contains test data (an Excel file).
The data look like this. I want to generate a GROUP variable. Here I have one cutpoint, DURATION=12, which splits DURATION (a continuous variable) into TWO groups. With TWO cutpoints you would get THREE groups, with THREE cutpoints FOUR groups, and so on.

good_bad  group  duration
good      1       2
bad       1       4
good      1       5
good      1       6
bad       1       8
good      1      10
good      2      18
good      2      28
bad       2      30
bad       2      32

total_n_bad=4  total_n_good=6

group=1
--------
n_bad=2  n_good=4
bad_dist=n_bad/total_n_bad=2/4=0.5
good_dist=n_good/total_n_good=4/6=0.667
woe=log(bad_dist/good_dist)=log(0.5/0.667)=-0.288
iv_1=(bad_dist-good_dist)*woe=(0.5-0.667)*log(0.5/0.667)=0.048

group=2
--------
n_bad=2  n_good=2
bad_dist=n_bad/total_n_bad=2/4=0.5
good_dist=n_good/total_n_good=2/6=0.333
woe=log(bad_dist/good_dist)=log(0.5/0.333)=0.405
iv_2=(bad_dist-good_dist)*woe=(0.5-0.333)*log(0.5/0.333)=0.068

iv=iv_1+iv_2=0.048+0.068=0.116   <----- I want to maximize this IV.

I also have these constraints:
group=1: Bad_Dist>0.05 and Good_Dist>0.05
group=2: Bad_Dist>0.05 and Good_Dist>0.05
(these avoid the case "If n_good[g] = 0, then good_dist[g] = 0, yielding a division by zero")
woe[1]<woe[2]<woe[3]<woe[4]... or woe[1]>woe[2]>woe[3]>woe[4]..., a.k.a. WOE is monotonic.

P.S. The number of groups could be 3, 4, 5, ..., 10, and I want to pick the number of groups with the max IV. E.g., group=8 might have the max IV when the number of groups is in (2 3 4 5 6 7 8 9 10).
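The per-group arithmetic and the constraint checks above can be sketched in Python. The helper names `bin_stats` and `is_monotonic` are hypothetical. The sketch treats WOE as the log term, log(Bad_Dist/Good_Dist), and the IV contribution as (Bad_Dist-Good_Dist)*WOE. Some references use log(Good_Dist/Bad_Dist) instead, which flips the sign of WOE but leaves each IV term unchanged.

```python
import math

def bin_stats(labels, groups, min_dist=0.05):
    """Return [(group, bad_dist, good_dist, woe, iv_part), ...], or None if infeasible."""
    total_bad = labels.count("bad")
    total_good = labels.count("good")
    rows = []
    for g in sorted(set(groups)):
        in_g = [y for y, gg in zip(labels, groups) if gg == g]
        bad_dist = in_g.count("bad") / total_bad
        good_dist = in_g.count("good") / total_good
        # Constraint: both distributions must exceed min_dist. This also rules out
        # the division by zero / log(0) when a group has no goods or no bads.
        if bad_dist <= min_dist or good_dist <= min_dist:
            return None
        woe = math.log(bad_dist / good_dist)  # natural log, like SAS LOG()
        rows.append((g, bad_dist, good_dist, woe, (bad_dist - good_dist) * woe))
    return rows

def is_monotonic(woes):
    """True if WOE is strictly increasing or strictly decreasing across groups."""
    pairs = list(zip(woes, woes[1:]))
    return all(a < b for a, b in pairs) or all(a > b for a, b in pairs)

labels = ["good", "bad", "good", "good", "bad", "good", "good", "good", "bad", "bad"]
groups = [1, 1, 1, 1, 1, 1, 2, 2, 2, 2]  # cutpoint DURATION=12
rows = bin_stats(labels, groups)
iv = sum(r[4] for r in rows)
print([round(r[3], 3) for r in rows], round(iv, 3))  # [-0.288, 0.405] 0.116
print(is_monotonic([r[3] for r in rows]))            # True
```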
Here is an example from my GA (genetic algorithm) code for the test data (attachment):
The OPTBINNING procedure solves this problem:
https://go.documentation.sas.com/doc/en/pgmsascdc/v_060/casml/casml_optbinning_toc.htm
As the Overview section mentions, a shortest path approach is used under the hood, but I cannot share the details.
Here are a couple of related posts, each of which has links to others:
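Because the details of PROC OPTBINNING are not shared, the following is not its algorithm. It is a hypothetical Python sketch of one exact dynamic-programming search. Boundaries between distinct values act as nodes and bins act as arcs, so a binning is a path from the first boundary to the last. Monotonic WOE depends on the previous bin, so the sketch uses the last bin (i, j) as the state. Keeping only the best IV per state and per number of groups is exact, because whether later bins remain feasible depends only on the last bin's WOE.

The sketch makes these assumptions:
- candidate cutpoints lie between distinct values;
- a bin is feasible only when Bad_Dist and Good_Dist both exceed 0.05;
- WOE = log(Bad_Dist/Good_Dist) must be strictly increasing or strictly decreasing;
- `optimal_bins` is an illustrative name.

```python
import math
from collections import Counter

def optimal_bins(values, labels, max_bins=10, min_dist=0.05):
    """Exact search (small data) for the binning with the largest IV, subject to
    Bad_Dist > min_dist and Good_Dist > min_dist in every bin and strictly
    monotonic WOE. Returns (iv, n_groups, trend, cutpoints) or None."""
    xs = sorted(set(values))  # candidate boundaries sit between distinct values
    n = len(xs)
    bad = Counter(v for v, y in zip(values, labels) if y == "bad")
    good = Counter(v for v, y in zip(values, labels) if y == "good")
    tb, tg = sum(bad.values()), sum(good.values())
    pb, pg = [0], [0]  # prefix counts so any bin xs[i:j] costs O(1)
    for x in xs:
        pb.append(pb[-1] + bad[x])
        pg.append(pg[-1] + good[x])

    def seg_stats(i, j):  # (woe, iv_part) of bin xs[i:j], or None if infeasible
        bd = (pb[j] - pb[i]) / tb
        gd = (pg[j] - pg[i]) / tg
        if bd <= min_dist or gd <= min_dist:
            return None
        w = math.log(bd / gd)
        return w, (bd - gd) * w

    seg = {(i, j): seg_stats(i, j) for i in range(n) for j in range(i + 1, n + 1)}
    best = None
    for sign in (1, -1):  # +1: WOE increasing, -1: WOE decreasing
        # state (i, j): last bin is xs[i:j]; value: (best IV so far, boundary path)
        layer = {(0, j): (seg[0, j][1], [0, j]) for j in range(1, n + 1) if seg[0, j]}
        for k in range(2, max_bins + 1):
            nxt = {}
            for (h, i), (iv, path) in layer.items():
                for j in range(i + 1, n + 1):
                    s = seg[i, j]
                    if s and sign * seg[h, i][0] < sign * s[0]:
                        cand = iv + s[1]
                        if (i, j) not in nxt or cand > nxt[i, j][0]:
                            nxt[i, j] = (cand, path + [j])
            layer = nxt
            # a complete binning with k groups ends at the last boundary n
            for (i, j), (iv, path) in layer.items():
                if j == n and (best is None or iv > best[0]):
                    trend = "increasing" if sign > 0 else "decreasing"
                    best = (iv, k, trend, [xs[b - 1] for b in path[1:-1]])
    return best

duration = [2, 4, 5, 6, 8, 10, 18, 28, 30, 32]
labels = ["good", "bad", "good", "good", "bad", "good", "good", "good", "bad", "bad"]
print(optimal_bins(duration, labels, max_bins=2))  # best 2-group split: cutpoint 18, IV ~0.536
print(optimal_bins(duration, labels))              # best over 2..10 groups
```

Each reported cutpoint means "DURATION <= cutpoint goes to the lower group". On the 10-row test data, the best two-group split falls between 18 and 28. The cost grows roughly as max_bins times n cubed in the number of distinct values, so real data would likely need coarse pre-binning (for example, into a few dozen quantile bins) first.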
RobPratt,
Unfortunately, the OPTBINNING procedure runs under SAS Viya, which I don't have. My SAS is a stand-alone PC version.
It also looks like PROC OPTBINNING needs some parameters set up first (cutpoints, WOE, lower bound, upper bound, and WOE trend; what does WOE trend mean?).
I need code like the code you posted before for binning a categorical variable, with the same kind of smoothing and flexibility.