Ksharp
Super User

I have a continuous variable such as INCOME or DURATION.

I need to split it into several groups by using several cutpoints.

Two cutpoints yield three groups, three cutpoints yield four groups, and so on (k cutpoints yield k+1 groups).

The attachment contains test data (an Excel file).

The data look like this, and I want to generate a GROUP variable.
Here one cutpoint, DURATION=12, splits DURATION (a continuous variable) into TWO groups.


good_bad  group  duration
good        1     2
bad         1     4
good        1     5
good        1     6
bad         1     8
good        1     10

good        2     18
good        2     28
bad         2     30
bad         2     32
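The following minimal Python sketch shows how cutpoints turn into a GROUP variable. It is for illustration only, because the post itself works in SAS. The function name assign_group is hypothetical. The sketch assumes that a value equal to a cutpoint falls into the lower group. With the single cutpoint 12, it reproduces the GROUP column above.

```python
from bisect import bisect_left


def assign_group(value, cutpoints):
    """Return a 1-based group number for value, given sorted cutpoints.

    k cutpoints give k + 1 groups. Assumption: a value equal to a cutpoint
    falls in the lower group (group 1 is value <= cutpoints[0]).
    """
    # bisect_left counts the cutpoints strictly below value
    return bisect_left(cutpoints, value) + 1


durations = [2, 4, 5, 6, 8, 10, 18, 28, 30, 32]
groups = [assign_group(d, [12]) for d in durations]
print(groups)  # [1, 1, 1, 1, 1, 1, 2, 2, 2, 2]

# two cutpoints -> three groups
print([assign_group(d, [7, 25]) for d in durations])  # [1, 1, 1, 1, 2, 2, 2, 3, 3, 3]
```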






total_n_bad=4   total_n_good=6

group=1
--------
n_bad=2 n_good=4 
bad_dist=n_bad/total_n_bad=2/4=0.5 
good_dist=n_good/total_n_good=4/6=0.667
woe=log(Bad_Dist/Good_Dist)=log(0.5/0.667)=log(0.75)=-0.288
iv_1=(Bad_Dist-Good_Dist)*woe=(0.5-0.667)*(-0.288)=0.048


group=2
--------
n_bad=2 n_good=2
bad_dist=n_bad/total_n_bad=2/4=0.5
good_dist=n_good/total_n_good=2/6=0.333
woe=log(Bad_Dist/Good_Dist)=log(0.5/0.333)=log(1.5)=0.405
iv_2=(Bad_Dist-Good_Dist)*woe=(0.5-0.333)*0.405=0.068


iv=0.048 + 0.068 = 0.116   <----- I want to maximize this IV.
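The arithmetic above can be checked with a short Python sketch. The helper iv_table is hypothetical and does not come from the post. It takes WoE as log(Bad_Dist/Good_Dist), which is the log factor in the formula above. Each group's IV term is (Bad_Dist-Good_Dist)*WoE, and the total IV is the sum of those terms. The sketch assumes the good_bad column holds only "good" and "bad" and that both values occur.

```python
import math
from collections import Counter


def iv_table(good_bad, groups):
    """Per-group (group, bad_dist, good_dist, woe, iv term) rows and total IV.

    Assumes good_bad holds only "good"/"bad" and both classes occur.
    """
    total_bad = good_bad.count("bad")
    total_good = len(good_bad) - total_bad
    n_bad = Counter(g for gb, g in zip(good_bad, groups) if gb == "bad")
    n_good = Counter(g for gb, g in zip(good_bad, groups) if gb == "good")
    rows = []
    for g in sorted(set(groups)):
        bad_dist = n_bad[g] / total_bad
        good_dist = n_good[g] / total_good
        # raises if either distribution is 0 (the reason for the 0.05 floor)
        woe = math.log(bad_dist / good_dist)
        rows.append((g, bad_dist, good_dist, woe, (bad_dist - good_dist) * woe))
    return rows, sum(row[4] for row in rows)


good_bad = ["good", "bad", "good", "good", "bad", "good",
            "good", "good", "bad", "bad"]
groups = [1, 1, 1, 1, 1, 1, 2, 2, 2, 2]
rows, iv = iv_table(good_bad, groups)
for g, bd, gd, woe, term in rows:
    print(g, round(bd, 3), round(gd, 3), round(woe, 3), round(term, 3))
print("iv =", round(iv, 3))  # iv = 0.116
```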



I also have THREE constraints:

1) Bad_Dist>0.05 in every group (group=1, group=2, ...)
2) Good_Dist>0.05 in every group

These two avoid empty cells. If n_good[g]=0, then good_dist[g]=0 and log(Bad_Dist/Good_Dist) divides by zero. Likewise, n_bad[g]=0 would give log(0).

3) woe is monotonic across the groups:
   woe[1]<woe[2]<woe[3]<woe[4]...
   or
   woe[1]>woe[2]>woe[3]>woe[4]...
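The following Python sketch checks the three constraints above for one candidate binning. The name is_feasible is hypothetical. The sketch assumes WoE is log(Bad_Dist/Good_Dist) for each group and that "monotonic" means strictly increasing or strictly decreasing. The distribution floors are tested first, so log() is never evaluated on a zero distribution.

```python
import math


def is_feasible(bad_dist, good_dist, min_dist=0.05):
    """Check the three constraints for one candidate binning.

    bad_dist, good_dist: per-group distributions, group 1 first.
    Assumption: WoE = log(bad_dist / good_dist) for each group.
    """
    # constraints 1 and 2: both distributions above the floor in every group;
    # checked first so log() never sees a zero distribution
    if min(bad_dist) <= min_dist or min(good_dist) <= min_dist:
        return False
    woe = [math.log(b / g) for b, g in zip(bad_dist, good_dist)]
    steps = [later - earlier for earlier, later in zip(woe, woe[1:])]
    # constraint 3: strictly increasing or strictly decreasing WoE
    return all(s > 0 for s in steps) or all(s < 0 for s in steps)


print(is_feasible([0.5, 0.5], [4 / 6, 2 / 6]))            # True
print(is_feasible([0.25, 0.5, 0.25], [0.5, 0.25, 0.25]))  # False (not monotonic)
print(is_feasible([0.5, 0.5], [1.0, 0.0]))                # False (good_dist = 0)
```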




P.S.
The number of groups could be 2, 3, 4, ..., 10,
and I want to pick the number of groups that gives the max IV.
For example, if 8 groups gives the max IV among 2 to 10 groups, pick 8 groups.
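For a tiny data set like the 10-row sample above, the search over 2 to 10 groups can be done by brute force. The following Python sketch does this. It is not the GA code from the post, and the helper names bin_stats, feasible_iv and best_binning are hypothetical. It tries every set of cutpoints drawn from the distinct DURATION values and keeps only the binnings that satisfy the three constraints. It then returns the one with the max IV. It assumes WoE = log(Bad_Dist/Good_Dist) and that a value equal to a cut goes to the lower group. The number of cutpoint sets grows combinatorially, so this approach is only practical for a handful of distinct values.

```python
import math
from itertools import combinations


def bin_stats(x, y, cuts):
    """Per-group bad and good distributions; y is 1 for bad, 0 for good.

    Assumption: a value goes to the lower group when it is <= a cut.
    """
    total_bad = sum(y)
    total_good = len(y) - total_bad
    n_bad = [0] * (len(cuts) + 1)
    n_good = [0] * (len(cuts) + 1)
    for xi, yi in zip(x, y):
        g = sum(xi > c for c in cuts)  # 0-based group index
        if yi:
            n_bad[g] += 1
        else:
            n_good[g] += 1
    return [b / total_bad for b in n_bad], [g / total_good for g in n_good]


def feasible_iv(bad, good, min_dist=0.05):
    """IV of one binning if all three constraints hold, else None."""
    if min(bad) <= min_dist or min(good) <= min_dist:
        return None
    woe = [math.log(b / g) for b, g in zip(bad, good)]
    steps = [later - earlier for earlier, later in zip(woe, woe[1:])]
    if not (all(s > 0 for s in steps) or all(s < 0 for s in steps)):
        return None
    return sum((b - g) * w for b, g, w in zip(bad, good, woe))


def best_binning(x, y, max_groups=10):
    """Brute force: try every cutpoint set for 2..max_groups groups."""
    # cutting at max(x) would leave the top group empty, so skip it
    candidates = sorted(set(x))[:-1]
    best = None  # (iv, number of groups, cuts)
    for k in range(2, max_groups + 1):
        for cuts in combinations(candidates, k - 1):
            iv = feasible_iv(*bin_stats(x, y, cuts))
            if iv is not None and (best is None or iv > best[0]):
                best = (iv, k, cuts)
    return best


durations = [2, 4, 5, 6, 8, 10, 18, 28, 30, 32]
y = [0, 1, 0, 0, 1, 0, 0, 0, 1, 1]  # 1 = bad
iv, k, cuts = best_binning(durations, y)
print(round(iv, 3), k, cuts)  # 0.563 3 (6, 18)
```

On the 10-row sample (not the full attachment), the search picks 3 groups with cuts at 6 and 18. The single cutpoint at 12 in the worked example is feasible but gives a lower IV.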

Here is an example result from my GA (genetic algorithm) code on the test data (attachment):

 

[Screenshot: output of the GA code on the test data]

 

RobPratt
SAS Super FREQ

The OPTBINNING procedure solves this problem:

https://go.documentation.sas.com/doc/en/pgmsascdc/v_060/casml/casml_optbinning_toc.htm

 

As the Overview section mentions, a shortest path approach is used under the hood, but I cannot share the details.

 

Here are a couple of related posts, each of which has links to others:

Ksharp
Super User

RobPratt,

Unfortunately, the OPTBINNING procedure runs under SAS Viya, which I don't have. My SAS is a stand-alone PC version.

 

Also, it looks like PROC OPTBINNING needs some parameters to be set up first (cutpoints, WOE, lower bound, upper bound, and WOE trend; what does WOE trend mean?).

 

I need code like the code you posted before for binning a categorical variable, with the same kind of smoothness and flexibility.
