Ksharp
Super User

I have a continuous variable such as INCOME or DURATION.

I need to split it into several groups using several cutpoints.

Two cutpoints yield three groups, three cutpoints yield four groups, and so on.

The attachment is a test data set (Excel).

The data looks like this; I want to generate a GROUP variable.
Here a single cutpoint, DURATION=12, splits DURATION (a continuous variable) into two groups.


good_bad  group  duration
good        1     2
bad         1     4
good        1     5
good        1     6
bad         1     8
good        1     10

good        2     18
good        2     28
bad         2     30
bad         2     32
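
The grouping in the table above can be sketched in a few lines. This is only an illustration in Python, not SAS code from the thread; `assign_group` is a hypothetical helper name, and the rule that a value equal to a cutpoint falls into the lower group is an assumption, since the post does not say which side DURATION=12 itself belongs to.

```python
from bisect import bisect_left

def assign_group(value, cutpoints):
    """Return the 1-based group of `value`.

    `cutpoints` must be sorted ascending; k cutpoints give k + 1 groups.
    Assumption: a value equal to a cutpoint goes to the lower group.
    """
    return bisect_left(cutpoints, value) + 1

durations = [2, 4, 5, 6, 8, 10, 18, 28, 30, 32]

# One cutpoint (DURATION=12) gives two groups, as in the table.
groups = [assign_group(d, [12]) for d in durations]
print(groups)  # [1, 1, 1, 1, 1, 1, 2, 2, 2, 2]
```

Passing two cutpoints, for example `[12, 29]`, would give three groups in the same way.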






total_n_bad=4   total_n_good=6

group=1
--------
n_bad=2 n_good=4 
bad_dist=n_bad/total_n_bad=2/4=0.5 
good_dist=n_good/total_n_good=4/6=0.667
woe=(Bad_Dist-Good_Dist)*log(Bad_Dist/Good_Dist)=(0.5-0.667)*log(0.5/0.667)=0.048


group=2 
--------
n_bad=2 n_good=2
bad_dist=n_bad/total_n_bad=2/4=0.5 
good_dist=n_good/total_n_good=2/6=0.333
woe=(Bad_Dist-Good_Dist)*log(Bad_Dist/Good_Dist)=(0.5-0.333)*log(0.5/0.333)=0.068


iv=0.048 + 0.068 = 0.116   <----- I want to maximize this iv .
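
The arithmetic above can be reproduced with a short Python sketch (an illustration only; `iv_by_group` is a hypothetical name). It uses the formula exactly as written in the post and the natural logarithm. Note that the quantity the post labels woe is what is usually called a group's contribution to IV; the conventional WOE is the log ratio of the two distributions on its own.

```python
from math import log

def iv_by_group(groups, good_bad):
    """Per-group (bad_dist - good_dist) * log(bad_dist / good_dist).

    Assumes every group contains at least one "good" and one "bad" row,
    otherwise the log or the division fails.
    """
    total_bad = good_bad.count("bad")
    total_good = good_bad.count("good")
    result = {}
    for g in sorted(set(groups)):
        labels = [s for gg, s in zip(groups, good_bad) if gg == g]
        bad_dist = labels.count("bad") / total_bad
        good_dist = labels.count("good") / total_good
        result[g] = (bad_dist - good_dist) * log(bad_dist / good_dist)
    return result

# Test data from the post, in DURATION order.
groups = [1, 1, 1, 1, 1, 1, 2, 2, 2, 2]
good_bad = ["good", "bad", "good", "good", "bad", "good",
            "good", "good", "bad", "bad"]

parts = iv_by_group(groups, good_bad)
print({g: round(v, 3) for g, v in parts.items()})  # {1: 0.048, 2: 0.068}
print(round(sum(parts.values()), 3))               # 0.116
```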



And I also have THREE constraints:
group=1
--------
Bad_Dist>0.05 and Good_Dist>0.05

group=2
--------
Bad_Dist>0.05 and Good_Dist>0.05

These bounds avoid the case "If n_good[g] = 0, then good_dist[g] = 0, yielding a division by zero".



woe[1]<woe[2]<woe[3]<woe[4]...........
or
woe[1]>woe[2]>woe[3]>woe[4]...........
i.e. woe must be monotonic.
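
As a rough sketch, the two kinds of constraint could be checked like this in Python. The function names and the `floor` parameter are hypothetical; the 0.05 default is the bound given in the post.

```python
def meets_floor(bad_dist, good_dist, floor=0.05):
    """True if every group has Bad_Dist > floor and Good_Dist > floor."""
    return all(b > floor and g > floor for b, g in zip(bad_dist, good_dist))

def is_strictly_monotonic(values):
    """True if values are strictly increasing or strictly decreasing."""
    pairs = list(zip(values, values[1:]))
    return all(a < b for a, b in pairs) or all(a > b for a, b in pairs)

# Values from the two-group example in the post.
print(meets_floor([0.5, 0.5], [0.667, 0.333]))  # True
print(is_strictly_monotonic([0.048, 0.068]))    # True
```

A candidate set of cutpoints would be kept only if both checks pass.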




P.S.
The number of groups could be 3,4,5,6,7,8,9,10.....
and I want to pick the number of groups that gives the max IV.
E.g. group=8 has the max IV when group is in (2 3 4 5 6 7 8 9 10).
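
For a small data set, the whole search can be written as a brute-force enumeration. The Python sketch below is only an illustration of the problem statement; it is neither the GA code nor the method used by any SAS procedure. The names `group_dists`, `best_binning`, `max_groups` and `floor` are hypothetical. It assumes candidate cutpoints are the observed values, that a value equal to a cutpoint goes to the lower group, and that monotonicity is checked on woe as defined in the post. The number of combinations grows very quickly, so this is only practical for few distinct values.

```python
from itertools import combinations
from math import log

def group_dists(values, is_bad, cuts):
    """Return [(bad_dist, good_dist), ...] for the groups defined by cuts.

    Assumes the data contains at least one bad and one good row.
    """
    n_bad = [0] * (len(cuts) + 1)
    n_good = [0] * (len(cuts) + 1)
    for x, bad in zip(values, is_bad):
        idx = sum(x > c for c in cuts)  # 0-based group index
        if bad:
            n_bad[idx] += 1
        else:
            n_good[idx] += 1
    total_bad, total_good = sum(n_bad), sum(n_good)
    return [(b / total_bad, g / total_good) for b, g in zip(n_bad, n_good)]

def best_binning(values, is_bad, max_groups=4, floor=0.05):
    """Return (iv, cuts) with the largest IV among feasible binnings, or None."""
    candidates = sorted(set(values))[:-1]  # the largest value cannot split
    best = None
    for n_groups in range(2, max_groups + 1):
        for cuts in combinations(candidates, n_groups - 1):
            dists = group_dists(values, is_bad, cuts)
            # Constraint: both distributions above the floor in every group.
            if any(b <= floor or g <= floor for b, g in dists):
                continue
            # woe as defined in the post.
            woe = [(b - g) * log(b / g) for b, g in dists]
            pairs = list(zip(woe, woe[1:]))
            # Constraint: woe strictly increasing or strictly decreasing.
            if not (all(a < c for a, c in pairs) or all(a > c for a, c in pairs)):
                continue
            iv = sum(woe)
            if best is None or iv > best[0]:
                best = (iv, cuts)
    return best

durations = [2, 4, 5, 6, 8, 10, 18, 28, 30, 32]
is_bad = [0, 1, 0, 0, 1, 0, 0, 0, 1, 1]
print(best_binning(durations, is_bad))
```

To check the trend on the conventional WOE instead, the `woe` list used in the monotonicity test could be replaced by `log(b / g)` while keeping the IV sum unchanged.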

Here is an example used by my GA code for the test data (attachment):

 

Ksharp_0-1740448604991.png

 

RobPratt
SAS Super FREQ

The OPTBINNING procedure solves this problem:

https://go.documentation.sas.com/doc/en/pgmsascdc/v_060/casml/casml_optbinning_toc.htm

 

As the Overview section mentions, a shortest path approach is used under the hood, but I cannot share the details.

 

Here are a couple of related posts, each of which has links to others:

Ksharp
Super User

RobPratt,

Unfortunately, the OPTBINNING procedure runs under SAS Viya, which I don't have. My SAS is a stand-alone PC version.

 

And it looks like PROC OPTBINNING needs some parameters to be set up first (cutpoints, WOE, lower bound, upper bound, and WOE trend <-- what does that mean?).

 

I need code like what you posted before for binning a category variable, with that kind of smoothness and flexibility.
