Solved: MAX IV Value

Ksharp · Posted 02-20-2025 04:19 AM

I am building a scorecard. I want to bin a categorical variable into 5 groups and get its maximum IV (information value).

But I am running into a problem. How can I get past this ERROR message?

Thanks in advance.

The Excel file is attached.

proc import datafile='D:\ifre_backup\Ksharp\1--German Credit.xlsx' dbms=xlsx out=have replace;
run;
%let var=purpose   ;  /* character variable */
%let group=5 ;
data temp;
 set have;
 keep &var good_bad ;  /* good_bad values: good, bad */
run;
proc freq data=temp noprint;
table &var./out=level;
run;




proc optmodel;
set OBS;
str good_bad{OBS};
str &var.{OBS};
read data temp into OBS=[_n_] good_bad &var.;

set BYSET_INDEX;
str BYSET{BYSET_INDEX};
read data level into BYSET_INDEX=[_n_] BYSET=&var.;

set GROUP=1..&group.;
set SUB_BYSET_INDEX;
set<str> BY;
set OBS_BY;
num n_bad{GROUP};
num n_good{GROUP};
num bad_dist{GROUP};
num good_dist{GROUP};
num woe{GROUP};
num total_n_bad;
num total_n_good;

total_n_bad =sum{i in OBS} if good_bad[i]='bad'  then 1 else 0;
total_n_good=sum{i in OBS} if good_bad[i]='good' then 1 else 0;

var v{BYSET_INDEX} >=1 <=&group. integer;

cofor{g in GROUP} do;
 SUB_BYSET_INDEX={k in BYSET_INDEX:v[k]=g};
 BY=setof{i in SUB_BYSET_INDEX} BYSET[i];
 OBS_BY={i in OBS:&var.[i] in BY};

 n_bad[g] =sum{i in OBS_BY} if good_bad[i]='bad' then 1 else 0;
 n_good[g]=sum{i in OBS_BY} if good_bad[i]='good' then 1 else 0;
 bad_dist[g]=n_bad[g]/total_n_bad ; 
 good_dist[g]=n_good[g]/total_n_good ; 
 woe[g]=(bad_dist[g]-good_dist[g])*log(bad_dist[g]/good_dist[g]);
end;

max iv=sum{g in GROUP} woe[g];
solve;
quit;

RobPratt · Posted 02-21-2025 05:40 PM

There are a few issues here:

A COFOR loop does not achieve any parallelism unless the body of the loop contains at least one SOLVE statement.
If n_good[g] = 0, then good_dist[g] = 0, yielding a division by zero in the argument of the LOG function and hence a missing value for woe[g].
Your objective function IV depends only on numeric parameters and not on decision variables.

It looks like you are trying to solve a separate optimization problem for each group. If so, I recommend first writing the code to solve only one group. If not, please provide an algebraic description of the optimization problem you want to solve.
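To make the second issue concrete, here is a minimal Python sketch (not part of the original SAS code) of one group's term in the IV sum. The function name `woe_term` and the counts are hypothetical, chosen only for illustration. It returns None where OPTMODEL would produce a missing value.

```python
import math

def woe_term(n_bad, n_good, total_bad, total_good):
    """Return one group's term of the IV sum, or None when it is undefined."""
    if n_bad == 0 or n_good == 0:
        # n_good == 0 divides by zero; n_bad == 0 takes log(0).
        return None
    bad_dist = n_bad / total_bad
    good_dist = n_good / total_good
    return (bad_dist - good_dist) * math.log(bad_dist / good_dist)

# Hypothetical counts: 4 bads and 6 goods overall.
undefined = woe_term(2, 0, 4, 6)   # group with no goods
defined = woe_term(2, 2, 4, 6)     # group with both classes present
print(undefined, defined)
```

Any grouping that leaves a group without goods or without bads therefore has no usable objective value, which is why a lower bound on both distributions is needed.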

Ksharp · Posted 02-21-2025 08:42 PM

OK. Take group=2 for example:

The data looks like this. I want to generate a GROUP variable:
good_bad  group  purpose
good        1     1
bad         1     1
good        1     2
good        1     2
bad         1     2
good        1     3

good        2     8
good        2     8
bad         2     X
bad         2     X

Note: each purpose belongs to ONLY ONE group.
      You cannot include purpose=1 in both group=1 and group=2.




total_n_bad=4   total_n_good=6

group=1
--------
n_bad=2 n_good=4 
bad_dist=n_bad/total_n_bad=2/4=0.5 
good_dist=n_good/total_n_good=4/6=0.667
woe=(Bad_Dist-Good_Dist)*log(Bad_Dist/Good_Dist)=(0.5-0.667)*log(0.5/0.667)=0.048


group=2 
--------
n_bad=2 n_good=2
bad_dist=n_bad/total_n_bad=2/4=0.5 
good_dist=n_good/total_n_good=2/6=0.333
woe=(Bad_Dist-Good_Dist)*log(Bad_Dist/Good_Dist)=(0.5-0.333)*log(0.5/0.333)=0.068


iv=0.048 + 0.068 = 0.116   <----- I want to maximize this iv .



And I also have two constraints:
group=1
--------
Bad_Dist>0.05 and Good_Dist>0.05

group=2
--------
Bad_Dist>0.05 and Good_Dist>0.05

to avoid "If n_good[g] = 0, then good_dist[g] = 0, yielding a division by zero"



P.S.
The number of groups could be 3, 4, 5, 6, 7, 8, 9, 10, ...
and I want to pick the number of groups that gives the max IV.
E.g., group=8 has the max IV when group in (2 3 4 5 6 7 8 9 10).
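The hand calculation above can be checked with a short Python sketch (Python is used here only for illustration; the thread's code is SAS). The rows are the ten sample observations from the post, and `group_term` is a hypothetical helper name.

```python
import math

# Sample rows from the post: (good_bad, group, purpose)
rows = [
    ("good", 1, "1"), ("bad", 1, "1"),
    ("good", 1, "2"), ("good", 1, "2"), ("bad", 1, "2"),
    ("good", 1, "3"),
    ("good", 2, "8"), ("good", 2, "8"),
    ("bad", 2, "X"), ("bad", 2, "X"),
]

total_bad = sum(1 for gb, _, _ in rows if gb == "bad")
total_good = sum(1 for gb, _, _ in rows if gb == "good")

def group_term(n_bad, n_good):
    """(Bad_Dist - Good_Dist) * log(Bad_Dist / Good_Dist) for one group."""
    bad_dist = n_bad / total_bad
    good_dist = n_good / total_good
    return (bad_dist - good_dist) * math.log(bad_dist / good_dist)

terms = {}
for g in sorted({grp for _, grp, _ in rows}):
    n_bad = sum(1 for gb, grp, _ in rows if grp == g and gb == "bad")
    n_good = sum(1 for gb, grp, _ in rows if grp == g and gb == "good")
    terms[g] = group_term(n_bad, n_good)

iv = sum(terms.values())
print(terms, iv)  # the two terms round to 0.048 and 0.068, and iv to 0.116
```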

RobPratt · Posted 02-21-2025 11:09 PM

Here's a straightforward approach that uses the black-box solver:

data have;
   input good_bad $ purpose $;
   datalines;
good        1
bad         1
good        2
good        2
bad         2
good        3
good        8
good        8
bad         X
bad         X
;

proc optmodel;
   set OBS;
   str purpose {OBS};
   str good_bad {OBS};
   read data have into OBS=[_N_] purpose good_bad;
   set <str> PURPOSES init {};
   num bad  {PURPOSES} init 0;
   num good {PURPOSES} init 0;
   str pThis, gbThis;
   for {i in OBS} do;
      pThis = purpose[i];
      gbThis = good_bad[i];
      PURPOSES = PURPOSES union {pThis};
      if      gbThis = 'bad'  then bad[pThis]  = bad[pThis]  + 1;
      else if gbThis = 'good' then good[pThis] = good[pThis] + 1;
   end;
   print bad good;
   num total_n_bad  = sum {p in PURPOSES} bad[p];
   num total_n_good = sum {p in PURPOSES} good[p];

   put total_n_bad=;
   put total_n_good=;

   num numGroups = 2;
   set GROUPS = 1..numGroups;

   var IsPurposeGroup {PURPOSES, GROUPS} binary;

   impvar N_bad  {g in GROUPS} = sum {p in PURPOSES} bad[p]  * IsPurposeGroup[p,g];
   impvar N_good {g in GROUPS} = sum {p in PURPOSES} good[p] * IsPurposeGroup[p,g];

   impvar Bad_dist  {g in GROUPS} = N_bad[g]  / total_n_bad;
   impvar Good_dist {g in GROUPS} = N_good[g] / total_n_good;

   impvar Woe {g in GROUPS} = (Bad_dist[g] - Good_dist[g]) * log(Bad_dist[g]/Good_dist[g]);

   max IV = sum {g in GROUPS} Woe[g];

   con OneGroupPerPurpose {p in PURPOSES}:
      sum {g in GROUPS} IsPurposeGroup[p,g] = 1;

   con Bad_dist_lb {g in GROUPS}:
      Bad_Dist[g] >= 0.05;

   con Good_dist_lb {g in GROUPS}:
      Good_Dist[g] >= 0.05;

/*   for {p in {'1','2','3'}} fix IsPurposeGroup[p,1] = 1;*/
/*   for {p in {'8','X'}}     fix IsPurposeGroup[p,2] = 1;*/

   solve with blackbox;

   print N_bad N_good Bad_dist Good_dist Woe;

   num assignedGroup {PURPOSES};
   for {p in PURPOSES} do;
      for {g in GROUPS: IsPurposeGroup[p,g].sol > 0.5} do;
         assignedGroup[p] = g;
         leave;
      end;
   end;

   print assignedGroup;
quit;

On my machine, this yields a maximum IV of 1.5796959506.

Uncomment the FIX statements to recover your sample solution with IV = 0.1155245301.

Please verify whether this solves your problem for a fixed value of numGroups. Then I can show you how to find the best numGroups.
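Because the sample has only five purposes, both numbers quoted above can be cross-checked by brute force. The following Python sketch is an independent check, not the black-box solver: it tries every assignment of purposes to two groups, discards those that violate the 0.05 lower bounds, and keeps the best IV. The per-purpose counts are tallied from the DATALINES above; the helper name `iv_of` is hypothetical.

```python
import math
from itertools import product

# Bad and good counts per purpose, tallied from the sample data
bad = {"1": 1, "2": 1, "3": 0, "8": 0, "X": 2}
good = {"1": 1, "2": 2, "3": 1, "8": 2, "X": 0}
purposes = sorted(bad)
total_bad, total_good = sum(bad.values()), sum(good.values())
num_groups = 2
min_dist = 0.05

def iv_of(assignment):
    """IV of a {purpose: group} assignment, or None if a group breaks a lower bound."""
    iv = 0.0
    for g in range(1, num_groups + 1):
        members = [p for p in purposes if assignment[p] == g]
        bad_dist = sum(bad[p] for p in members) / total_bad
        good_dist = sum(good[p] for p in members) / total_good
        if bad_dist < min_dist or good_dist < min_dist:
            return None
        iv += (bad_dist - good_dist) * math.log(bad_dist / good_dist)
    return iv

best_iv, best_assignment = -1.0, None
for labels in product(range(1, num_groups + 1), repeat=len(purposes)):
    assignment = dict(zip(purposes, labels))
    iv = iv_of(assignment)
    if iv is not None and iv > best_iv:
        best_iv, best_assignment = iv, assignment

# The grouping from the question, i.e. what the commented-out FIX statements enforce
sample_iv = iv_of({"1": 1, "2": 1, "3": 1, "8": 2, "X": 2})
print(best_iv, best_assignment, sample_iv)
```

On this sample the best assignment puts purposes 1 and X together and the remaining purposes in the other group. Exhaustive enumeration is only practical for a handful of categories, since the number of assignments grows as numGroups to the power of the number of purposes.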

Ksharp · Posted 02-22-2025 02:17 AM

RobPratt,

Many thanks. But when I applied your code to my Excel data, I got this WARNING.

proc import datafile='D:\ifre_backup\Ksharp\1--German Credit.xlsx' dbms=xlsx out=have replace;
run;

%let var=purpose   ;  /* character variable */
%let group=6 ;

data have(rename=(&var.=purpose));
 set have;
 keep &var good_bad ;  /* good_bad values: good, bad */
run;


proc optmodel;
   set OBS;
   str purpose {OBS};
   str good_bad {OBS};
   read data have into OBS=[_N_] purpose good_bad;
   set <str> PURPOSES init {};
   num bad  {PURPOSES} init 0;
   num good {PURPOSES} init 0;
   str pThis, gbThis;
   for {i in OBS} do;
      pThis = purpose[i];
      gbThis = good_bad[i];
      PURPOSES = PURPOSES union {pThis};
      if      gbThis = 'bad'  then bad[pThis]  = bad[pThis]  + 1;
      else if gbThis = 'good' then good[pThis] = good[pThis] + 1;
   end;
   print bad good;
   num total_n_bad  = sum {p in PURPOSES} bad[p];
   num total_n_good = sum {p in PURPOSES} good[p];

   put total_n_bad=;
   put total_n_good=;

   num numGroups = &group.;
   set GROUPS = 1..numGroups;

   var IsPurposeGroup {PURPOSES, GROUPS} binary;

   impvar N_bad  {g in GROUPS} = sum {p in PURPOSES} bad[p]  * IsPurposeGroup[p,g];
   impvar N_good {g in GROUPS} = sum {p in PURPOSES} good[p] * IsPurposeGroup[p,g];

   impvar Bad_dist  {g in GROUPS} = N_bad[g]  / total_n_bad;
   impvar Good_dist {g in GROUPS} = N_good[g] / total_n_good;

   impvar Woe {g in GROUPS} = (Bad_dist[g] - Good_dist[g]) * log(Bad_dist[g]/Good_dist[g]);

   max IV = sum {g in GROUPS} Woe[g];

   con OneGroupPerPurpose {p in PURPOSES}:
      sum {g in GROUPS} IsPurposeGroup[p,g] = 1;

   con Bad_dist_lb {g in GROUPS}:
      Bad_Dist[g] >= 0.05;

   con Good_dist_lb {g in GROUPS}:
      Good_Dist[g] >= 0.05;

/*   for {p in {'1','2','3'}} fix IsPurposeGroup[p,1] = 1;*/
/*   for {p in {'8','X'}}     fix IsPurposeGroup[p,2] = 1;*/

   solve with blackbox;

   print N_bad N_good Bad_dist Good_dist Woe;

   num assignedGroup {PURPOSES};
   for {p in PURPOSES} do;
      for {g in GROUPS: IsPurposeGroup[p,g].sol > 0.5} do;
         assignedGroup[p] = g;
         leave;
      end;
   end;

   print assignedGroup;
quit;

NOTE: The problem has 60 variables (0 free, 0 fixed).
NOTE: The problem uses 30 implicit variables.
NOTE: The problem has 60 binary and 0 integer variables.
NOTE: The problem has 22 linear constraints (0 LE, 10 EQ, 12 GE, 0 range).
NOTE: The problem has 180 linear constraint coefficients.
NOTE: The problem has 0 nonlinear constraints (0 LE, 0 EQ, 0 GE, 0 range).
NOTE: The OPTMODEL presolver removed 0 variables, 0 linear constraints, and 0 nonlinear constraints.
NOTE: The black-box solver is using up to 4 threads.
NOTE: The black-box solver is using the EAGLS optimizer algorithm.
NOTE: The problem has 60 variables (60 integer, 0 continuous).
NOTE: The problem has 22 constraints (22 linear, 0 nonlinear).
NOTE: The problem has 1 user-defined functions.
NOTE: The deterministic parallel mode is enabled.
                         Best
        Iteration        Objective    Infeasibility    Evals     Time
                1       0.08127539       2.00000000      301        0
                2       0.08127539       2.00000000      363        0
                3       0.08127539       2.00000000      366        0
                4       0.08127539       2.00000000      372        0
                5       0.08127539       2.00000000      376        0
                6       0.08127539       2.00000000      381        0
                7       0.08127539       2.00000000      385        0
                8       0.08127539       2.00000000      389        0
                9       0.08127539       2.00000000      391        0
               10       0.08127539       2.00000000      393        0
               11       0.08127539       2.00000000      395        0
WARNING: The best solution found does not satisfy the feasibility tolerance.
NOTE: Failed.
64
65      print N_bad N_good Bad_dist Good_dist Woe;
NOTE: Division by zero at line 47 column 77.
NOTE: Division by zero at line 47 column 77.
NOTE: Division by zero at line 47 column 77.
NOTE: Division by zero at line 47 column 77.
NOTE: Division by zero at line 47 column 77.
NOTE: Division by zero at line 47 column 77.

And if I use my own Genetic Algorithm code, I get this (group=6 has the max IV):

RobPratt · Posted 02-22-2025 10:53 AM

The Infeasibility of 2 means that some constraints are violated. You might need to change some solver options, like increasing POPSIZE= or NABSFCONV=.

Here is code to call the black-box solver for different numGroups and return the best solution found:

proc optmodel printlevel=0;
   set OBS;
   str purpose {OBS};
   str good_bad {OBS};
   read data have into OBS=[_N_] purpose good_bad;
   set <str> PURPOSES init {};
   num bad  {PURPOSES} init 0;
   num good {PURPOSES} init 0;
   str pThis, gbThis;
   for {i in OBS} do;
      pThis = purpose[i];
      gbThis = good_bad[i];
      PURPOSES = PURPOSES union {pThis};
      if      gbThis = 'bad'  then bad[pThis]  = bad[pThis]  + 1;
      else if gbThis = 'good' then good[pThis] = good[pThis] + 1;
   end;
   print bad good;
   num total_n_bad  = sum {p in PURPOSES} bad[p];
   num total_n_good = sum {p in PURPOSES} good[p];

   put total_n_bad=;
   put total_n_good=;

/*   num numGroups = 2;*/
   num numGroups;
   set GROUPS = 1..numGroups;

   var IsPurposeGroup {PURPOSES, GROUPS} binary;

   impvar N_bad  {g in GROUPS} = sum {p in PURPOSES} bad[p]  * IsPurposeGroup[p,g];
   impvar N_good {g in GROUPS} = sum {p in PURPOSES} good[p] * IsPurposeGroup[p,g];

   impvar Bad_dist  {g in GROUPS} = N_bad[g]  / total_n_bad;
   impvar Good_dist {g in GROUPS} = N_good[g] / total_n_good;

   impvar Woe {g in GROUPS} = (Bad_dist[g] - Good_dist[g]) * log(Bad_dist[g]/Good_dist[g]);

   max IV = sum {g in GROUPS} Woe[g];

   con OneGroupPerPurpose {p in PURPOSES}:
      sum {g in GROUPS} IsPurposeGroup[p,g] = 1;

   con Bad_dist_lb {g in GROUPS}:
      Bad_Dist[g] >= 0.05;

   con Good_dist_lb {g in GROUPS}:
      Good_Dist[g] >= 0.05;

/*   for {p in {'1','2','3'}} fix IsPurposeGroup[p,1] = 1;*/
/*   for {p in {'8','X'}}     fix IsPurposeGroup[p,2] = 1;*/

/*   solve with blackbox;*/

   num bestIV init -1;
   num bestNumGroups init .;
   num assignedGroup {PURPOSES};
   do numGroups = 2..card(PURPOSES);
      put numGroups=;
      solve with blackbox;
      if _solution_status_ ne 'FAILED' and bestIV < IV then do;
         print N_bad N_good Bad_dist Good_dist Woe;
         bestIV = IV;
         bestNumGroups = numGroups;
         for {p in PURPOSES} do;
            for {g in GROUPS: IsPurposeGroup[p,g].sol > 0.5} do;
               assignedGroup[p] = g;
               leave;
            end;
         end;
      end;
      put bestIV= bestNumGroups=;
   end;

   print bestIV bestNumGroups;
   print assignedGroup;
quit;

For the German credit data, the best solution found has numGroups = 6, but the objective value is slightly worse than from your GA:

[1]	N_bad	N_good	Bad_dist	Good_dist	Woe
1	89	145	0.29667	0.207143	0.0321570
2	18	94	0.06000	0.134286	0.0598464
3	31	43	0.10333	0.061429	0.0217940
4	42	77	0.14000	0.110000	0.0072349
5	58	123	0.19333	0.175714	0.0016836
6	62	218	0.20667	0.311429	0.0429590

bestIV	bestNumGroups
0.16567	6

[1]	assignedGroup
0	1
1	2
2	5
3	6
4	3
5	4
6	3
8	2
9	4
X	3

The black-box solver is not guaranteed to find a globally optimal (or even a feasible) solution. Now that I understand the problem that you want to solve, I have an idea to find a globally optimal solution by using the MILP solver instead and will share that later.
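As a sanity check on the reported output, the Woe column and bestIV can be recomputed from the N_bad and N_good columns alone. This Python sketch only copies the counts from the table above; it does not read the German credit data.

```python
import math

# (N_bad, N_good) per group, copied from the printed table
counts = {1: (89, 145), 2: (18, 94), 3: (31, 43),
          4: (42, 77), 5: (58, 123), 6: (62, 218)}

total_bad = sum(b for b, _ in counts.values())
total_good = sum(g for _, g in counts.values())

woe = {}
for grp, (n_bad, n_good) in counts.items():
    bad_dist = n_bad / total_bad
    good_dist = n_good / total_good
    woe[grp] = (bad_dist - good_dist) * math.log(bad_dist / good_dist)

iv = sum(woe.values())
# Smallest share of bads or goods held by any group
min_share = min(min(b / total_bad, g / total_good) for b, g in counts.values())
print(total_bad, total_good, round(iv, 5), round(min_share, 4))
```

The totals come out to 300 bads and 700 goods, and every group holds at least 5% of each class, so this solution satisfies both lower-bound constraints.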

RobPratt · Posted 02-22-2025 11:50 AM

As promised, here is the MILP approach I had in mind:

proc optmodel;
   set OBS;
   str purpose {OBS};
   str good_bad {OBS};
   read data have into OBS=[_N_] purpose good_bad;
   set <str> PURPOSES init {};
   num bad  {PURPOSES} init 0;
   num good {PURPOSES} init 0;
   str pThis, gbThis;
   for {i in OBS} do;
      pThis = purpose[i];
      gbThis = good_bad[i];
      PURPOSES = PURPOSES union {pThis};
      if      gbThis = 'bad'  then bad[pThis]  = bad[pThis]  + 1;
      else if gbThis = 'good' then good[pThis] = good[pThis] + 1;
   end;
   print bad good;
   num total_n_bad  = sum {p in PURPOSES} bad[p];
   num total_n_good = sum {p in PURPOSES} good[p];

   put total_n_bad=;
   put total_n_good=;

   /* use CLP solver to enumerate all candidate groups */
   var IsPurpose {PURPOSES} binary;
   con Bad_dist_lb:
      sum {p in PURPOSES} bad[p] * IsPurpose[p] >= 0.05 * total_n_bad;
   con Good_dist_lb:
      sum {p in PURPOSES} good[p] * IsPurpose[p] >= 0.05 * total_n_good;

   solve with clp / findallsolns;
   set <str> PURPOSES_THIS;
   set GROUPS init {};
   set <str> PURPOSES_g {GROUPS};
   set GROUPS_p {PURPOSES} init {};
   num n_bad  {GROUPS};
   num n_good {GROUPS};
   num bad_dist  {g in GROUPS} = n_bad[g]  / total_n_bad;
   num good_dist {g in GROUPS} = n_good[g] / total_n_good;
   num woe {g in GROUPS} = (bad_dist[g] - good_dist[g]) * log(bad_dist[g]/good_dist[g]);
   for {g in 1.._NSOL_} do;
      PURPOSES_THIS = {p in PURPOSES: IsPurpose[p].sol[g] > 0.5};
      GROUPS = GROUPS union {g};
      PURPOSES_g[g] = PURPOSES_THIS;
      for {p in PURPOSES_THIS} GROUPS_p[p] = GROUPS_p[p] union {g};
      n_bad[g]  = sum {p in PURPOSES_THIS} bad[p];
      n_good[g] = sum {p in PURPOSES_THIS} good[p];
   end;
/*   print n_bad n_good bad_dist good_dist woe;*/

   /* use MILP solver to partition purposes into groups */
   var IsGroup {GROUPS} binary;
   max IV = sum {g in GROUPS} woe[g] * IsGroup[g];
   con OneGroupPerPurpose {p in PURPOSES}:
      sum {g in GROUPS_p[p]} IsGroup[g] = 1;

   problem PartitionProblem include
      IsGroup IV OneGroupPerPurpose;
   use problem PartitionProblem;
   solve;

   num assignedGroup {PURPOSES};
   num count init 0;
   for {g in GROUPS: IsGroup[g].sol > 0.5} do;
      count = count + 1;
      for {p in PURPOSES_g[g]} assignedGroup[p] = count;
   end;
   print assignedGroup;
quit;

For the German credit data, the resulting (globally optimal) solution is slightly better than your GA solution:

Solution Summary
Solver	MILP
Algorithm	Branch and Cut
Objective Function	IV
Solution Status	Optimal
Objective Value	0.1676461706

Relative Gap	0
Absolute Gap	0
Primal Infeasibility	0
Bound Infeasibility	0
Integer Infeasibility	0

Best Bound	0.1676461706
Nodes	1
Solutions Found	4
Iterations	24
Presolve Time	0.10
Solution Time	0.21

[1]	assignedGroup
0	3
1	2
2	4
3	6
4	4
5	1
6	5
8	2
9	1
X	5

The only difference is that purpose 5 is now in the same group as purpose 9.
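The two-stage idea in the code above (first enumerate every candidate group that meets the lower bounds, then choose a set of candidates that covers each purpose exactly once) can be sketched in Python on the small 10-row sample from earlier in the thread. This is an illustration under assumptions, not the SAS implementation: a plain exhaustive recursion stands in for the MILP solver, and `candidates` and `best_partition` are hypothetical names.

```python
import math
from itertools import combinations

# Bad and good counts per purpose from the 10-row sample
bad = {"1": 1, "2": 1, "3": 0, "8": 0, "X": 2}
good = {"1": 1, "2": 2, "3": 1, "8": 2, "X": 0}
purposes = sorted(bad)
total_bad, total_good = sum(bad.values()), sum(good.values())

# Stage 1: every subset of purposes that satisfies both 0.05 lower bounds,
# mapped to its contribution to IV
candidates = {}
for r in range(1, len(purposes) + 1):
    for combo in combinations(purposes, r):
        n_bad = sum(bad[p] for p in combo)
        n_good = sum(good[p] for p in combo)
        if n_bad >= 0.05 * total_bad and n_good >= 0.05 * total_good:
            bd, gd = n_bad / total_bad, n_good / total_good
            candidates[frozenset(combo)] = (bd - gd) * math.log(bd / gd)

# Stage 2: pick candidates so that each purpose is covered exactly once
def best_partition(remaining):
    """Return (iv, groups) for the best exact cover of `remaining`, or None."""
    if not remaining:
        return 0.0, []
    anchor = min(remaining)  # this purpose must be covered by the next group
    best = None
    for group, term in candidates.items():
        if anchor in group and group <= remaining:
            rest = best_partition(remaining - group)
            if rest is not None and (best is None or term + rest[0] > best[0]):
                best = (term + rest[0], [group] + rest[1])
    return best

best_iv, best_groups = best_partition(frozenset(purposes))
print(best_iv, [sorted(g) for g in best_groups])
```

Note that the number of groups is not fixed in advance here: the partition step chooses it. On this sample the best partition uses two groups, {1, X} and {2, 3, 8}.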

Ksharp · Posted 02-23-2025 10:08 PM

@RobPratt ,
That is awesome. If you don't mind, I have another, similar problem to solve.

This problem is for a categorical variable, but I also need to do the same thing for a continuous variable.

Here is the algorithm:

The data looks like this. I want to generate a GROUP variable.
Here I have one cutpoint, DURATION=12, which splits DURATION (a continuous variable) into TWO groups.
With TWO cutpoints you would get THREE groups, with THREE cutpoints FOUR groups, and so on.


good_bad  group  duration
good        1     2
bad         1     4
good        1     5
good        1     6
bad         1     8
good        1     10

good        2     18
good        2     28
bad         2     30
bad         2     32






total_n_bad=4   total_n_good=6

group=1
--------
n_bad=2 n_good=4 
bad_dist=n_bad/total_n_bad=2/4=0.5 
good_dist=n_good/total_n_good=4/6=0.667
woe=(Bad_Dist-Good_Dist)*log(Bad_Dist/Good_Dist)=(0.5-0.667)*log(0.5/0.667)=0.048


group=2 
--------
n_bad=2 n_good=2
bad_dist=n_bad/total_n_bad=2/4=0.5 
good_dist=n_good/total_n_good=2/6=0.333
woe=(Bad_Dist-Good_Dist)*log(Bad_Dist/Good_Dist)=(0.5-0.333)*log(0.5/0.333)=0.068


iv=0.048 + 0.068 = 0.116   <----- I want to maximize this iv .



And I also have THREE constraints:
group=1
--------
Bad_Dist>0.05 and Good_Dist>0.05

group=2
--------
Bad_Dist>0.05 and Good_Dist>0.05

to avoid "If n_good[g] = 0, then good_dist[g] = 0, yielding a division by zero"



woe[1]<woe[2]<woe[3]<woe[4]...........
or
woe[1]>woe[2]>woe[3]>woe[4]...........
i.e., woe must be monotonic.




P.S.
The number of groups could be 3, 4, 5, 6, 7, 8, 9, 10, ...
and I want to pick the number of groups that gives the max IV.
E.g., group=8 has the max IV when group in (2 3 4 5 6 7 8 9 10).

Here is an example produced by my GA code:

RobPratt · Posted 02-24-2025 10:33 AM

Glad to help. Please start a new thread for your new question.

Ksharp · Posted 02-24-2025 08:58 PM

OK. I have already started a brand-new thread in the OR forum.