☑ This topic is solved.

@RobPratt 

I am building a scorecard. I want to bin a categorical variable into 5 groups and maximize its IV (information value).

But I am running into a problem. How can I get past this ERROR message?

Thanks in advance.

The Excel file is attached.

proc import datafile='D:\ifre_backup\Ksharp\1--German Credit.xlsx' dbms=xlsx out=have replace;
run;
%let var=purpose   ;  /* character type */
%let group=5 ;
data temp;
 set have;
 keep &var good_bad ;  /* good_bad values: good, bad */
run;
proc freq data=temp noprint;
table &var./out=level;
run;




proc optmodel;
set OBS;
str good_bad{OBS};
str &var.{OBS};
read data temp into OBS=[_n_] good_bad &var.;

set BYSET_INDEX;
str BYSET{BYSET_INDEX};
read data level into BYSET_INDEX=[_n_] BYSET=&var.;

set GROUP=1..&group.;
set SUB_BYSET_INDEX;
set<str> BY;
set OBS_BY;
num n_bad{GROUP};
num n_good{GROUP};
num bad_dist{GROUP};
num good_dist{GROUP};
num woe{GROUP};
num total_n_bad;
num total_n_good;

total_n_bad =sum{i in OBS} if good_bad[i]='bad'  then 1 else 0;
total_n_good=sum{i in OBS} if good_bad[i]='good' then 1 else 0;

var v{BYSET_INDEX} >=1 <=&group. integer;

cofor{g in GROUP} do;
 SUB_BYSET_INDEX={k in BYSET_INDEX:v[k]=g};
 BY=setof{i in SUB_BYSET_INDEX} BYSET[i];
 OBS_BY={i in OBS:&var.[i] in BY};

 n_bad[g] =sum{i in OBS_BY} if good_bad[i]='bad' then 1 else 0;
 n_good[g]=sum{i in OBS_BY} if good_bad[i]='good' then 1 else 0;
 bad_dist[g]=n_bad[g]/total_n_bad ; 
 good_dist[g]=n_good[g]/total_n_good ; 
 woe[g]=(bad_dist[g]-good_dist[g])*log(bad_dist[g]/good_dist[g]);
end;

max iv=sum{g in GROUP} woe[g];
solve;
quit;

[Attached image: Ksharp_0-1740043187809.png]

 

RobPratt
SAS Super FREQ

There are a few issues here:

  1. A COFOR loop does not achieve any parallelism unless the body of the loop contains at least one SOLVE statement.
  2. If n_good[g] = 0, then good_dist[g] = 0, yielding a division by zero in the argument of the LOG function and hence a missing value for woe[g].
  3. Your objective function IV depends only on numeric parameters and not on decision variables.

It looks like you are trying to solve a separate optimization problem for each group.  If so, I recommend first writing the code to solve only one group.  If not, please provide an algebraic description of the optimization problem you want to solve.
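Issue 2 above can be reproduced outside SAS. The following is a minimal Python sketch (not part of the original post; the function name `iv_term` is hypothetical) that evaluates one group's term and returns `None` where the logarithm is undefined, which corresponds to the missing value that OPTMODEL produces:

```python
import math

def iv_term(n_bad, n_good, total_bad, total_good):
    """One group's term: (bad_dist - good_dist) * log(bad_dist / good_dist)."""
    bad_dist = n_bad / total_bad
    good_dist = n_good / total_good
    if bad_dist <= 0 or good_dist <= 0:
        # log(x / 0) and log(0 / x) are undefined, so there is no usable value.
        return None
    return (bad_dist - good_dist) * math.log(bad_dist / good_dist)

print(iv_term(2, 4, 4, 6))  # a group with both goods and bads
print(iv_term(2, 0, 4, 6))  # n_good = 0, so the term is undefined
```

This is why a lower bound on both distributions per group is a natural constraint for this kind of binning.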

Ksharp
Super User

OK. Take group=2 for example:

The data looks like this; I want to generate a GROUP variable:

good_bad  group  purpose
good      1      1
bad       1      1
good      1      2
good      1      2
bad       1      2
good      1      3
good      2      8
good      2      8
bad       2      X
bad       2      X

Note: each purpose belongs to ONLY ONE group.
You cannot include purpose=1 in both group=1 and group=2.

total_n_bad=4
total_n_good=6

group=1
--------
n_bad=2
n_good=4
bad_dist=n_bad/total_n_bad=2/4=0.5
good_dist=n_good/total_n_good=4/6=0.667
woe=(Bad_Dist-Good_Dist)*log(Bad_Dist/Good_Dist)=(0.5-0.667)*log(0.5/0.667)=0.048

group=2
--------
n_bad=2
n_good=2
bad_dist=n_bad/total_n_bad=2/4=0.5
good_dist=n_good/total_n_good=2/6=0.333
woe=(Bad_Dist-Good_Dist)*log(Bad_Dist/Good_Dist)=(0.5-0.333)*log(0.5/0.333)=0.068

iv=0.048 + 0.068 = 0.116 <----- I want to maximize this iv.

And I also have two constraints:
group=1 -------- Bad_Dist>0.05 and Good_Dist>0.05
group=2 -------- Bad_Dist>0.05 and Good_Dist>0.05
to avoid "If n_good[g] = 0, then good_dist[g] = 0, yielding a division by zero"

P.S. The number of groups could be 3,4,5,6,7,8,9,10..... and I want to pick the one with the max IV. E.g., group=8 has the max IV when group in (2 3 4 5 6 7 8 9 10).
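The arithmetic in the worked example can be checked with a short Python sketch. This is only an illustration under the assumption that `log` means the natural logarithm (as it does in SAS); the function name `information_value` is hypothetical:

```python
import math
from collections import Counter

# Rows from the worked example: (good_bad, group, purpose)
rows = [
    ("good", 1, "1"), ("bad", 1, "1"), ("good", 1, "2"), ("good", 1, "2"),
    ("bad", 1, "2"), ("good", 1, "3"), ("good", 2, "8"), ("good", 2, "8"),
    ("bad", 2, "X"), ("bad", 2, "X"),
]

def information_value(rows):
    """Sum over groups of (bad_dist - good_dist) * log(bad_dist / good_dist)."""
    counts = Counter((group, label) for label, group, _ in rows)
    total_bad = sum(1 for label, _, _ in rows if label == "bad")
    total_good = sum(1 for label, _, _ in rows if label == "good")
    iv = 0.0
    for group in sorted({g for _, g, _ in rows}):
        bad_dist = counts[(group, "bad")] / total_bad
        good_dist = counts[(group, "good")] / total_good
        iv += (bad_dist - good_dist) * math.log(bad_dist / good_dist)
    return iv

iv = information_value(rows)
print(round(iv, 4))
```

The unrounded result agrees with the hand calculation of about 0.116.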

 

RobPratt
SAS Super FREQ

Here's a straightforward approach that uses the black-box solver:

 

data have;
   input good_bad $ purpose $;
   datalines;
good        1
bad         1
good        2
good        2
bad         2
good        3
good        8
good        8
bad         X
bad         X
;

proc optmodel;
   set OBS;
   str purpose {OBS};
   str good_bad {OBS};
   read data have into OBS=[_N_] purpose good_bad;
   set <str> PURPOSES init {};
   num bad  {PURPOSES} init 0;
   num good {PURPOSES} init 0;
   str pThis, gbThis;
   for {i in OBS} do;
      pThis = purpose[i];
      gbThis = good_bad[i];
      PURPOSES = PURPOSES union {pThis};
      if      gbThis = 'bad'  then bad[pThis]  = bad[pThis]  + 1;
      else if gbThis = 'good' then good[pThis] = good[pThis] + 1;
   end;
   print bad good;
   num total_n_bad  = sum {p in PURPOSES} bad[p];
   num total_n_good = sum {p in PURPOSES} good[p];

   put total_n_bad=;
   put total_n_good=;

   num numGroups = 2;
   set GROUPS = 1..numGroups;

   var IsPurposeGroup {PURPOSES, GROUPS} binary;

   impvar N_bad  {g in GROUPS} = sum {p in PURPOSES} bad[p]  * IsPurposeGroup[p,g];
   impvar N_good {g in GROUPS} = sum {p in PURPOSES} good[p] * IsPurposeGroup[p,g];

   impvar Bad_dist  {g in GROUPS} = N_bad[g]  / total_n_bad;
   impvar Good_dist {g in GROUPS} = N_good[g] / total_n_good;

   impvar Woe {g in GROUPS} = (Bad_dist[g] - Good_dist[g]) * log(Bad_dist[g]/Good_dist[g]);

   max IV = sum {g in GROUPS} Woe[g];

   con OneGroupPerPurpose {p in PURPOSES}:
      sum {g in GROUPS} IsPurposeGroup[p,g] = 1;

   con Bad_dist_lb {g in GROUPS}:
      Bad_Dist[g] >= 0.05;

   con Good_dist_lb {g in GROUPS}:
      Good_Dist[g] >= 0.05;

/*   for {p in {'1','2','3'}} fix IsPurposeGroup[p,1] = 1;*/
/*   for {p in {'8','X'}}     fix IsPurposeGroup[p,2] = 1;*/

   solve with blackbox;

   print N_bad N_good Bad_dist Good_dist Woe;

   num assignedGroup {PURPOSES};
   for {p in PURPOSES} do;
      for {g in GROUPS: IsPurposeGroup[p,g].sol > 0.5} do;
         assignedGroup[p] = g;
         leave;
      end;
   end;

   print assignedGroup;
quit;

 

On my machine, this yields a maximum IV of 1.5796959506.

 

Uncomment the FIX statements to recover your sample solution with IV = 0.1155245301.

 

Please verify whether this solves your problem for a fixed value of numGroups.  Then I can show you how to find the best numGroups.
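Because the sample data set has only five purposes, the fixed-numGroups model can be cross-checked by exhaustive enumeration. The Python sketch below is not the black-box solver and is not from the original post; it simply tries every purpose-to-group assignment (the role played by IsPurposeGroup), discards assignments that break the 0.05 lower bounds, and keeps the best IV. The names `iv_of` and `best_assignment` are hypothetical.

```python
import math
from itertools import product

# Per-purpose (bad, good) counts tallied from the ten-row sample data set.
counts = {"1": (1, 1), "2": (1, 2), "3": (0, 1), "8": (0, 2), "X": (2, 0)}
total_bad = sum(b for b, _ in counts.values())
total_good = sum(g for _, g in counts.values())

def iv_of(assignment, num_groups):
    """Return IV for a purpose->group assignment, or None if a group breaks a 5% bound."""
    iv = 0.0
    for g in range(1, num_groups + 1):
        bad_dist = sum(counts[p][0] for p in assignment if assignment[p] == g) / total_bad
        good_dist = sum(counts[p][1] for p in assignment if assignment[p] == g) / total_good
        if bad_dist < 0.05 or good_dist < 0.05:
            return None
        iv += (bad_dist - good_dist) * math.log(bad_dist / good_dist)
    return iv

def best_assignment(num_groups):
    """Try every assignment of purposes to groups and keep the feasible one with max IV."""
    purposes = sorted(counts)
    best_iv, best = None, None
    for labels in product(range(1, num_groups + 1), repeat=len(purposes)):
        assignment = dict(zip(purposes, labels))
        iv = iv_of(assignment, num_groups)
        if iv is not None and (best_iv is None or iv > best_iv):
            best_iv, best = iv, assignment
    return best_iv, best

best_iv, best = best_assignment(2)
print(best_iv, best)
```

For two groups the enumeration reproduces the reported maximum of 1.5796959506, with purposes 1 and X in one group and purposes 2, 3, and 8 in the other. The search space grows as numGroups raised to the number of purposes, so this is only practical for very small inputs.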

Ksharp
Super User

RobPratt,

Many thanks. But when I applied your code to my Excel data, I got this WARNING.

proc import datafile='D:\ifre_backup\Ksharp\1--German Credit.xlsx' dbms=xlsx out=have replace;
run;

%let var=purpose   ;  /* character type */
%let group=6 ;

data have(rename=(&var.=purpose));
 set have;
 keep &var good_bad ;  /* good_bad values: good, bad */
run;


proc optmodel;
   set OBS;
   str purpose {OBS};
   str good_bad {OBS};
   read data have into OBS=[_N_] purpose good_bad;
   set <str> PURPOSES init {};
   num bad  {PURPOSES} init 0;
   num good {PURPOSES} init 0;
   str pThis, gbThis;
   for {i in OBS} do;
      pThis = purpose[i];
      gbThis = good_bad[i];
      PURPOSES = PURPOSES union {pThis};
      if      gbThis = 'bad'  then bad[pThis]  = bad[pThis]  + 1;
      else if gbThis = 'good' then good[pThis] = good[pThis] + 1;
   end;
   print bad good;
   num total_n_bad  = sum {p in PURPOSES} bad[p];
   num total_n_good = sum {p in PURPOSES} good[p];

   put total_n_bad=;
   put total_n_good=;

   num numGroups = &group.;
   set GROUPS = 1..numGroups;

   var IsPurposeGroup {PURPOSES, GROUPS} binary;

   impvar N_bad  {g in GROUPS} = sum {p in PURPOSES} bad[p]  * IsPurposeGroup[p,g];
   impvar N_good {g in GROUPS} = sum {p in PURPOSES} good[p] * IsPurposeGroup[p,g];

   impvar Bad_dist  {g in GROUPS} = N_bad[g]  / total_n_bad;
   impvar Good_dist {g in GROUPS} = N_good[g] / total_n_good;

   impvar Woe {g in GROUPS} = (Bad_dist[g] - Good_dist[g]) * log(Bad_dist[g]/Good_dist[g]);

   max IV = sum {g in GROUPS} Woe[g];

   con OneGroupPerPurpose {p in PURPOSES}:
      sum {g in GROUPS} IsPurposeGroup[p,g] = 1;

   con Bad_dist_lb {g in GROUPS}:
      Bad_Dist[g] >= 0.05;

   con Good_dist_lb {g in GROUPS}:
      Good_Dist[g] >= 0.05;

/*   for {p in {'1','2','3'}} fix IsPurposeGroup[p,1] = 1;*/
/*   for {p in {'8','X'}}     fix IsPurposeGroup[p,2] = 1;*/

   solve with blackbox;

   print N_bad N_good Bad_dist Good_dist Woe;

   num assignedGroup {PURPOSES};
   for {p in PURPOSES} do;
      for {g in GROUPS: IsPurposeGroup[p,g].sol > 0.5} do;
         assignedGroup[p] = g;
         leave;
      end;
   end;

   print assignedGroup;
quit;
NOTE: The problem has 60 variables (0 free, 0 fixed).
NOTE: The problem uses 30 implicit variables.
NOTE: The problem has 60 binary and 0 integer variables.
NOTE: The problem has 22 linear constraints (0 LE, 10 EQ, 12 GE, 0 range).
NOTE: The problem has 180 linear constraint coefficients.
NOTE: The problem has 0 nonlinear constraints (0 LE, 0 EQ, 0 GE, 0 range).
NOTE: The OPTMODEL presolver removed 0 variables, 0 linear constraints, and 0 nonlinear constraints.
NOTE: The black-box solver is using up to 4 threads.
NOTE: The black-box solver is using the EAGLS optimizer algorithm.
NOTE: The problem has 60 variables (60 integer, 0 continuous).
NOTE: The problem has 22 constraints (22 linear, 0 nonlinear).
NOTE: The problem has 1 user-defined functions.
NOTE: The deterministic parallel mode is enabled.
                         Best
        Iteration        Objective    Infeasibility    Evals     Time
                1       0.08127539       2.00000000      301        0
                2       0.08127539       2.00000000      363        0
                3       0.08127539       2.00000000      366        0
                4       0.08127539       2.00000000      372        0
                5       0.08127539       2.00000000      376        0
                6       0.08127539       2.00000000      381        0
                7       0.08127539       2.00000000      385        0
                8       0.08127539       2.00000000      389        0
                9       0.08127539       2.00000000      391        0
               10       0.08127539       2.00000000      393        0
               11       0.08127539       2.00000000      395        0
WARNING: The best solution found does not satisfy the feasibility tolerance.
NOTE: Failed.
64
65      print N_bad N_good Bad_dist Good_dist Woe;
NOTE: Division by zero at line 47 column 77.
NOTE: Division by zero at line 47 column 77.
NOTE: Division by zero at line 47 column 77.
NOTE: Division by zero at line 47 column 77.
NOTE: Division by zero at line 47 column 77.
NOTE: Division by zero at line 47 column 77.

And if I use my own Genetic Algorithm code, I get this (group=6 has the max IV):

[Attached image: Ksharp_0-1740208592653.png]

 

RobPratt
SAS Super FREQ

The Infeasibility of 2 means that some constraints are violated.  You might need to change some solver options, like increasing POPSIZE= or NABSFCONV=.

 

Here is code to call the black-box solver for different numGroups and return the best solution found:

 

proc optmodel printlevel=0;
   set OBS;
   str purpose {OBS};
   str good_bad {OBS};
   read data have into OBS=[_N_] purpose good_bad;
   set <str> PURPOSES init {};
   num bad  {PURPOSES} init 0;
   num good {PURPOSES} init 0;
   str pThis, gbThis;
   for {i in OBS} do;
      pThis = purpose[i];
      gbThis = good_bad[i];
      PURPOSES = PURPOSES union {pThis};
      if      gbThis = 'bad'  then bad[pThis]  = bad[pThis]  + 1;
      else if gbThis = 'good' then good[pThis] = good[pThis] + 1;
   end;
   print bad good;
   num total_n_bad  = sum {p in PURPOSES} bad[p];
   num total_n_good = sum {p in PURPOSES} good[p];

   put total_n_bad=;
   put total_n_good=;

/*   num numGroups = 2;*/
   num numGroups;
   set GROUPS = 1..numGroups;

   var IsPurposeGroup {PURPOSES, GROUPS} binary;

   impvar N_bad  {g in GROUPS} = sum {p in PURPOSES} bad[p]  * IsPurposeGroup[p,g];
   impvar N_good {g in GROUPS} = sum {p in PURPOSES} good[p] * IsPurposeGroup[p,g];

   impvar Bad_dist  {g in GROUPS} = N_bad[g]  / total_n_bad;
   impvar Good_dist {g in GROUPS} = N_good[g] / total_n_good;

   impvar Woe {g in GROUPS} = (Bad_dist[g] - Good_dist[g]) * log(Bad_dist[g]/Good_dist[g]);

   max IV = sum {g in GROUPS} Woe[g];

   con OneGroupPerPurpose {p in PURPOSES}:
      sum {g in GROUPS} IsPurposeGroup[p,g] = 1;

   con Bad_dist_lb {g in GROUPS}:
      Bad_Dist[g] >= 0.05;

   con Good_dist_lb {g in GROUPS}:
      Good_Dist[g] >= 0.05;

/*   for {p in {'1','2','3'}} fix IsPurposeGroup[p,1] = 1;*/
/*   for {p in {'8','X'}}     fix IsPurposeGroup[p,2] = 1;*/

/*   solve with blackbox;*/

   num bestIV init -1;
   num bestNumGroups init .;
   num assignedGroup {PURPOSES};
   do numGroups = 2..card(PURPOSES);
      put numGroups=;
      solve with blackbox;
      if _solution_status_ ne 'FAILED' and bestIV < IV then do;
         print N_bad N_good Bad_dist Good_dist Woe;
         bestIV = IV;
         bestNumGroups = numGroups;
         for {p in PURPOSES} do;
            for {g in GROUPS: IsPurposeGroup[p,g].sol > 0.5} do;
               assignedGroup[p] = g;
               leave;
            end;
         end;
      end;
      put bestIV= bestNumGroups=;
   end;

   print bestIV bestNumGroups;
   print assignedGroup;
quit;

 

 

For the German credit data, the best solution found has numGroups = 6, but the objective value is slightly worse than from your GA:

[1] N_bad N_good Bad_dist Good_dist Woe
1 89 145 0.29667 0.207143 0.0321570
2 18 94 0.06000 0.134286 0.0598464
3 31 43 0.10333 0.061429 0.0217940
4 42 77 0.14000 0.110000 0.0072349
5 58 123 0.19333 0.175714 0.0016836
6 62 218 0.20667 0.311429 0.0429590

bestIV bestNumGroups
0.16567 6

[1] assignedGroup
0 1
1 2
2 5
3 6
4 3
5 4
6 3
8 2
9 4
X 3
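The printed table can be recomputed from its N_bad and N_good columns alone. The following Python sketch (an independent check, not SAS output) copies those counts from the table above and recomputes each Woe term and their sum:

```python
import math

# (N_bad, N_good) per group, copied from the printed solver output.
groups = {1: (89, 145), 2: (18, 94), 3: (31, 43), 4: (42, 77), 5: (58, 123), 6: (62, 218)}
total_bad = sum(b for b, _ in groups.values())
total_good = sum(g for _, g in groups.values())

terms = {}
for g, (n_bad, n_good) in groups.items():
    bad_dist = n_bad / total_bad
    good_dist = n_good / total_good
    terms[g] = (bad_dist - good_dist) * math.log(bad_dist / good_dist)

iv = sum(terms.values())
print(total_bad, total_good)
print({g: round(t, 7) for g, t in terms.items()})
print(round(iv, 5))
```

The group counts add up to 300 bad and 700 good observations, and the terms sum to the reported bestIV of about 0.16567.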

 

The black-box solver is not guaranteed to find a globally optimal (or even a feasible) solution.  Now that I understand the problem that you want to solve, I have an idea to find a globally optimal solution by using the MILP solver instead and will share that later.

RobPratt
SAS Super FREQ (accepted solution)

As promised, here is the MILP approach I had in mind:

proc optmodel;
   set OBS;
   str purpose {OBS};
   str good_bad {OBS};
   read data have into OBS=[_N_] purpose good_bad;
   set <str> PURPOSES init {};
   num bad  {PURPOSES} init 0;
   num good {PURPOSES} init 0;
   str pThis, gbThis;
   for {i in OBS} do;
      pThis = purpose[i];
      gbThis = good_bad[i];
      PURPOSES = PURPOSES union {pThis};
      if      gbThis = 'bad'  then bad[pThis]  = bad[pThis]  + 1;
      else if gbThis = 'good' then good[pThis] = good[pThis] + 1;
   end;
   print bad good;
   num total_n_bad  = sum {p in PURPOSES} bad[p];
   num total_n_good = sum {p in PURPOSES} good[p];

   put total_n_bad=;
   put total_n_good=;

   /* use CLP solver to enumerate all candidate groups */
   var IsPurpose {PURPOSES} binary;
   con Bad_dist_lb:
      sum {p in PURPOSES} bad[p] * IsPurpose[p] >= 0.05 * total_n_bad;
   con Good_dist_lb:
      sum {p in PURPOSES} good[p] * IsPurpose[p] >= 0.05 * total_n_good;

   solve with clp / findallsolns;
   set <str> PURPOSES_THIS;
   set GROUPS init {};
   set <str> PURPOSES_g {GROUPS};
   set GROUPS_p {PURPOSES} init {};
   num n_bad  {GROUPS};
   num n_good {GROUPS};
   num bad_dist  {g in GROUPS} = n_bad[g]  / total_n_bad;
   num good_dist {g in GROUPS} = n_good[g] / total_n_good;
   num woe {g in GROUPS} = (bad_dist[g] - good_dist[g]) * log(bad_dist[g]/good_dist[g]);
   for {g in 1.._NSOL_} do;
      PURPOSES_THIS = {p in PURPOSES: IsPurpose[p].sol[g] > 0.5};
      GROUPS = GROUPS union {g};
      PURPOSES_g[g] = PURPOSES_THIS;
      for {p in PURPOSES_THIS} GROUPS_p[p] = GROUPS_p[p] union {g};
      n_bad[g]  = sum {p in PURPOSES_THIS} bad[p];
      n_good[g] = sum {p in PURPOSES_THIS} good[p];
   end;
/*   print n_bad n_good bad_dist good_dist woe;*/

   /* use MILP solver to partition purposes into groups */
   var IsGroup {GROUPS} binary;
   max IV = sum {g in GROUPS} woe[g] * IsGroup[g];
   con OneGroupPerPurpose {p in PURPOSES}:
      sum {g in GROUPS_p[p]} IsGroup[g] = 1;

   problem PartitionProblem include
      IsGroup IV OneGroupPerPurpose;
   use problem PartitionProblem;
   solve;

   num assignedGroup {PURPOSES};
   num count init 0;
   for {g in GROUPS: IsGroup[g].sol > 0.5} do;
      count = count + 1;
      for {p in PURPOSES_g[g]} assignedGroup[p] = count;
   end;
   print assignedGroup;
quit;

For the German credit data, the resulting (globally optimal) solution is slightly better than your GA solution:

Solution Summary
Solver MILP
Algorithm Branch and Cut
Objective Function IV
Solution Status Optimal
Objective Value 0.1676461706
   
Relative Gap 0
Absolute Gap 0
Primal Infeasibility 0
Bound Infeasibility 0
Integer Infeasibility 0
   
Best Bound 0.1676461706
Nodes 1
Solutions Found 4
Iterations 24
Presolve Time 0.10
Solution Time 0.21

[1] assignedGroup
0 3
1 2
2 4
3 6
4 4
5 1
6 5
8 2
9 1
X 5

 

The only difference is that purpose 5 is now in the same group as purpose 9.
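The two-stage idea (first enumerate every candidate group that satisfies the 5% lower bounds, then pick candidates that cover each purpose exactly once) can be sketched in Python on the ten-row sample data from earlier in the thread. This is an illustration only: plain enumeration and recursion stand in for the CLP and MILP solvers, and the name `best_partition` is hypothetical.

```python
import math
from itertools import combinations

# Per-purpose (bad, good) counts from the ten-row sample data set earlier in the thread.
counts = {"1": (1, 1), "2": (1, 2), "3": (0, 1), "8": (0, 2), "X": (2, 0)}
purposes = sorted(counts)
total_bad = sum(b for b, _ in counts.values())
total_good = sum(g for _, g in counts.values())

# Stage 1: enumerate every subset of purposes that meets both 5% lower bounds
# and precompute its IV term (a constant once the subset is fixed).
candidates = {}
for size in range(1, len(purposes) + 1):
    for subset in combinations(purposes, size):
        n_bad = sum(counts[p][0] for p in subset)
        n_good = sum(counts[p][1] for p in subset)
        if n_bad >= 0.05 * total_bad and n_good >= 0.05 * total_good:
            bad_dist, good_dist = n_bad / total_bad, n_good / total_good
            candidates[frozenset(subset)] = (bad_dist - good_dist) * math.log(bad_dist / good_dist)

# Stage 2: choose candidates that cover each purpose exactly once, maximizing total IV.
def best_partition(remaining):
    if not remaining:
        return 0.0, []
    anchor = min(remaining)  # decide next which candidate contains this purpose
    best_iv, best_groups = None, None
    for subset, term in candidates.items():
        if anchor in subset and subset <= remaining:
            sub_iv, sub_groups = best_partition(remaining - subset)
            if sub_iv is not None and (best_iv is None or term + sub_iv > best_iv):
                best_iv, best_groups = term + sub_iv, [sorted(subset)] + sub_groups
    return best_iv, best_groups  # (None, None) if the remainder cannot be covered

best_iv, best_groups = best_partition(frozenset(purposes))
print(best_iv, best_groups)
```

Because each candidate's IV term is a constant, the second stage has a linear objective, which is what makes a MILP formulation possible. The number of groups is not fixed in advance; it falls out of the chosen partition. With n purposes there are at most 2^n - 1 candidates, that is, 1023 for the ten purposes in the German credit data.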

Ksharp
Super User

@RobPratt ,
That is awesome. If you don't mind, I have another, similar problem to solve.

This problem is for a categorical variable, but I also need to do the same thing for a continuous variable.

Here is the algorithm:

 

The data looks like this; I want to generate a GROUP variable.
Here I have one cutpoint, DURATION=12, that splits DURATION (a continuous variable) into TWO groups.
TWO cutpoints would yield THREE groups, THREE cutpoints would yield FOUR groups, and so on.

good_bad  group  duration
good      1      2
bad       1      4
good      1      5
good      1      6
bad       1      8
good      1      10
good      2      18
good      2      28
bad       2      30
bad       2      32

total_n_bad=4
total_n_good=6

group=1
--------
n_bad=2
n_good=4
bad_dist=n_bad/total_n_bad=2/4=0.5
good_dist=n_good/total_n_good=4/6=0.667
woe=(Bad_Dist-Good_Dist)*log(Bad_Dist/Good_Dist)=(0.5-0.667)*log(0.5/0.667)=0.048

group=2
--------
n_bad=2
n_good=2
bad_dist=n_bad/total_n_bad=2/4=0.5
good_dist=n_good/total_n_good=2/6=0.333
woe=(Bad_Dist-Good_Dist)*log(Bad_Dist/Good_Dist)=(0.5-0.333)*log(0.5/0.333)=0.068

iv=0.048 + 0.068 = 0.116 <----- I want to maximize this iv.

And I also have THREE constraints:
group=1 -------- Bad_Dist>0.05 and Good_Dist>0.05
group=2 -------- Bad_Dist>0.05 and Good_Dist>0.05
to avoid "If n_good[g] = 0, then good_dist[g] = 0, yielding a division by zero"

woe[1]<woe[2]<woe[3]<woe[4]...........
or
woe[1]>woe[2]>woe[3]>woe[4]...........
i.e., woe is monotonic.

P.S. The number of groups could be 3,4,5,6,7,8,9,10..... and I want to pick the one with the max IV. E.g., group=8 has the max IV when group in (2 3 4 5 6 7 8 9 10).
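The continuous case described above can be evaluated for a given set of cutpoints with a small Python sketch. This is only an illustration of the stated rules, not a solution to the optimization: the function names are hypothetical, and sending a value equal to a cutpoint to the lower group is an assumption, since the post does not say.

```python
import math
from bisect import bisect_left

# (good_bad, duration) rows from the worked example.
rows = [("good", 2), ("bad", 4), ("good", 5), ("good", 6), ("bad", 8),
        ("good", 10), ("good", 18), ("good", 28), ("bad", 30), ("bad", 32)]

def woe_terms(rows, cutpoints):
    """Per-group terms (bad_dist - good_dist) * log(bad_dist / good_dist), or None if infeasible."""
    cutpoints = sorted(cutpoints)
    n_groups = len(cutpoints) + 1
    n_bad, n_good = [0] * n_groups, [0] * n_groups
    for label, duration in rows:
        # Assumption: a value equal to a cutpoint falls in the lower group.
        g = bisect_left(cutpoints, duration)
        if label == "bad":
            n_bad[g] += 1
        else:
            n_good[g] += 1
    total_bad, total_good = sum(n_bad), sum(n_good)
    terms = []
    for b, g in zip(n_bad, n_good):
        bad_dist, good_dist = b / total_bad, g / total_good
        if bad_dist <= 0.05 or good_dist <= 0.05:
            return None  # violates Bad_Dist > 0.05 or Good_Dist > 0.05
        terms.append((bad_dist - good_dist) * math.log(bad_dist / good_dist))
    return terms

def is_monotonic(values):
    """True if the sequence is strictly increasing or strictly decreasing."""
    pairs = list(zip(values, values[1:]))
    return all(a < b for a, b in pairs) or all(a > b for a, b in pairs)

terms = woe_terms(rows, [12])
print(terms, sum(terms), is_monotonic(terms))
```

With the single cutpoint 12, the two terms are about 0.048 and 0.068, they sum to about 0.116, and the sequence is monotonic.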

Here is an example from my GA code:

[Attached image: Ksharp_0-1740366507290.png]

 

 

RobPratt
SAS Super FREQ

Glad to help.  Please start a new thread for your new question.

Ksharp
Super User
OK. I have already started a brand-new thread in the OR forum.