The Infeasibility of 2 means that some constraints are violated. You might need to change some solver options, like increasing POPSIZE= or NABSFCONV=.
Here is code to call the black-box solver for different numGroups and return the best solution found:
proc optmodel printlevel=0;
   /* read the observations */
   set OBS;
   str purpose {OBS};
   str good_bad {OBS};
   read data have into OBS=[_N_] purpose good_bad;

   /* count bad and good observations per purpose */
   set <str> PURPOSES init {};
   num bad {PURPOSES} init 0;
   num good {PURPOSES} init 0;
   str pThis, gbThis;
   for {i in OBS} do;
      pThis = purpose[i];
      gbThis = good_bad[i];
      PURPOSES = PURPOSES union {pThis};
      if gbThis = 'bad' then bad[pThis] = bad[pThis] + 1;
      else if gbThis = 'good' then good[pThis] = good[pThis] + 1;
   end;
   print bad good;
   num total_n_bad = sum {p in PURPOSES} bad[p];
   num total_n_good = sum {p in PURPOSES} good[p];
   put total_n_bad=;
   put total_n_good=;

   /* numGroups is left unassigned here and set in the loop below */
/* num numGroups = 2;*/
   num numGroups;
   set GROUPS = 1..numGroups;

   /* IsPurposeGroup[p,g] = 1 if purpose p is assigned to group g */
   var IsPurposeGroup {PURPOSES, GROUPS} binary;
   impvar N_bad {g in GROUPS} = sum {p in PURPOSES} bad[p] * IsPurposeGroup[p,g];
   impvar N_good {g in GROUPS} = sum {p in PURPOSES} good[p] * IsPurposeGroup[p,g];
   impvar Bad_dist {g in GROUPS} = N_bad[g] / total_n_bad;
   impvar Good_dist {g in GROUPS} = N_good[g] / total_n_good;
   /* Woe[g] holds group g's contribution to the information value */
   impvar Woe {g in GROUPS} = (Bad_dist[g] - Good_dist[g]) * log(Bad_dist[g]/Good_dist[g]);
   max IV = sum {g in GROUPS} Woe[g];

   con OneGroupPerPurpose {p in PURPOSES}:
      sum {g in GROUPS} IsPurposeGroup[p,g] = 1;
   /* each group must contain at least 5% of the bads and 5% of the goods */
   con Bad_dist_lb {g in GROUPS}:
      Bad_dist[g] >= 0.05;
   con Good_dist_lb {g in GROUPS}:
      Good_dist[g] >= 0.05;

/* for {p in {'1','2','3'}} fix IsPurposeGroup[p,1] = 1;*/
/* for {p in {'8','X'}} fix IsPurposeGroup[p,2] = 1;*/
/* solve with blackbox;*/

   /* solve for each numGroups and keep the best solution found */
   num bestIV init -1;
   num bestNumGroups init .;
   num assignedGroup {PURPOSES};
   do numGroups = 2..card(PURPOSES);
      put numGroups=;
      solve with blackbox;
      if _solution_status_ ne 'FAILED' and bestIV < IV then do;
         print N_bad N_good Bad_dist Good_dist Woe;
         bestIV = IV;
         bestNumGroups = numGroups;
         for {p in PURPOSES} do;
            for {g in GROUPS: IsPurposeGroup[p,g].sol > 0.5} do;
               assignedGroup[p] = g;
               leave;
            end;
         end;
      end;
      put bestIV= bestNumGroups=;
   end;
   print bestIV bestNumGroups;
   print assignedGroup;
quit;
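For reference, the model that the code above passes to the solver can be written out as follows. This is only a restatement of the declarations in the code; the symbols are my own shorthand and do not appear in the program: b_p and g_p are the bad and good counts for purpose p, x_{pk} is IsPurposeGroup, K is numGroups, and B_k and G_k are Bad_dist and Good_dist.

```latex
\begin{aligned}
\max_{x}\quad & \mathrm{IV} = \sum_{k=1}^{K} (B_k - G_k)\,\ln\frac{B_k}{G_k} \\
\text{where}\quad & B_k = \frac{\sum_{p} b_p\, x_{pk}}{\sum_{p} b_p}, \qquad
                    G_k = \frac{\sum_{p} g_p\, x_{pk}}{\sum_{p} g_p} \\
\text{subject to}\quad & \sum_{k=1}^{K} x_{pk} = 1 \quad \text{for each purpose } p \\
& B_k \ge 0.05, \quad G_k \ge 0.05 \quad \text{for each group } k \\
& x_{pk} \in \{0, 1\}
\end{aligned}
```

Note that the implicit variable named Woe in the code holds each group's term of this sum, that is, the group's contribution to IV, not the log ratio alone.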
For the German credit data, the best solution found has numGroups = 6, but its objective value is slightly worse than the one from your GA:
[1]  N_bad  N_good  Bad_dist  Good_dist        Woe
  1     89     145   0.29667   0.207143  0.0321570
  2     18      94   0.06000   0.134286  0.0598464
  3     31      43   0.10333   0.061429  0.0217940
  4     42      77   0.14000   0.110000  0.0072349
  5     58     123   0.19333   0.175714  0.0016836
  6     62     218   0.20667   0.311429  0.0429590
 bestIV  bestNumGroups
0.16567              6
[1]  assignedGroup
  0              1
  1              2
  2              5
  3              6
  4              3
  5              4
  6              3
  8              2
  9              4
  X              3
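If you want to double-check a reported solution outside of SAS, here is a minimal Python sketch that recomputes the IV from the six groups' bad and good counts and checks the two 5% lower bounds. The function name information_value and the list layout are my own choices for illustration and are not part of the OPTMODEL code; the counts are copied from the output above.

```python
import math

# Bad and good counts per group, copied from the printed solution (groups 1..6).
n_bad = [89, 18, 31, 42, 58, 62]
n_good = [145, 94, 43, 77, 123, 218]


def information_value(n_bad, n_good):
    """Return (per-group contributions, total IV, bad shares, good shares).

    Uses the same expression as the OPTMODEL impvar named Woe:
    (Bad_dist - Good_dist) * log(Bad_dist / Good_dist).
    """
    total_bad = sum(n_bad)
    total_good = sum(n_good)
    bad_dist = [b / total_bad for b in n_bad]
    good_dist = [g / total_good for g in n_good]
    contributions = [
        (bd - gd) * math.log(bd / gd) for bd, gd in zip(bad_dist, good_dist)
    ]
    return contributions, sum(contributions), bad_dist, good_dist


contributions, iv, bad_dist, good_dist = information_value(n_bad, n_good)

# Feasibility check for the two lower-bound constraints in the model.
feasible = all(d >= 0.05 for d in bad_dist + good_dist)

print(f"IV = {iv:.5f}, feasible = {feasible}")
```

The total should agree with the reported bestIV up to rounding, and the same function can be applied to the grouping from your GA for a like-for-like comparison.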
The black-box solver is not guaranteed to find a globally optimal (or even a feasible) solution. Now that I understand the problem that you want to solve, I have an idea to find a globally optimal solution by using the MILP solver instead and will share that later.