How to find an Optimal(max) count in each Bin

Occasional Contributor
Posts: 6

How to find an Optimal(max) count in each Bin

I am trying to find Maximun count for each bin range, but bin range can move depending on data. Trying to find three bins and minumum three count in each bin. For example, for this sample

 ID fee 2 330 4 330 15 330 17 330 1 350 5 350 18 370 8 390 12 390 19 390 20 430 3 450 13 480 7 500 6 530 10 530 11 530 9 540 14 550 16 550

these are three different iterations and i want the best optimal solution, something like iteration 2.

 Bins count % Iteration 1 1 330-390 10 50.00% 2 391-530 7 35.00% 3 531-550 3 15.00% Iteration 2 Bins count % 1 330-450 12 60.00% 2 451-530 5 25.00% 3 531-550 3 15.00% Iteration 3 Bins count % 1 330-350 6 30.00% 2 351-530 11 55.00% 3 531-550 3 15.00%

Can we do this in auto process? if so what would be the best approach.

SAS Employee
Posts: 538

Re: How to find an Optimal(max) count in each Bin

[ Edited ]

I think the following code does what you want:

``````data indata;
input ID fee;
datalines;
2	330
4	330
15	330
17	330
1	350
5	350
18	370
8	390
12	390
19	390
20	430
3	450
13	480
7	500
6	530
10	530
11	530
9	540
14	550
16	550
;

proc freq data=indata;
tables fee / out=freqdata;
run;

proc optmodel;
set ITEMS;
num count {ITEMS};
read data freqdata into ITEMS=[fee] count;
num total_count = sum {i in ITEMS} count[i];
set BINS = 1..3;

/* Assign[i,b] = 1 if item i is assigned to bin b, 0 otherwise */
var Assign {ITEMS, BINS} binary;
/* assign each item to exactly one bin */
con AssignOnce {i in ITEMS}:
sum {b in BINS} Assign[i,b] = 1;

/* AssignedBin[i] = b if and only if Assign[i,b] = 1 */
impvar AssignedBin {i in ITEMS} = sum {b in BINS} b*Assign[i,b];

/* smaller fee implies same or smaller bin */
con Monotonic {i in ITEMS, j in ITEMS: i < j}:
AssignedBin[i] <= AssignedBin[j];

/* minimize the maximum percentage */
var MinMax >= 0;
min Objective = MinMax;
var BinCount {BINS} >= 3;
con BinCountCon {b in BINS}:
BinCount[b] = sum {i in ITEMS} count[i]*Assign[i,b];
impvar BinCountPercentage {b in BINS} =
BinCount[b] / total_count;
con MinMaxCon {b in BINS}:
MinMax >= BinCountPercentage[b];

solve;
print BinCount BinCountPercentage;
print count AssignedBin;
quit;``````

[1] BinCount BinCountPercentage
1 6 0.30
2 7 0.35
3 7 0.35

[1] count AssignedBin
330 4 1
350 2 1
370 1 2
390 3 2
430 1 2
450 1 2
480 1 2
500 1 3
530 3 3
540 1 3
550 2 3

Occasional Contributor
Posts: 6

Re: How to find an Optimal(max) count in each Bin

thanks for such a quick response.

Super User
Posts: 10,610

Re: How to find an Optimal(max) count in each Bin

Here is GA code.

``````data indata;
infile cards truncover expandtabs;
input ID fee;
datalines;
2	330
4	330
15	330
17	330
1	350
5	350
18	370
8	390
12	390
19	390
20	430
3	450
13	480
7	500
6	530
10	530
11	530
9	540
14	550
16	550
;
run;

proc iml;
use indata;
close ;

call tabulate(levels,freq,fee);
n=ncol(freq);

start function(x) global(freq,n);
x=unique(x);
if range(x) > 1 & x[<>]<=n & x[><]>=1 then do;
temp=sum(freq[1:x[1]])||
sum(freq[x[1]+1:x[2]-1])||
sum(freq[x[2]:n]);
obj=range(temp);
end;
else obj=9999999;
return (obj);
finish;

encoding=repeat(1//n,1,2);
id=gasetup(2,2,123456789);
call gasetobj(id,0,"function");
call gasetsel(id,100,1,0.95);
call gainit(id,10000,encoding);

niter = 1000;
do i = 1 to niter;
call garegen(id);
call gagetval(value, id);
end;
call gagetmem(mem, value, id, 1);

members=char(levels[1])+'-'+char(levels[mem[1]])//
char(levels[mem[1]+1])+'-'+char(levels[mem[2]-1])//
char(levels[mem[2]])+'-'+char(levels[n]);
sum=sum(freq[1:mem[1]])//
sum(freq[mem[1]+1:mem[2]-1])//
sum(freq[mem[2]:n]);

print members[l="Members"],
sum[l="Group sum"],
value[l = "Min value"],levels,freq;
call gaend(id);
quit;

``````
Members
330- 350
370- 480
500- 550
Group sum
6
7
7
Min value
1
levels
330 350 370 390 430 450 480 500 530 540 550
freq
4 2 1 3 1 1 1 1 3 1 2
Occasional Contributor
Posts: 6

Re: How to find an Optimal(max) count in each Bin

thanks for the response.

Posts: 5,391

Re: How to find an Optimal(max) count in each Bin

Do the solutions reached by @RobPratt and @Ksharp match your needs? I wouldn't have guessed from your choice of Iteration 2 over Iterations 1 and 3.

PG
Occasional Contributor
Posts: 6

Re: How to find an Optimal(max) count in each Bin

i couldnt test them, since RobPratt used Proc optmodel and i do not have SAS/OR package and Ksharp used GA and somehow its giving me IML error and asking me to contact SAS support(this might be due to Version 9.3).

And also RobPratt Suggest me to use MILPSOLVE, so i am trying to use it. no luck yet.

What i am really looking is, trying to find a fee bucket range which has the highest frequency. so that i can assing the fee range to the client. in the example iteration 2 has bucket 330-450 with 60% .

SAS Employee
Posts: 538

Re: How to find an Optimal(max) count in each Bin

In an off-line private message, I understood the objective to be to minimize the maximum percentage.  Do you instead want to maximize the maximum percentage?

Occasional Contributor
Posts: 6

Re: How to find an Optimal(max) count in each Bin

yes i am trying to maximize the maximum percentage. sorry about the confusion

Posts: 5,391

Re: How to find an Optimal(max) count in each Bin

Then the problem is much simpler. There are only three solutions to consider:

Min-Min-Max
Min-Max-Min
Max-Min-Min

where Min is the smallest bin with a size greater than two and Max is whatever remains, apart from the Min's.

PG
SAS Employee
Posts: 538

Re: How to find an Optimal(max) count in each Bin

Here is a modification that maximizes the maximum percentage.  The same optimization model works if you change the number of bins or the minimum number of items per bin.

``````proc optmodel;
set ITEMS;
num count {ITEMS};
read data freqdata into ITEMS=[fee] count;
num total_count = sum {i in ITEMS} count[i];
set BINS = 1..3;

/* Assign[i,b] = 1 if item i is assigned to bin b, 0 otherwise */
var Assign {ITEMS, BINS} binary;
/* assign each item to exactly one bin */
con AssignOnce {i in ITEMS}:
sum {b in BINS} Assign[i,b] = 1;

/* AssignedBin[i] = b if and only if Assign[i,b] = 1 */
impvar AssignedBin {i in ITEMS} = sum {b in BINS} b*Assign[i,b];

/* smaller fee implies same or smaller bin */
con Monotonic {i in ITEMS, j in ITEMS: i < j}:
AssignedBin[i] <= AssignedBin[j];

/* maximize the maximum percentage */
var MaxBinCountPercentage >= 0;
max Objective = MaxBinCountPercentage;
var BinCount {BINS} >= 3;
con BinCountCon {b in BINS}:
BinCount[b] = sum {i in ITEMS} count[i]*Assign[i,b];
impvar BinCountPercentage {b in BINS} =
BinCount[b] / total_count;
con MaxCon1 {b in BINS}:
MaxBinCountPercentage >= BinCountPercentage[b];
var IsMaxBin {BINS} binary;
con OneMax:
sum {b in BINS} IsMaxBin[b] = 1;
/* if IsMaxBin[b] = 1 then MaxBinCountPercentage <= BinCountPercentage[b] */
con MaxCon2 {b in BINS}:
MaxBinCountPercentage - BinCountPercentage[b] <= 1 - IsMaxBin[b];

solve;
print BinCount BinCountPercentage;
print count AssignedBin;
quit;``````

[1] BinCount BinCountPercentage
1 14 0.70
2 3 0.15
3 3 0.15

[1] count AssignedBin
330 4 1
350 2 1
370 1 1
390 3 1
430 1 1
450 1 1
480 1 1
500 1 1
530 3 2
540 1 3
550 2 3
Posts: 5,391

Re: How to find an Optimal(max) count in each Bin

Implementing above idea in a datastep. Choosing between only three possible solutions, optimal solution is found directly.

``````data indata;
input ID fee;
datalines;
2	330
4	330
15	330
17	330
1	350
5	350
18	370
8	390
12	390
19	390
20	430
3	450
13	480
7	500
6	530
10	530
11	530
9	540
14	550
16	550
;

proc sql;
create table tmpData as
select fee, count(fee) as count
from inData
group by fee
order by fee;
quit;

data maxBin;
array f {2000} _temporary_;
do i = 1 to nobs;
set tmpData nobs=nobs;
f{i} = count;
end;
do n1 = 1 to nobs until (s1 >= 3); s1 + f{n1}; end;
do n2 = n1+1 to nobs until (s2 >= 3); s2 + f{n2}; end;
do n4 = nobs to 1 by -1 until (s4 >= 3); s4 + f{n4}; end;
do n3 = n4-1 to 1 by -1 until (s3 >= 3); s3 + f{n3}; end;
sMin = min(s1+s2, s1+s4, s3+s4);
if s1+s2 = sMin then do; from = n2+1; to = nobs; end;
else if s1+s4 = sMin then do; from = n1+1; to = n4-1; end;
else do; from = 1; to = n3-1; end;
set tmpData point=from; fromFee = fee;
set tmpData point=to; toFee = fee;
output;
stop;
keep fromFee toFee;
run;

title "Bin with MAX obs";
proc print data=maxBin noobs; run;``````
PG
Occasional Contributor
Posts: 6

Re: How to find an Optimal(max) count in each Bin

can we print all three bins with count instead of just the max bin

Super User
Posts: 10,610

Re: How to find an Optimal(max) count in each Bin

[ Edited ]
```I don't understand you want MAXIMIZE  max(count)-min(count) ?

data indata;
infile cards truncover expandtabs;
input ID fee;
datalines;
2	330
4	330
15	330
17	330
1	350
5	350
18	370
8	390
12	390
19	390
20	430
3	450
13	480
7	500
6	530
10	530
11	530
9	540
14	550
16	550
;
run;

proc iml;
use indata;
close ;

call tabulate(levels,freq,fee);
n=ncol(freq);

start function(x) global(freq,n);
x=unique(x);
if range(x) > 1 & x[<>]<=n & x[><]>=1 then do;
temp=sum(freq[1:x[1]])||
sum(freq[x[1]+1:x[2]-1])||
sum(freq[x[2]:n]);
obj=range(temp);
end;
else obj=-999999;
return (obj);
finish;

encoding=repeat(1//n,1,2);
id=gasetup(2,2,123456789);
call gasetobj(id,1,"function");
call gasetsel(id,100,1,0.95);
call gainit(id,10000,encoding);

niter = 1000;  /*Change it as big as you can*/
do i = 1 to niter;
call garegen(id);
call gagetval(value, id);
end;
call gagetmem(mem, value, id, 1);

members=char(levels[1])+'-'+char(levels[mem[1]])//
char(levels[mem[1]+1])+'-'+char(levels[mem[2]-1])//
char(levels[mem[2]])+'-'+char(levels[n]);
sum=sum(freq[1:mem[1]])//
sum(freq[mem[1]+1:mem[2]-1])//
sum(freq[mem[2]:n]);

print members[l="Members"],
sum[l="Group sum"],
value[l = "Max value"],levels,freq;
call gaend(id);
quit;

OUTPUT:

Members
330- 530
540- 540
550- 550
Group sum
17
1
2
Max value
16
levels
330	350	370	390	430	450	480	500	530	540	550
freq
4	2	1	3	1	1	1	1	3	1	2

```
SAS Employee
Posts: 538

Re: How to find an Optimal(max) count in each Bin

The user wants to maximize max(count).  @Ksharp, you have not enforced the lower bound of 3 items per bin.

Discussion stats
• 14 replies
• 642 views
• 10 likes
• 4 in conversation