Members |
---|
330- 350 |
370- 480 |
500- 550 |
I am trying to find Maximun count for each bin range, but bin range can move depending on data. Trying to find three bins and minumum three count in each bin. For example, for this sample
ID | fee |
2 | 330 |
4 | 330 |
15 | 330 |
17 | 330 |
1 | 350 |
5 | 350 |
18 | 370 |
8 | 390 |
12 | 390 |
19 | 390 |
20 | 430 |
3 | 450 |
13 | 480 |
7 | 500 |
6 | 530 |
10 | 530 |
11 | 530 |
9 | 540 |
14 | 550 |
16 | 550
|
these are three different iterations and i want the best optimal solution, something like iteration 2.
Bins | count | % | ||
Iteration 1 | 1 | 330-390 | 10 | 50.00% |
2 | 391-530 | 7 | 35.00% | |
3 | 531-550 | 3 | 15.00% | |
Iteration 2 | Bins | count | % | |
1 | 330-450 | 12 | 60.00% | |
2 | 451-530 | 5 | 25.00% | |
3 | 531-550 | 3 | 15.00% | |
Iteration 3 | Bins | count | % | |
1 | 330-350 | 6 | 30.00% | |
2 | 351-530 | 11 | 55.00% | |
3 | 531-550 | 3 | 15.00% |
Can we do this in auto process? if so what would be the best approach.
I think the following code does what you want:
data indata;
input ID fee;
datalines;
2 330
4 330
15 330
17 330
1 350
5 350
18 370
8 390
12 390
19 390
20 430
3 450
13 480
7 500
6 530
10 530
11 530
9 540
14 550
16 550
;
proc freq data=indata;
tables fee / out=freqdata;
run;
proc optmodel;
set ITEMS;
num count {ITEMS};
read data freqdata into ITEMS=[fee] count;
num total_count = sum {i in ITEMS} count[i];
set BINS = 1..3;
/* Assign[i,b] = 1 if item i is assigned to bin b, 0 otherwise */
var Assign {ITEMS, BINS} binary;
/* assign each item to exactly one bin */
con AssignOnce {i in ITEMS}:
sum {b in BINS} Assign[i,b] = 1;
/* AssignedBin[i] = b if and only if Assign[i,b] = 1 */
impvar AssignedBin {i in ITEMS} = sum {b in BINS} b*Assign[i,b];
/* smaller fee implies same or smaller bin */
con Monotonic {i in ITEMS, j in ITEMS: i < j}:
AssignedBin[i] <= AssignedBin[j];
/* minimize the maximum percentage */
var MinMax >= 0;
min Objective = MinMax;
var BinCount {BINS} >= 3;
con BinCountCon {b in BINS}:
BinCount[b] = sum {i in ITEMS} count[i]*Assign[i,b];
impvar BinCountPercentage {b in BINS} =
BinCount[b] / total_count;
con MinMaxCon {b in BINS}:
MinMax >= BinCountPercentage[b];
solve;
print BinCount BinCountPercentage;
print count AssignedBin;
quit;
[1] | BinCount | BinCountPercentage |
---|---|---|
1 | 6 | 0.30 |
2 | 7 | 0.35 |
3 | 7 | 0.35 |
[1] | count | AssignedBin |
---|---|---|
330 | 4 | 1 |
350 | 2 | 1 |
370 | 1 | 2 |
390 | 3 | 2 |
430 | 1 | 2 |
450 | 1 | 2 |
480 | 1 | 2 |
500 | 1 | 3 |
530 | 3 | 3 |
540 | 1 | 3 |
550 | 2 | 3 |
thanks for such a quick response.
Here is GA code.
data indata;
infile cards truncover expandtabs;
input ID fee;
datalines;
2 330
4 330
15 330
17 330
1 350
5 350
18 370
8 390
12 390
19 390
20 430
3 450
13 480
7 500
6 530
10 530
11 530
9 540
14 550
16 550
;
run;
proc iml;
use indata;
read all var {fee} ;
close ;
call tabulate(levels,freq,fee);
n=ncol(freq);
start function(x) global(freq,n);
x=unique(x);
if range(x) > 1 & x[<>]<=n & x[><]>=1 then do;
temp=sum(freq[1:x[1]])||
sum(freq[x[1]+1:x[2]-1])||
sum(freq[x[2]:n]);
obj=range(temp);
end;
else obj=9999999;
return (obj);
finish;
encoding=repeat(1//n,1,2);
id=gasetup(2,2,123456789);
call gasetobj(id,0,"function");
call gasetsel(id,100,1,0.95);
call gainit(id,10000,encoding);
niter = 1000;
do i = 1 to niter;
call garegen(id);
call gagetval(value, id);
end;
call gagetmem(mem, value, id, 1);
members=char(levels[1])+'-'+char(levels[mem[1]])//
char(levels[mem[1]+1])+'-'+char(levels[mem[2]-1])//
char(levels[mem[2]])+'-'+char(levels[n]);
sum=sum(freq[1:mem[1]])//
sum(freq[mem[1]+1:mem[2]-1])//
sum(freq[mem[2]:n]);
print members[l="Members"],
sum[l="Group sum"],
value[l = "Min value"],levels,freq;
call gaend(id);
quit;
Members |
---|
330- 350 |
370- 480 |
500- 550 |
Group sum |
---|
6 |
7 |
7 |
Min value |
---|
1 |
levels | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|
330 | 350 | 370 | 390 | 430 | 450 | 480 | 500 | 530 | 540 | 550 |
freq | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|
4 | 2 | 1 | 3 | 1 | 1 | 1 | 1 | 3 | 1 | 2 |
thanks for the response.
i couldnt test them, since RobPratt used Proc optmodel and i do not have SAS/OR package and Ksharp used GA and somehow its giving me IML error and asking me to contact SAS support(this might be due to Version 9.3).
And also RobPratt Suggest me to use MILPSOLVE, so i am trying to use it. no luck yet.
What i am really looking is, trying to find a fee bucket range which has the highest frequency. so that i can assing the fee range to the client. in the example iteration 2 has bucket 330-450 with 60% .
In an off-line private message, I understood the objective to be to minimize the maximum percentage. Do you instead want to maximize the maximum percentage?
yes i am trying to maximize the maximum percentage. sorry about the confusion
Then the problem is much simpler. There are only three solutions to consider:
Min-Min-Max
Min-Max-Min
Max-Min-Min
where Min is the smallest bin with a size greater than two and Max is whatever remains, apart from the Min's.
Here is a modification that maximizes the maximum percentage. The same optimization model works if you change the number of bins or the minimum number of items per bin.
proc optmodel;
set ITEMS;
num count {ITEMS};
read data freqdata into ITEMS=[fee] count;
num total_count = sum {i in ITEMS} count[i];
set BINS = 1..3;
/* Assign[i,b] = 1 if item i is assigned to bin b, 0 otherwise */
var Assign {ITEMS, BINS} binary;
/* assign each item to exactly one bin */
con AssignOnce {i in ITEMS}:
sum {b in BINS} Assign[i,b] = 1;
/* AssignedBin[i] = b if and only if Assign[i,b] = 1 */
impvar AssignedBin {i in ITEMS} = sum {b in BINS} b*Assign[i,b];
/* smaller fee implies same or smaller bin */
con Monotonic {i in ITEMS, j in ITEMS: i < j}:
AssignedBin[i] <= AssignedBin[j];
/* maximize the maximum percentage */
var MaxBinCountPercentage >= 0;
max Objective = MaxBinCountPercentage;
var BinCount {BINS} >= 3;
con BinCountCon {b in BINS}:
BinCount[b] = sum {i in ITEMS} count[i]*Assign[i,b];
impvar BinCountPercentage {b in BINS} =
BinCount[b] / total_count;
con MaxCon1 {b in BINS}:
MaxBinCountPercentage >= BinCountPercentage[b];
var IsMaxBin {BINS} binary;
con OneMax:
sum {b in BINS} IsMaxBin[b] = 1;
/* if IsMaxBin[b] = 1 then MaxBinCountPercentage <= BinCountPercentage[b] */
con MaxCon2 {b in BINS}:
MaxBinCountPercentage - BinCountPercentage[b] <= 1 - IsMaxBin[b];
solve;
print BinCount BinCountPercentage;
print count AssignedBin;
quit;
[1] | BinCount | BinCountPercentage |
---|---|---|
1 | 14 | 0.70 |
2 | 3 | 0.15 |
3 | 3 | 0.15 |
[1] | count | AssignedBin |
---|---|---|
330 | 4 | 1 |
350 | 2 | 1 |
370 | 1 | 1 |
390 | 3 | 1 |
430 | 1 | 1 |
450 | 1 | 1 |
480 | 1 | 1 |
500 | 1 | 1 |
530 | 3 | 2 |
540 | 1 | 3 |
550 | 2 | 3 |
Implementing above idea in a datastep. Choosing between only three possible solutions, optimal solution is found directly.
data indata;
input ID fee;
datalines;
2 330
4 330
15 330
17 330
1 350
5 350
18 370
8 390
12 390
19 390
20 430
3 450
13 480
7 500
6 530
10 530
11 530
9 540
14 550
16 550
;
proc sql;
create table tmpData as
select fee, count(fee) as count
from inData
group by fee
order by fee;
quit;
data maxBin;
array f {2000} _temporary_;
do i = 1 to nobs;
set tmpData nobs=nobs;
f{i} = count;
end;
do n1 = 1 to nobs until (s1 >= 3); s1 + f{n1}; end;
do n2 = n1+1 to nobs until (s2 >= 3); s2 + f{n2}; end;
do n4 = nobs to 1 by -1 until (s4 >= 3); s4 + f{n4}; end;
do n3 = n4-1 to 1 by -1 until (s3 >= 3); s3 + f{n3}; end;
sMin = min(s1+s2, s1+s4, s3+s4);
if s1+s2 = sMin then do; from = n2+1; to = nobs; end;
else if s1+s4 = sMin then do; from = n1+1; to = n4-1; end;
else do; from = 1; to = n3-1; end;
set tmpData point=from; fromFee = fee;
set tmpData point=to; toFee = fee;
output;
stop;
keep fromFee toFee;
run;
title "Bin with MAX obs";
proc print data=maxBin noobs; run;
can we print all three bins with count instead of just the max bin
I don't understand you want MAXIMIZE max(count)-min(count) ? data indata; infile cards truncover expandtabs; input ID fee; datalines; 2 330 4 330 15 330 17 330 1 350 5 350 18 370 8 390 12 390 19 390 20 430 3 450 13 480 7 500 6 530 10 530 11 530 9 540 14 550 16 550 ; run; proc iml; use indata; read all var {fee} ; close ; call tabulate(levels,freq,fee); n=ncol(freq); start function(x) global(freq,n); x=unique(x); if range(x) > 1 & x[<>]<=n & x[><]>=1 then do; temp=sum(freq[1:x[1]])|| sum(freq[x[1]+1:x[2]-1])|| sum(freq[x[2]:n]); obj=range(temp); end; else obj=-999999; return (obj); finish; encoding=repeat(1//n,1,2); id=gasetup(2,2,123456789); call gasetobj(id,1,"function"); call gasetsel(id,100,1,0.95); call gainit(id,10000,encoding); niter = 1000; /*Change it as big as you can*/ do i = 1 to niter; call garegen(id); call gagetval(value, id); end; call gagetmem(mem, value, id, 1); members=char(levels[1])+'-'+char(levels[mem[1]])// char(levels[mem[1]+1])+'-'+char(levels[mem[2]-1])// char(levels[mem[2]])+'-'+char(levels[n]); sum=sum(freq[1:mem[1]])// sum(freq[mem[1]+1:mem[2]-1])// sum(freq[mem[2]:n]); print members[l="Members"], sum[l="Group sum"], value[l = "Max value"],levels,freq; call gaend(id); quit; OUTPUT: Members 330- 530 540- 540 550- 550 Group sum 17 1 2 Max value 16 levels 330 350 370 390 430 450 480 500 530 540 550 freq 4 2 1 3 1 1 1 1 3 1 2
The user wants to maximize max(count). @Ksharp, you have not enforced the lower bound of 3 items per bin.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.
Find more tutorials on the SAS Users YouTube channel.