BookmarkSubscribeRSS Feed
sbokka
Fluorite | Level 6

I am trying to find Maximun count for each bin range, but bin range can move depending on data. Trying to find three bins and minumum three count in each bin. For example, for this sample

 

IDfee
2330
4330
15330
17330
1350
5350
18370
8390
12390
19390
20430
3450
13480
7500
6530
10530
11530
9540
14550
16

550

 

 

 

these are three different iterations and i want the best optimal solution, something like iteration 2.

 

  Binscount%
Iteration 11330-3901050.00%
 2391-530735.00%
 3531-550315.00%
     
     
Iteration 2 Binscount%
 1330-4501260.00%
 2451-530525.00%
 3531-550315.00%
     
     
Iteration 3 Binscount%
 1330-350630.00%
 2351-5301155.00%
 3531-550315.00%

 

Can we do this in auto process? if so what would be the best approach.

 

 

14 REPLIES 14
RobPratt
SAS Super FREQ

I think the following code does what you want:

data indata;
   input ID fee;
   datalines;
2	330
4	330
15	330
17	330
1	350
5	350
18	370
8	390
12	390
19	390
20	430
3	450
13	480
7	500
6	530
10	530
11	530
9	540
14	550
16	550
;

proc freq data=indata;
   tables fee / out=freqdata;
run;

proc optmodel;
   set ITEMS;
   num count {ITEMS};
   read data freqdata into ITEMS=[fee] count;
   num total_count = sum {i in ITEMS} count[i];
   set BINS = 1..3;

   /* Assign[i,b] = 1 if item i is assigned to bin b, 0 otherwise */
   var Assign {ITEMS, BINS} binary;
   /* assign each item to exactly one bin */
   con AssignOnce {i in ITEMS}:
      sum {b in BINS} Assign[i,b] = 1;

   /* AssignedBin[i] = b if and only if Assign[i,b] = 1 */
   impvar AssignedBin {i in ITEMS} = sum {b in BINS} b*Assign[i,b];

   /* smaller fee implies same or smaller bin */
   con Monotonic {i in ITEMS, j in ITEMS: i < j}:
      AssignedBin[i] <= AssignedBin[j];

   /* minimize the maximum percentage */
   var MinMax >= 0;
   min Objective = MinMax;
   var BinCount {BINS} >= 3;
   con BinCountCon {b in BINS}:
      BinCount[b] = sum {i in ITEMS} count[i]*Assign[i,b];
   impvar BinCountPercentage {b in BINS} = 
      BinCount[b] / total_count;
   con MinMaxCon {b in BINS}:
      MinMax >= BinCountPercentage[b];
   
   solve;
   print BinCount BinCountPercentage;
   print count AssignedBin;
quit;

 

[1] BinCount BinCountPercentage
1 6 0.30
2 7 0.35
3 7 0.35


[1] count AssignedBin
330 4 1
350 2 1
370 1 2
390 3 2
430 1 2
450 1 2
480 1 2
500 1 3
530 3 3
540 1 3
550 2 3

 

 

sbokka
Fluorite | Level 6

thanks for such a quick response.

Ksharp
Super User

Here is GA code.

 

data indata;
infile cards truncover expandtabs;
   input ID fee;
   datalines;
2	330
4	330
15	330
17	330
1	350
5	350
18	370
8	390
12	390
19	390
20	430
3	450
13	480
7	500
6	530
10	530
11	530
9	540
14	550
16	550
;
run;

proc iml;
use indata;
read all var {fee}  ;
close ;

call tabulate(levels,freq,fee);
n=ncol(freq);

start function(x) global(freq,n);
x=unique(x);
if range(x) > 1 & x[<>]<=n & x[><]>=1 then do;
temp=sum(freq[1:x[1]])||
     sum(freq[x[1]+1:x[2]-1])||
     sum(freq[x[2]:n]);
obj=range(temp);
end;
else obj=9999999;
return (obj);
finish;

encoding=repeat(1//n,1,2);
id=gasetup(2,2,123456789);
call gasetobj(id,0,"function");
call gasetsel(id,100,1,0.95);
call gainit(id,10000,encoding);


niter = 1000;
do i = 1 to niter;
 call garegen(id);
 call gagetval(value, id);
end;
call gagetmem(mem, value, id, 1);

members=char(levels[1])+'-'+char(levels[mem[1]])//
        char(levels[mem[1]+1])+'-'+char(levels[mem[2]-1])//
        char(levels[mem[2]])+'-'+char(levels[n]);
sum=sum(freq[1:mem[1]])//
     sum(freq[mem[1]+1:mem[2]-1])//
     sum(freq[mem[2]:n]);

print members[l="Members"],
      sum[l="Group sum"],
      value[l = "Min value"],levels,freq;
call gaend(id);
quit;

Members
330- 350
370- 480
500- 550
Group sum
6
7
7
Min value
1
levels
330 350 370 390 430 450 480 500 530 540 550
freq
4 2 1 3 1 1 1 1 3 1 2
sbokka
Fluorite | Level 6

thanks for the response.

PGStats
Opal | Level 21

Do the solutions reached by @RobPratt and @Ksharp match your needs? I wouldn't have guessed from your choice of Iteration 2 over Iterations 1 and 3.

PG
sbokka
Fluorite | Level 6

i couldnt test them, since RobPratt used Proc optmodel and i do not have SAS/OR package and Ksharp used GA and somehow its giving me IML error and asking me to contact SAS support(this might be due to Version 9.3).

 

And also RobPratt Suggest me to use MILPSOLVE, so i am trying to use it. no luck yet.

 

What i am really looking is, trying to find a fee bucket range which has the highest frequency. so that i can assing the fee range to the client. in the example iteration 2 has bucket 330-450 with 60% .

RobPratt
SAS Super FREQ

In an off-line private message, I understood the objective to be to minimize the maximum percentage.  Do you instead want to maximize the maximum percentage?

sbokka
Fluorite | Level 6

yes i am trying to maximize the maximum percentage. sorry about the confusion

PGStats
Opal | Level 21

Then the problem is much simpler. There are only three solutions to consider:

 

Min-Min-Max
Min-Max-Min
Max-Min-Min

 

where Min is the smallest bin with a size greater than two and Max is whatever remains, apart from the Min's.

PG
RobPratt
SAS Super FREQ

Here is a modification that maximizes the maximum percentage.  The same optimization model works if you change the number of bins or the minimum number of items per bin.

 

proc optmodel;
   set ITEMS;
   num count {ITEMS};
   read data freqdata into ITEMS=[fee] count;
   num total_count = sum {i in ITEMS} count[i];
   set BINS = 1..3;

   /* Assign[i,b] = 1 if item i is assigned to bin b, 0 otherwise */
   var Assign {ITEMS, BINS} binary;
   /* assign each item to exactly one bin */
   con AssignOnce {i in ITEMS}:
      sum {b in BINS} Assign[i,b] = 1;

   /* AssignedBin[i] = b if and only if Assign[i,b] = 1 */
   impvar AssignedBin {i in ITEMS} = sum {b in BINS} b*Assign[i,b];

   /* smaller fee implies same or smaller bin */
   con Monotonic {i in ITEMS, j in ITEMS: i < j}:
      AssignedBin[i] <= AssignedBin[j];

   /* maximize the maximum percentage */
   var MaxBinCountPercentage >= 0;
   max Objective = MaxBinCountPercentage;
   var BinCount {BINS} >= 3;
   con BinCountCon {b in BINS}:
      BinCount[b] = sum {i in ITEMS} count[i]*Assign[i,b];
   impvar BinCountPercentage {b in BINS} = 
      BinCount[b] / total_count;
   con MaxCon1 {b in BINS}:
      MaxBinCountPercentage >= BinCountPercentage[b];
   var IsMaxBin {BINS} binary;
   con OneMax:
      sum {b in BINS} IsMaxBin[b] = 1;
   /* if IsMaxBin[b] = 1 then MaxBinCountPercentage <= BinCountPercentage[b] */
   con MaxCon2 {b in BINS}:
      MaxBinCountPercentage - BinCountPercentage[b] <= 1 - IsMaxBin[b];
   
   solve;
   print BinCount BinCountPercentage;
   print count AssignedBin;
quit;


[1] BinCount BinCountPercentage
1 14 0.70
2 3 0.15
3 3 0.15


[1] count AssignedBin
330 4 1
350 2 1
370 1 1
390 3 1
430 1 1
450 1 1
480 1 1
500 1 1
530 3 2
540 1 3
550 2 3
PGStats
Opal | Level 21

Implementing above idea in a datastep. Choosing between only three possible solutions, optimal solution is found directly.

 

data indata;
   input ID fee;
   datalines;
2	330
4	330
15	330
17	330
1	350
5	350
18	370
8	390
12	390
19	390
20	430
3	450
13	480
7	500
6	530
10	530
11	530
9	540
14	550
16	550
;

proc sql;
create table tmpData as
select fee, count(fee) as count
from inData
group by fee
order by fee;
quit;

data maxBin;
array f {2000} _temporary_;
do i = 1 to nobs;
    set tmpData nobs=nobs;
    f{i} = count;
    end;
do n1 = 1 to nobs until (s1 >= 3); s1 + f{n1}; end;
do n2 = n1+1 to nobs until (s2 >= 3); s2 + f{n2}; end;
do n4 = nobs to 1 by -1 until (s4 >= 3); s4 + f{n4}; end;
do n3 = n4-1 to 1 by -1 until (s3 >= 3); s3 + f{n3}; end;
sMin = min(s1+s2, s1+s4, s3+s4);
if s1+s2 = sMin then do; from = n2+1; to = nobs; end;
else if s1+s4 = sMin then do; from = n1+1; to = n4-1; end;
else do; from = 1; to = n3-1; end;
set tmpData point=from; fromFee = fee;
set tmpData point=to; toFee = fee;
output;
stop;
keep fromFee toFee;
run;

title "Bin with MAX obs";
proc print data=maxBin noobs; run;
PG
sbokka
Fluorite | Level 6

can we print all three bins with count instead of just the max bin

Ksharp
Super User
I don't understand you want MAXIMIZE  max(count)-min(count) ?


data indata;
infile cards truncover expandtabs;
   input ID fee;
   datalines;
2	330
4	330
15	330
17	330
1	350
5	350
18	370
8	390
12	390
19	390
20	430
3	450
13	480
7	500
6	530
10	530
11	530
9	540
14	550
16	550
;
run;

proc iml;
use indata;
read all var {fee}  ;
close ;

call tabulate(levels,freq,fee);
n=ncol(freq);

start function(x) global(freq,n);
x=unique(x);
if range(x) > 1 & x[<>]<=n & x[><]>=1 then do;
temp=sum(freq[1:x[1]])||
     sum(freq[x[1]+1:x[2]-1])||
     sum(freq[x[2]:n]);
obj=range(temp);
end;
else obj=-999999;
return (obj);
finish;

encoding=repeat(1//n,1,2);
id=gasetup(2,2,123456789);
call gasetobj(id,1,"function");
call gasetsel(id,100,1,0.95);
call gainit(id,10000,encoding);


niter = 1000;  /*Change it as big as you can*/
do i = 1 to niter;
 call garegen(id);
 call gagetval(value, id);
end;
call gagetmem(mem, value, id, 1);

members=char(levels[1])+'-'+char(levels[mem[1]])//
        char(levels[mem[1]+1])+'-'+char(levels[mem[2]-1])//
        char(levels[mem[2]])+'-'+char(levels[n]);
sum=sum(freq[1:mem[1]])//
     sum(freq[mem[1]+1:mem[2]-1])//
     sum(freq[mem[2]:n]);

print members[l="Members"],
      sum[l="Group sum"],
      value[l = "Max value"],levels,freq;
call gaend(id);
quit;



OUTPUT:

Members
330- 530
540- 540
550- 550
Group sum
17
1
2
Max value
16
levels
330	350	370	390	430	450	480	500	530	540	550
freq
4	2	1	3	1	1	1	1	3	1	2

RobPratt
SAS Super FREQ

The user wants to maximize max(count).  @Ksharp, you have not enforced the lower bound of 3 items per bin.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Multiple Linear Regression in SAS

Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 14 replies
  • 1982 views
  • 10 likes
  • 4 in conversation