Super User
Posts: 10,787

# OR Problem

@Rick_SAS post a optimizal problem. but he solve it by SAS/QC.

@RobPratt want give it a try ?

http://blogs.sas.com/content/iml/2017/05/01/split-data-groups-mean-variance.html

SAS Super FREQ
Posts: 4,245

## Re: OR Problem

Just FYI, I always alert Rob to any blog posts that involve optimization. If you see a paragraph at the end of one of my posts that says, "and you can also solve this with PROC OPTMODEL...", that information probably came from Rob.

SAS Employee
Posts: 575

## Re: OR Problem

Splitting into groups that have the same sum (or minimize the maximum sum) is straightforward for the MILP solver in PROC OPTMODEL, but if you want the same mean the problem becomes mixed integer nonlinear programming (MINLP), even without considering the standard deviation.  PROC OPTMODEL does not yet provide an MINLP solver, and so far the most natural workarounds don’t look promising.

Super User
Posts: 10,787

## Re: OR Problem

```@RobPratt,
So Could you write some OR code to solve this problem ?

And I also want know whether OR could solve this problem
when the dataset has one million obs

```
SAS Employee
Posts: 575

## Re: OR Problem

Here is PROC OPTMODEL code that uses the MILP solver to minimize the range between maximum and minimum sums and forces the treatments to have equal or nearly equal numbers of units but ignores standard deviation:

``````proc optmodel;
set TREATMENTS;

set UNITS;
num &Var {UNITS};
read data Units(where=(&Var ne .)) into UNITS=[_N_] &Var;

var Assign {UNITS, TREATMENTS} binary;
con AssignOnce {u in UNITS}:
sum {t in TREATMENTS} Assign[u,t] = 1;
impvar TreatmentSum {t in TREATMENTS} = sum {u in UNITS} &Var.[u] * Assign[u,t];
con Cardinality {t in TREATMENTS}:
floor(card(UNITS) / card(TREATMENTS))
<= sum {u in UNITS} Assign[u,t]
<= ceil(card(UNITS) / card(TREATMENTS));
var MinSum;
var MaxSum;
con MinSumCon {t in TREATMENTS}:
MinSum <= TreatmentSum[t];
con MaxSumCon {t in TREATMENTS}:
MaxSum >= TreatmentSum[t];
min Objective = MaxSum - MinSum;

solve with MILP / decomp;
print TreatmentSum;

num Trt {u in UNITS} = round(sum {t in TREATMENTS} t * Assign[u,t].sol);
create data Groups from [unit] &Var Trt;
quit;

proc means data=Groups mean std;
class Trt;
var &Var;
run;

proc sgplot data=Groups;
vbox &Var / category=Trt;
run;
``````

It takes about a minute and yields the following results:

SAS Output

[1] TreatmentSum
1 139169
2 139169
3 139169
4 139168
5 139169

The MEANS Procedure

Analysis Variable : Cholesterol
Trt N Obs Mean Std Dev
1 627 221.9601276 29.7794342
2 627 221.9601276 41.9241290
3 627 221.9601276 42.8058251
4 627 221.9585327 42.2811774
5 627 221.9601276 54.9869575

You can also perform an exact linearization to handle means instead of sums, but the resulting problem is much harder.  See this blog post from yesterday.

Super User
Posts: 10,787

## Re: OR Problem

[ Edited ]

@RobPratt

Here is my GA .and got the similar result with @Rick_SAS

``````data Units;
set Sashelp.Heart(where=(status = "Alive"));
keep Sex AgeAtStart Height Weight Diastolic Systolic MRW Cholesterol;
run;

proc iml;
use Units nobs nobs;
close;

start function(x) global(Cholesterol,nobs,group,mean_std);
if countunique(x)=group then do;
do i=1 to group;
idx=loc(x=i);
temp=Cholesterol[idx];
mean_std[i,1]=mean(temp);
mean_std[i,2]=std(temp);
end;
sse=sum(mean_std[<>,]-mean_std[><,]);
end;
else sse=999999;
return (sse);
finish;

group=5;  /* <--Change it(divide into 5 groups)*/
mean_std=j(group,2,.);

encoding=j(2,nobs,1);
encoding[2,]=group;

id=gasetup(2,nobs,123456789);
call gasetobj(id,0,"function");
call gasetsel(id,10,1,1);
call gainit(id,1000,encoding);

niter = 100;
do i = 1 to niter;
call garegen(id);
call gagetval(value, id);
end;
call gagetmem(mem, value, id, 1);

col_mem=t(mem);
create group var {col_mem};
append;
close;

print value[l = "Min Value:"] ;
call gaend(id);
quit;
data want;
merge group Units(keep=Cholesterol);
run;
proc means data=want mean std;
class col_mem;
var Cholesterol;
run;
``````

The MEANS Procedure

Analysis Variable : Cholesterol
COL_MEM N Obs Mean Std Dev
1 630 221.9544715 43.0901775
2 662 221.9627907 43.1227683
3 629 221.9457237 43.0978582
4 648 221.9667722 43.0939030
5 649 221.9685039 43.0944852

Super User
Posts: 10,787

## Re: OR Problem

@RobPratt

I noticed there are some missing value in ,so I fixed it again and looks like better than Rick's result.

``````data Units;
set Sashelp.Heart(where=(status = "Alive" and Cholesterol is not missing));
n+1;
keep n Sex AgeAtStart Height Weight Diastolic Systolic MRW Cholesterol;
run;

proc iml;
use Units nobs nobs;
close;

start function(x) global(Cholesterol,group,mean_std);
xx=shape(x,group,0,.);
do i=1 to group;
idx=setdif(xx[i,],{.});
temp=Cholesterol[idx];
mean_std[i,1]=mean(temp);
mean_std[i,2]=std(temp);
end;
sse=sum(mean_std[<>,]-mean_std[><,]);
return (sse);
finish;

group=5;  /* <--Change it(divide into 5 groups)*/
mean_std=j(group,2,.);

id=gasetup(3,nobs,123456789);
call gasetobj(id,0,"function");
call gasetsel(id,10,1,1);
call gainit(id,1000);

niter = 100;
do i = 1 to niter;
call garegen(id);
call gagetval(value, id);
end;
call gagetmem(mem, value, id, 1);

col_mem=shape(mem,group,0,.);

create group from col_mem;
append from col_mem;
close;

print value[l = "Min Value:"] ;
call gaend(id);
quit;

data group;
set group;
group+1;
run;
proc transpose data=group out=member(drop=_: index=(col1));
by group;
var col:;
run;

data want;
merge member(rename=(col1=n) where=(n is not missing)) Units(keep=n Cholesterol);
by n;
run;
proc means data=want mean std;
class group;
var Cholesterol;
run;``````

OUTPUT:

The MEANS Procedure

Analysis Variable : Cholesterol
group N Obs Mean Std Dev
1 627 221.9649123 43.1137738
2 627 221.9665072 43.1220553
3 627 221.9505582 43.0823468
4 627 221.9569378 43.0712286
5 627 221.9601276 43.1105645
Discussion stats
• 6 replies
• 427 views
• 2 likes
• 3 in conversation