BookmarkSubscribeRSS Feed
Ksharp
Super User

@Rick_SAS post a optimizal problem. but he solve it by SAS/QC.

@RobPratt want give it a try ?

 

 

http://blogs.sas.com/content/iml/2017/05/01/split-data-groups-mean-variance.html

6 REPLIES 6
Rick_SAS
SAS Super FREQ

Just FYI, I always alert Rob to any blog posts that involve optimization. If you see a paragraph at the end of one of my posts that says, "and you can also solve this with PROC OPTMODEL...", that information probably came from Rob. 

RobPratt
SAS Super FREQ

Splitting into groups that have the same sum (or minimize the maximum sum) is straightforward for the MILP solver in PROC OPTMODEL, but if you want the same mean the problem becomes mixed integer nonlinear programming (MINLP), even without considering the standard deviation.  PROC OPTMODEL does not yet provide an MINLP solver, and so far the most natural workarounds don’t look promising.

Ksharp
Super User
@RobPratt,
So Could you write some OR code to solve this problem ?

And I also want know whether OR could solve this problem 
when the dataset has one million obs 


RobPratt
SAS Super FREQ

Here is PROC OPTMODEL code that uses the MILP solver to minimize the range between maximum and minimum sums and forces the treatments to have equal or nearly equal numbers of units but ignores standard deviation:

 

 

proc optmodel;
   set TREATMENTS;
   read data Treatments into TREATMENTS=[Trt];

   set UNITS;
   num &Var {UNITS};
   read data Units(where=(&Var ne .)) into UNITS=[_N_] &Var;

   var Assign {UNITS, TREATMENTS} binary;
   con AssignOnce {u in UNITS}:
      sum {t in TREATMENTS} Assign[u,t] = 1;
   impvar TreatmentSum {t in TREATMENTS} = sum {u in UNITS} &Var.[u] * Assign[u,t];
   con Cardinality {t in TREATMENTS}:
      floor(card(UNITS) / card(TREATMENTS)) 
   <= sum {u in UNITS} Assign[u,t] 
   <= ceil(card(UNITS) / card(TREATMENTS));
   var MinSum;
   var MaxSum;
   con MinSumCon {t in TREATMENTS}:
      MinSum <= TreatmentSum[t];
   con MaxSumCon {t in TREATMENTS}:
      MaxSum >= TreatmentSum[t];
   min Objective = MaxSum - MinSum;

   solve with MILP / decomp;
   print TreatmentSum;

   num Trt {u in UNITS} = round(sum {t in TREATMENTS} t * Assign[u,t].sol);
   create data Groups from [unit] &Var Trt;
quit;

proc means data=Groups mean std;
  class Trt;
  var &Var;
run;

proc sgplot data=Groups;
   vbox &Var / category=Trt;
run;

 

It takes about a minute and yields the following results:

SAS Output

[1] TreatmentSum
1 139169
2 139169
3 139169
4 139168
5 139169

 

The MEANS Procedure

 

Analysis Variable : Cholesterol
Trt N Obs Mean Std Dev
1 627 221.9601276 29.7794342
2 627 221.9601276 41.9241290
3 627 221.9601276 42.8058251
4 627 221.9585327 42.2811774
5 627 221.9601276 54.9869575

 

You can also perform an exact linearization to handle means instead of sums, but the resulting problem is much harder.  See this blog post from yesterday.

Ksharp
Super User

@RobPratt

Here is my GA .and got the similar result with @Rick_SAS

 

 

 

data Units;            
set Sashelp.Heart(where=(status = "Alive"));
keep Sex AgeAtStart Height Weight Diastolic Systolic MRW Cholesterol;
run;


proc iml;
use Units nobs nobs;
read all var {Cholesterol};
close;

start function(x) global(Cholesterol,nobs,group,mean_std);
if countunique(x)=group then do;
do i=1 to group;
 idx=loc(x=i);
 temp=Cholesterol[idx]; 
 mean_std[i,1]=mean(temp);
 mean_std[i,2]=std(temp);
end;
sse=sum(mean_std[<>,]-mean_std[><,]);
end;
else sse=999999;
return (sse);
finish;

group=5;  /* <--Change it(divide into 5 groups)*/
mean_std=j(group,2,.);

encoding=j(2,nobs,1);
encoding[2,]=group;    

id=gasetup(2,nobs,123456789);
call gasetobj(id,0,"function");
call gasetsel(id,10,1,1);
call gainit(id,1000,encoding);


niter = 100;
do i = 1 to niter;
 call garegen(id);
 call gagetval(value, id);
end;
call gagetmem(mem, value, id, 1);

col_mem=t(mem);
create group var {col_mem};
append;
close;

print value[l = "Min Value:"] ;
call gaend(id);
quit;
data want;
 merge group Units(keep=Cholesterol);
run;
proc means data=want mean std;
  class col_mem;
  var Cholesterol;
run;

 

The MEANS Procedure

Analysis Variable : Cholesterol
COL_MEM N Obs Mean Std Dev
1 630 221.9544715 43.0901775
2 662 221.9627907 43.1227683
3 629 221.9457237 43.0978582
4 648 221.9667722 43.0939030
5 649 221.9685039 43.0944852

 

Ksharp
Super User

@RobPratt

I noticed there are some missing value in ,so I fixed it again and looks like better than Rick's result.

 

 

data Units;            
set Sashelp.Heart(where=(status = "Alive" and Cholesterol is not missing));
n+1;
keep n Sex AgeAtStart Height Weight Diastolic Systolic MRW Cholesterol;
run;
 
proc iml;
use Units nobs nobs;
read all var {Cholesterol};
close;
 
start function(x) global(Cholesterol,group,mean_std);
xx=shape(x,group,0,.);
do i=1 to group;
 idx=setdif(xx[i,],{.});
 temp=Cholesterol[idx]; 
 mean_std[i,1]=mean(temp);
 mean_std[i,2]=std(temp);
end;
sse=sum(mean_std[<>,]-mean_std[><,]);
return (sse);
finish;
 
group=5;  /* <--Change it(divide into 5 groups)*/
mean_std=j(group,2,.);
  
id=gasetup(3,nobs,123456789);
call gasetobj(id,0,"function");
call gasetsel(id,10,1,1);
call gainit(id,1000);
 
niter = 100;
do i = 1 to niter;
 call garegen(id);
 call gagetval(value, id);
end;
call gagetmem(mem, value, id, 1);
 
col_mem=shape(mem,group,0,.);

create group from col_mem;
append from col_mem;
close;
 
print value[l = "Min Value:"] ;
call gaend(id);
quit;

data group;
 set group;
 group+1;
run;
proc transpose data=group out=member(drop=_: index=(col1));
by group;
var col:;
run;
 
data want;
 merge member(rename=(col1=n) where=(n is not missing)) Units(keep=n Cholesterol);
 by n;
run;
proc means data=want mean std;
  class group;
  var Cholesterol;
run;

 

 

OUTPUT:

The MEANS Procedure

Analysis Variable : Cholesterol
group N Obs Mean Std Dev
1 627 221.9649123 43.1137738
2 627 221.9665072 43.1220553
3 627 221.9505582 43.0823468
4 627 221.9569378 43.0712286
5 627 221.9601276 43.1105645

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

Multiple Linear Regression in SAS

Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 6 replies
  • 1838 views
  • 2 likes
  • 3 in conversation