Here is the question coming from :
https://communities.sas.com/t5/Base-SAS-Programming/Combining-rows-of-data/m-p/274668
The question contain three sub-questions. the last one is what I am talking about. This question is very like
https://communities.sas.com/t5/General-SAS-Programming/Creating-Groups-of-Equal-Size/td-p/268862
@RobPratt gave the OR code, but the different thing is this question's each group have different number of members .
@RobPratt want give it a try ?
Due to the original post is too long and cumbersome. I quote here :
"
The the exact criterion for "as equally as possible," is this: we want each week to have as close to the average number of tests for each Score. I've attached a file to show what I'm talking about. For example, the TotalScores across all Table/Player combinations for ScoreA, ScoreB, ScoreC and ScoreD are 67, 134, 201 and 268 respectively. The Average Scores are 17, 34, 50 and 67. In lines 13-16 of the attached file, you see each week and the scores assigned. Weeks 1 - 3 are pretty close to the actual averages. Week 4 however, is lower but I think that has to do with the small data set.
Does this make sense? All the scores need to be distributed "as equally as possible" not the total score. "
TableList | PlayerList | ScoreA | ScoreB | ScoreC | ScoreD | Total Samples | |
A_C_P_AC | 12 | 4 | 8 | 12 | 16 | 40 | Week2 |
R_AA_AB | 17 | 3 | 6 | 9 | 12 | 30 | Week2 |
B_F_G_K_L_O | 3,4,11 | 18 | 36 | 54 | 72 | 180 | Week1 |
S_T_U_V_AD | 5,6 | 10 | 20 | 30 | 40 | 100 | Week2 |
D_E_H_I_J_M_N | 1,2,14 | 21 | 42 | 63 | 84 | 210 | Week3 |
Y,Z | 13,16 | 4 | 8 | 12 | 16 | 40 | Week4 |
W,X | 7,15 | 4 | 8 | 12 | 16 | 40 | Week4 |
Q | 8,9,10 | 3 | 6 | 9 | 12 | 30 | Week4 |
Total Score | 67 | 134 | 201 | 268 | 670 | ||
Average Score | 17 | 34 | 50 | 67 | 168 | ||
Week1 | 18 | 36 | 54 | 72 | |||
Week2 | 17 | 34 | 51 | 68 | |||
Week3 | 21 | 42 | 63 | 84 | |||
Week4 | 11 | 22 | 33 | 44 |
Here is my IML code:
data have;
infile cards expandtabs truncover;
input TableList : $20. PlayerList : $20. ScoreA ScoreB ScoreC ScoreD;
cards;
A_C_P_AC 12 4 8 12 16
R_AA_AB 17 3 6 9 12
B_F_G_K_L_O 3,4,11 18 36 54 72
S_T_U_V_AD 5,6 10 20 30 40
D_E_H_I_J_M_N 1,2,14 21 42 63 84
Y,Z 13,16 4 8 12 16
W,X 7,15 4 8 12 16
Q 8,9,10 3 6 9 12
;
run;
proc iml;
use have;
read all var _num_ into age[c=vname] ;
close ;
start function(x) global(age,group,nrow,ncol);
sum_group=j(group,ncol,0);
do i=1 to group;
do j=1 to nrow;
if x[j]=i then sum_group[i,]=sum_group[i,]+age[j,];
end;
end;
obj=sum(sum_group[<>,]-sum_group[><,]);
return (obj);
finish;
ncol=ncol(age);
nrow=nrow(age);
group=4; /* <--Change it(divide into 4 groups)*/
encoding=j(2,nrow,1);
encoding[2,]=group;
id=gasetup(2,nrow,123456789);
call gasetobj(id,0,"function");
call gasetsel(id,100,1,1);
call gainit(id,1000,encoding);
niter = 10000;
do i = 1 to niter;
call garegen(id);
call gagetval(value, id);
end;
call gagetmem(mem, value, id, 1);
groups=t(mem);
create group var {groups};
append;
close;
print value[l = "Min Value:"] ;
call gaend(id);
quit;
data want;
merge group have;
run;
proc summary data=want nway ;
class groups;
var _numeric_;
output out=sum(drop=_:) sum=;
run;
proc print noobs;run;
proc print data=want noobs;run;
The following PROC OPTMODEL code confirms that 70 is the minimum sum of ranges:
%let num_groups = 4;
proc optmodel;
set VARS = {'A','B','C','D'};
set GROUPS = 1..&num_groups;
set OBS;
str TableList {OBS};
str PlayerList {OBS};
num a {OBS, VARS};
read data have into OBS=[_N_] TableList PlayerList {j in VARS} <a[_N_,j]=col('Score'||j)>;
/* Assign[i,g] = 1 if observation i assigned to group g, 0 otherwise */
var Assign {OBS, GROUPS} binary;
con AssignOnce {i in OBS}:
sum {g in GROUPS} Assign[i,g] = 1;
impvar GroupSum {g in GROUPS, j in VARS} = sum {i in OBS} a[i,j] * Assign[i,g];
var MinSum {VARS}, MaxSum {VARS};
con MinSumCon {g in GROUPS, j in VARS}:
MinSum[j] <= GroupSum[g,j];
con MaxSumCon {g in GROUPS, j in VARS}:
MaxSum[j] >= GroupSum[g,j];
impvar Range {j in VARS} = MaxSum[j] - MinSum[j];
min Objective = sum {j in VARS} Range[j];
solve;
print Assign;
print GroupSum;
num groupID {OBS};
for {i in OBS} do;
for {g in GROUPS: Assign[i,g].sol > 0.5} do;
groupID[i] = g;
leave;
end;
end;
create data want(drop=i) from [i] TableList PlayerList {j in VARS} <col('Score'||j)=a[i,j]> groupID;
quit;
SAS Output
The small instance solved instantly, but the performance on a larger instance is difficult to predict. If you point me to the data, I'll give it a try.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.
Find more tutorials on the SAS Users YouTube channel.