Solved
New Contributor
Posts: 2

# Group Obs with equal size based on varaiable and set number of Obs in each group

I have 2190 Obs and want to divide this into 5 groups of 438 Obs in each group and have roughly the same amount of SIZE in each group. I know the total for SIZE is 6660.73, so roughly 1332.146 in each group of 438 Obs.

I played around with PROC RANK but that didn't seem like the answer. The part I am having trouble with is the 438 obs in each group. I can do some basic IF/THEN and get 5 groups with roughly 1332.146 SIZE in each group but the number of OBS in those 5 groups is obviously not 438.

Obs      ID      MOD           SIZE

1    DBAS01    3390-3       0.3382
2    DV3L0B    3390-3       2.8372
3    DV3L0C    3390-3       2.8232
4    DV3L0D    3390-3       2.3266
5    DV3L0E    3390-3       2.7004
6    DV3L0F    3390-3       1.3652
7    DV3L0G    3390-3       2.7708
8    DV3L0H    3390-3       2.8374
9    DV3L0I    3390-3       2.3264
10    DV3L0J    3390-3       2.8377

...

...

2185    TSON34    3390-3       0.5344
2186    TSON35    3390-3       0.4412
2187    TSON36    3390-3       0.9780
2188    TSON37    3390-3       1.5679
2189    TSON38    3390-3       0.5232
2190    TSON39    3390-3       0.6562

Accepted Solutions
Solution
‎08-14-2017 05:56 PM
Super User
Posts: 10,787

## Re: Group Obs with equal size based on varaiable and set number of Obs in each group

[ Edited ]

It is more like a OR problem . better post it in OR forum.

Or would you like Genetic Algorithm to solve this problem ?

But I think SAS/OR is your best choice .

calling @RobPratt

Can you attach a CSV file or post more data ?

I like to use GA to solve this problem.

All Replies
Solution
‎08-14-2017 05:56 PM
Super User
Posts: 10,787

## Re: Group Obs with equal size based on varaiable and set number of Obs in each group

[ Edited ]

It is more like a OR problem . better post it in OR forum.

Or would you like Genetic Algorithm to solve this problem ?

But I think SAS/OR is your best choice .

calling @RobPratt

Can you attach a CSV file or post more data ?

I like to use GA to solve this problem.

SAS Employee
Posts: 575

## Re: Group Obs with equal size based on varaiable and set number of Obs in each group

See the PROC OPTMODEL code from this related thread.

Super User
Posts: 10,787

## Re: Group Obs with equal size based on varaiable and set number of Obs in each group

Here is the blog written by @Rick_SAS you might be interesting.

http://blogs.sas.com/content/iml/2017/05/01/split-data-groups-mean-variance.html

Super User
Posts: 10,787

## Re: Group Obs with equal size based on varaiable and set number of Obs in each group

Here is a GA example . Enjoy yourself.

``````data Units;
set Sashelp.Heart(where=(status = "Alive"));
n+1;
keep n Cholesterol;
run;

proc iml;
use Units nobs nobs;
close;

start function(x) global(Cholesterol,group,mean_std);
xx=shape(x,group,0,.);
do i=1 to group;
idx=setdif(xx[i,],{.});
temp=Cholesterol[idx];
mean_std[i,1]=mean(temp);
end;
sse=range(mean_std);
return (sse);
finish;

group=5;  /* <--Change it(divide into 5 groups)*/
mean_std=j(group,1,.);

id=gasetup(3,nobs,123456789);
call gasetobj(id,0,"function");
call gasetsel(id,10,1,1);
call gainit(id,1000);

niter = 100;
do i = 1 to niter;
call garegen(id);
call gagetval(value, id);
end;
call gagetmem(mem, value, id, 1);

col_mem=shape(mem,group,0,.);

create group from col_mem;
append from col_mem;
close;

print value[l = "Min Value:"] ;
call gaend(id);
quit;

data group;
set group;
group+1;
run;
proc transpose data=group out=member(drop=_: index=(col1));
by group;
var col:;
run;

data want;
merge member(rename=(col1=n) where=(n is not missing)) Units(keep=n Cholesterol);
by n;
run;
proc means data=want mean ;
class group;
var Cholesterol;
run;``````

OUTPUT:

Min Value:
0.0495539

The MEANS Procedure

Analysis Variable : Cholesterol
group N Obs Mean
1 644 221.9617225
2 644 221.9792663
3 644 221.9556962
4 644 221.9297125
5 642 221.9727127
New Contributor
Posts: 2

## Re: Group Obs with equal size based on varaiable and set number of Obs in each group

Thanks to everyone for the suggestions. Since this is SAS running on z/OS, I was not able to do most of the suggestions. The proc optex looked like to most promising but that procedures was not available on z/OS.

I ended up just sorting them by size and assigning a group number from 1 to 5. Repeating that thru all the obs in descending size order. When I sorted them by number and then broke them out into 5 groups, they where roughly the same size.

|            | Size ALLOC | Size FREE  | # OF Items
|            |------------+------------+------------
|   Size CAP |    Sum     |    Sum     |    Sum
----------------------------+------------+------------+------------+------------
Group1                      |     2136.44|     1347.25|      789.18|      439.00
Group2                      |     2117.11|     1321.39|      795.71|      439.00
Group3                      |     2136.44|     1350.09|      786.35|      439.00
Group4                      |     2161.44|     1372.98|      788.47|      439.00
Group5                      |     2022.22|     1269.02|      753.20|      434.00
All                         |    10573.65|     6660.73|     3912.91|     2190.00
--------------------------------------------------------------------------------

Sometimes you just need some suggestions to get you unstuck from your direction and give you some ideas on where to go.

Thanks for all the suggestions.

☑ This topic is solved.