## How to split dataset with condition

Occasional Contributor
Posts: 5

# How to split dataset with condition

[ Edited ]

Hi all,

I have a dataset, including 1000 clients corresponding 1000 loans differently. I'd like to split 1000 clients to 50 groups with condition is total loans amount and the number of clients of each group equally?

Could you help me solve this calculation by SAS EG?

 Cust_ID Loan_amt (\$) A 10 B 20 C 15 D 58

Thanks alot,

Super User
Posts: 10,530

## Re: How to split dataset with condition

``````%let groups = 50;

proc sql noprint;
select sum(loan_amt) into :sum_loan from have;
run;

data want;
set have;
retain group group_amt;
if _n_ = 1
then do;
group = 1;
group_amount = 0;
end;
group_amount + loan_amt;
if group_amount > &sum_loan / &groups
then do;
group + 1;
group_amount = 0;
end;
drop group_amt;
run;
``````
---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Occasional Contributor
Posts: 5

## Re: How to split dataset with condition

Thank KurtBremser a lot, but I'd like to have the result which must be satisfy 2 conditions as below:
1/ The total client in each group is equally.
2/ The total loan amount of each group is equally.
As your advice, the result is total loan amount of each group will be less than or equal average loan amount.
Super User
Posts: 10,846

## Re: How to split dataset with condition

```No. You can't do that. You only can say :
1/ The total client in each group is equally.
2/ The total loan amount of each group is ALMOST equally.

But that will lead you to a SAS/OR problem. Suggest you post it at OR forum  , @Prat is there.

Here is my GA code. Maybe you want OR code if you have SAS/OR .

proc iml;
use sashelp.class;
close ;

start function(x) global(age,group);
if ncol(x) ^= ncol(unique(x)) then obj=9999;
else do;
y=age[x,];
obj=range(shape(y,group,0,0)[,+]);
end;
return (obj);
finish;

start switch_mut(s) global(nswitches,nrow);
if ncol(s) ^= ncol(unique(s)) then s=ranperm(nrow,1);
else do;
n = ncol(s);
do i = 1 to nswitches;
k1 = int(uniform(1234)*n) + 1;
k2 = int(uniform(1234)*n) + 1;
if k1^=k2 then do;
temp=s[k2];
s[k2] = s[k1];
s[k1] = temp;
end;
end;
end;
finish;

start uniform_cross(child1, child2, parent1, parent2) global(nrow);

child1 = parent1;
child2 = parent2;
do i = 1 to ncol(parent1);
r = uniform(1234);
if r<=0.5 then do;
child1[i] = parent2[i];
child2[i] = parent1[i];
end;
end;

if ncol(child1) ^= ncol(unique(child1)) then child1=ranperm(nrow,1);
else if ncol(child2) ^= ncol(unique(child2)) then child2=ranperm(nrow,1);

finish;

nrow=nrow(age);
group=4;
nswitches = 3;
encoding=j(2,nrow,1);
encoding[2,]=nrow;

id=gasetup(2,nrow,1234);
call gasetobj(id,0,"function");
call gasetcro(id,0.95,0,"uniform_cross");
call gasetmut(id,0.95,0,"switch_mut");
call gasetsel(id,100,1,1);
call gainit(id,1000,encoding);

niter = 10000;
summary = j(niter,2);
mattrib summary [c = {"Min Value", "Avg Value"} l=""];
do i = 1 to niter;
call garegen(id);
call gagetval(value, id);
summary[i,1] = value[1];
summary[i,2] = value[:];
end;
call gagetmem(mem, value, id, 1);

Memebers=shape(age[mem,],group,0,0);
print "Members:" Memebers[l=""],
"Group sum:" (Memebers[,+]),
"Min Value: " value[l = ""] ;
call gaend(id);
quit;

```
Occasional Contributor
Posts: 5

## Re: How to split dataset with condition

Thanks Ksharp alot,
Actually, it's difficult to me understand your code.
Super User
Posts: 10,530

## Re: How to split dataset with condition

[ Edited ]

Well, I overlooked that.

On second thought, the solution could even be easier:

``````proc sort data=have;
by descending loan_amt;
run;

data want;
set have;
retain group 0 direction 1;
if direction
then do;
group + 1;
if group > &groups
then do;
group = &groups;
direction = 0;
end;
end;
else do;
group + (-1);
if group < 1
then do;
group = 1;
direction = 1;
end;
end;
run;``````

This should give you an almost even distribution.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Occasional Contributor
Posts: 5

## Re: How to split dataset with condition

Thank you for your supporting, KurtBremer.

With your way, I see that the number clients of each group will be equally,

However the total loan amount of each group will be not almost equally. You know that, the gap (loan amount) of each loans is perhap a bit or so much, so if we sort by loan amount and use directory, it'll be not optimal.

Super User
Posts: 10,846

## Re: How to split dataset with condition

```My code is IML code using Genetic Algorithm , which you need SAS/IML or SAS University Edition to run it.
If you have SAS/OR license , Here are some URL you might be interesting in .

https://communities.sas.com/t5/Mathematical-Optimization/Minimization-challenge/m-p/259055#U259055
https://communities.sas.com/t5/Mathematical-Optimization/OR-Challenge/m-p/268967#U268967
https://communities.sas.com/t5/Mathematical-Optimization/OR-Challenge-again/m-p/274861#U274861
```
Discussion stats
• 7 replies
• 413 views
• 3 likes
• 3 in conversation