Desktop productivity for business analysts and programmers

How to split dataset with condition

Reply
Occasional Contributor Hue
Occasional Contributor
Posts: 5

How to split dataset with condition

[ Edited ]

Hi all,

I have a dataset, including 1000 clients corresponding 1000 loans differently. I'd like to split 1000 clients to 50 groups with condition is total loans amount and the number of clients of each group equally?

Could you help me solve this calculation by SAS EG?

Cust_IDLoan_amt ($)
A10
B20
C15
D58

 

Thanks alot,

Esteemed Advisor
Posts: 6,699

Re: How to split dataset with condition

%let groups = 50;

proc sql noprint;
select sum(loan_amt) into :sum_loan from have;
run;

data want;
set have;
retain group group_amt;
if _n_ = 1
then do;
  group = 1;
  group_amount = 0;
end;
group_amount + loan_amt;
if group_amount > &sum_loan / &groups
then do;
  group + 1;
  group_amount = 0;
end;
drop group_amt;
run;
---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Occasional Contributor Hue
Occasional Contributor
Posts: 5

Re: How to split dataset with condition

Thank KurtBremser a lot, but I'd like to have the result which must be satisfy 2 conditions as below:
1/ The total client in each group is equally.
2/ The total loan amount of each group is equally.
As your advice, the result is total loan amount of each group will be less than or equal average loan amount.
Grand Advisor
Posts: 9,593

Re: How to split dataset with condition

No. You can't do that. You only can say :
1/ The total client in each group is equally.
2/ The total loan amount of each group is ALMOST equally.

But that will lead you to a SAS/OR problem. Suggest you post it at OR forum  , @Prat is there.



Here is my GA code. Maybe you want OR code if you have SAS/OR .

proc iml;
use sashelp.class;
read all var {age} ;
close ;

start function(x) global(age,group);
if ncol(x) ^= ncol(unique(x)) then obj=9999;
else do;
 y=age[x,];
 obj=range(shape(y,group,0,0)[,+]);
end;
 return (obj);
finish;

start switch_mut(s) global(nswitches,nrow);
if ncol(s) ^= ncol(unique(s)) then s=ranperm(nrow,1);
else do;
  n = ncol(s);
  do i = 1 to nswitches;
   k1 = int(uniform(1234)*n) + 1;
   k2 = int(uniform(1234)*n) + 1;
   if k1^=k2 then do;
    temp=s[k2];
    s[k2] = s[k1];
    s[k1] = temp;
   end;
  end;
end;
finish;

start uniform_cross(child1, child2, parent1, parent2) global(nrow);

child1 = parent1;
child2 = parent2;
do i = 1 to ncol(parent1);
  r = uniform(1234);
  if r<=0.5 then do;
   child1[i] = parent2[i];
   child2[i] = parent1[i];
  end;
end;

if ncol(child1) ^= ncol(unique(child1)) then child1=ranperm(nrow,1);
 else if ncol(child2) ^= ncol(unique(child2)) then child2=ranperm(nrow,1);

finish;



nrow=nrow(age);
group=4;
nswitches = 3;
encoding=j(2,nrow,1);
encoding[2,]=nrow;    

id=gasetup(2,nrow,1234);
call gasetobj(id,0,"function");
call gasetcro(id,0.95,0,"uniform_cross");
call gasetmut(id,0.95,0,"switch_mut");
call gasetsel(id,100,1,1);
call gainit(id,1000,encoding);


niter = 10000;
summary = j(niter,2);
mattrib summary [c = {"Min Value", "Avg Value"} l=""];
do i = 1 to niter;
 call garegen(id);
 call gagetval(value, id);
 summary[i,1] = value[1];
 summary[i,2] = value[:];
end;
call gagetmem(mem, value, id, 1);


Memebers=shape(age[mem,],group,0,0);
print "Members:" Memebers[l=""],
      "Group sum:" (Memebers[,+]),
      "Min Value: " value[l = ""] ;
call gaend(id);
quit;

Occasional Contributor Hue
Occasional Contributor
Posts: 5

Re: How to split dataset with condition

Thanks Ksharp alot,
Actually, it's difficult to me understand your code.
Esteemed Advisor
Posts: 6,699

Re: How to split dataset with condition

[ Edited ]

Well, I overlooked that.

On second thought, the solution could even be easier:

proc sort data=have;
by descending loan_amt;
run;

data want;
set have;
retain group 0 direction 1;
if direction
then do;
  group + 1;
  if group > &groups
  then do;
    group = &groups;
    direction = 0;
  end;
end;
else do;
  group + (-1);
  if group < 1
  then do;
    group = 1;
    direction = 1;
  end;
end;
run;

This should give you an almost even distribution.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Occasional Contributor Hue
Occasional Contributor
Posts: 5

Re: How to split dataset with condition

Thank you for your supporting, KurtBremer.

With your way, I see that the number clients of each group will be equally,

However the total loan amount of each group will be not almost equally. You know that, the gap (loan amount) of each loans is perhap a bit or so much, so if we sort by loan amount and use directory, it'll be not optimal.

                

Grand Advisor
Posts: 9,593

Re: How to split dataset with condition

My code is IML code using Genetic Algorithm , which you need SAS/IML or SAS University Edition to run it.
If you have SAS/OR license , Here are some URL you might be interesting in .


https://communities.sas.com/t5/Mathematical-Optimization/Minimization-challenge/m-p/259055#U259055
https://communities.sas.com/t5/Mathematical-Optimization/OR-Challenge/m-p/268967#U268967
https://communities.sas.com/t5/Mathematical-Optimization/OR-Challenge-again/m-p/274861#U274861
Ask a Question
Discussion stats
  • 7 replies
  • 298 views
  • 3 likes
  • 3 in conversation