BookmarkSubscribeRSS Feed
Ujjawal
Quartz | Level 8

Hi Experts,

 

I need to minimize distance to customer location from Portfolio Managers' location. I have the following customer table-

 

Customers Postcode Sales
Riya 110051 98
Raj 110007 45
Sam 110032 72
Sompa 110002 31
Predy 110003 48
David 110005 76

 

The following is the table that has Portfolio Manager Information with Territory/Region.

 

PM Position PM_Postcode Territory
A Manager 110007 NorthDelhi
B Analyst 110051 EastDelhi
C Manager 110032 EastDelhi
D Analyst 110003 NorthDelhi

 

There are some constraints - If sale is greater than 50, the customer should be handled by PM who is Manager. Otherwise Analyst. Within each territory, there is a constraint that manager can only handle up to 3 customers and analyst only up to 5 customers. I am sorry for a very short sample data to work on this problem. I hope the problem makes sense. I am very new to PROC OPTMODEL so it would be highly appreciated if you could provide the solution with example. Thanks in anticipation!

20 REPLIES 20
mkeintz
PROC Star

I presume your objective is, for each customer with over 50 sales, minimize distance to a manager, and for all other customers, to minimize distance to either analysit or manager.

 

I presume there are some customers in postal codes without analysts or managers.  In that case you can't solve this problem without having distances between postal code.  Show that data too.

 

Also, do any postalcodes have both a manager and an analyst?  If so, then how do you assign small customers?

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Ksharp
Super User
use function ZIPCITYDISTANCE(zip-code-1, zip-code-2)
could get the distance between two zip.

Your question is very tough. I would like to see how Rob could solve it by OR .

Ujjawal
Quartz | Level 8

Thanks for your reply. If i ain't wrong ZIPCITYDISTANCE function works only for US postcodes. Anyway, i have measured the distance between locations by calculating Eucledian distance of the (x y) coordinates. I need to know how to use PROC OPTMODEL to solve this problem. Since it's an urgent request, any help would be highly appreciated.

Ksharp
Super User
" Within each territory, there is a constraint that manager can only handle up to 3 customers and analyst only up to 5 customers."
I don't understand 'territory' . Why ? 'territory' has nothing to do with this constraint ?

Maybe I could use GA to get it, but as Rob mentioned before, OR might be your best choice.



Ujjawal
Quartz | Level 8

" Within each territory, there is a constraint that manager can only handle up to 3 customers and analyst only up to 5 customers." - I mean to say in each territory i need to maintain the ratio such that manager can serve max 3 customers whose sales or turnover >= (x amount). And Analyst can serve max 5 customers whose sales or turnover < (x amount). By territory, i mean 'Region / City'.

Ksharp
Super User
That territory doesn't mean anything, just could say manager can serve max 3 customers,and Analyst can serve max 5 customers ?
Ksharp
Super User

OK. Here is IML code of GA.

I also want see the OR code by @RobPratt ,that might your best shoot.

 

If you have big table GA can't guarantee you to get th optimal solution.

 

 

 

data have1;
infile cards truncover expandtabs;
input Customers	$ Postcode $ sales x y;
cards;
Riya	110051	98 34 78 
Raj	110007	45 65 65
Sam	110032	72 78 34 
Sompa	110002	31 21 42
Predy	110003	48 47 78
David	110005	76 74 32
;
run;
 
data have2;
infile cards truncover expandtabs;
input PM $ Position $ PM_Postcode $ Territory : $20. x y;
cards;
A	Manager	110007	NorthDelhi 23 54
B	Analyst	110051	EastDelhi 34 45
C	Manager	110032	EastDelhi 56 56
D	Analyst	110003	NorthDelhi 43 88
;
run;

data customer;
 set have1;
 length position $ 10;
 if sales gt 50 then position='Manager';
  else position='Analyst';
run;
data pm;
 set have2;
 if position='Manager' then do;
  do i=1 to 3;
   output;
  end;
 end;
 else do;
  do i=1 to 5;
   output;
  end;
 end;
drop i;
run;

proc sort data=customer;by position;run;
proc sort data=pm;by position;run;

proc iml;
use customer nobs nobs1;
read all var{position};
read all var{x y} into xy_cust;
read all var _char_ into all1[c=vnames1];
close;
n=uniqueby(position);
need_analyst=n[2]-1;
need_manager=nobs1-n[2]+1;

use pm nobs nobs2;
read all var{position};
read all var{x y} into xy_pm;
read all var _char_ into all2[c=vnames2];
close;
n=uniqueby(position);
have_analyst=n[2]-1;
have_manager=nobs2-n[2]+1;

if nobs1>nobs2 then abort 'ERROR: too many customers.';

start func(x) 
global(have_analyst,have_manager,need_analyst,need_manager,xy_cust,xy_pm); 
t1=x[1:need_analyst];
t2=x[(need_analyst+1):(need_analyst+need_manager)];
if countunique(t1)^=need_analyst |  max(t1)>have_analyst |
   countunique(t2)^=need_manager |  min(t2)<(have_analyst+1)
 then obj=999999;
else obj=sum(abs((xy_pm[x,]-xy_cust)));
 return (obj);
finish;


n=need_analyst+need_manager;
encoding=repeat(1//have_analyst,1,need_analyst)||
repeat((have_analyst+1)//(have_analyst+have_manager),1,need_manager);


id=gasetup(2,n,12345678);
call gasetobj(id,0,"func");
call gasetcro(id,0.95,2);
call gasetmut(id,0.95,3);
call gasetsel(id,100,1,1);
call gainit(id,10000,encoding); 

niter = 1000;  /*change it as bigger as you can*/
do i = 1 to niter;
 call garegen(id);
 call gagetval(value, id);
end;
call gagetmem(mem, value, id, 1);

print value[l = "Min Distance Value:"],
      all1[l="" c=vnames1] (all2[mem,])[l="" c=vnames2] ;
call gaend(id);
quit;

 

OUTPUT:

Min Distance Value:
196
Customers Postcode position PM Position PM_Postcode Territory
Raj 110007 Analyst D Analyst 110003 NorthDelhi
Sompa 110002 Analyst B Analyst 110051 EastDelhi
Predy 110003 Analyst D Analyst 110003 NorthDelhi
Riya 110051 Manager A Manager 110007 NorthDelhi
Sam 110032 Manager C Manager 110032 EastDelhi
David 110005 Manager C Manager 110032 EastDelhi
Ujjawal
Quartz | Level 8

Thanks @Ksharp for taking out time to write this code. You are very helpful. God bless you! I'll test this code tomorrow and would keep you posted about it.

Ujjawal
Quartz | Level 8
Hi,

Thank you so much for your solution. You are very helpful. I appreciate it very much. May god bless you! Would you mind tweaking the code and create data set containing output of optimization? I was exploring create data statement to output the result but it's throwing an error. Couldn't define Array in create data statement. Thanks in anticipation!
Ksharp
Super User
OK. Here is. Since @Rob post OR code, maybe you should that .



data have1;
infile cards truncover expandtabs;
input Customers	$ Postcode $ sales x y;
cards;
Riya	110051	98 34 78 
Raj	110007	45 65 65
Sam	110032	72 78 34 
Sompa	110002	31 21 42
Predy	110003	48 47 78
David	110005	76 74 32
;
run;
 
data have2;
infile cards truncover expandtabs;
input PM $ Position $ PM_Postcode $ Territory : $20. x y;
cards;
A	Manager	110007	NorthDelhi 23 54
B	Analyst	110051	EastDelhi 34 45
C	Manager	110032	EastDelhi 56 56
D	Analyst	110003	NorthDelhi 43 88
;
run;

data customer;
 set have1;
 length position $ 10;
 if sales gt 50 then position='Manager';
  else position='Analyst';
run;
data pm;
 set have2;
 if position='Manager' then do;
  do i=1 to 3;
   output;
  end;
 end;
 else do;
  do i=1 to 5;
   output;
  end;
 end;
drop i;
run;

proc sort data=customer;by position;run;
proc sort data=pm;by position;run;

proc iml;
use customer nobs nobs1;
read all var{position};
read all var{x y} into xy_cust;
read all var _char_ into all1[c=vnames1];
close;
n=uniqueby(position);
need_analyst=n[2]-1;
need_manager=nobs1-n[2]+1;

use pm nobs nobs2;
read all var{position};
read all var{x y} into xy_pm;
read all var _char_ into all2[c=vnames2];
close;
n=uniqueby(position);
have_analyst=n[2]-1;
have_manager=nobs2-n[2]+1;

if nobs1>nobs2 then abort 'ERROR: too many customers.';

start func(x) 
global(have_analyst,have_manager,need_analyst,need_manager,xy_cust,xy_pm); 
t1=x[1:need_analyst];
t2=x[(need_analyst+1):(need_analyst+need_manager)];
if countunique(t1)^=need_analyst |  max(t1)>have_analyst |
   countunique(t2)^=need_manager |  min(t2)<(have_analyst+1)
 then obj=999999;
else obj=sum(abs((xy_pm[x,]-xy_cust)));
 return (obj);
finish;


n=need_analyst+need_manager;
encoding=repeat(1//have_analyst,1,need_analyst)||
repeat((have_analyst+1)//(have_analyst+have_manager),1,need_manager);


id=gasetup(2,n,12345678);
call gasetobj(id,0,"func");
call gasetcro(id,0.95,2);
call gasetmut(id,0.95,3);
call gasetsel(id,10,1,1);
call gainit(id,10000,encoding); 

niter = 1000;  /*change it as bigger as you can*/
do i = 1 to niter;
 call garegen(id);
 call gagetval(value, id);
end;
call gagetmem(mem, value, id, 1);

want=all1||all2[mem,] ;
vnames1[ncol(vnames1)]='CPosition';
vnames=(vnames1||vnames2);
create want from want[c=vnames] ;
append from want;
close;
print value[l = "Min Distance Value:"];
call gaend(id);
quit;
proc print data=want noobs;run;



RobPratt
SAS Super FREQ

Here is PROC OPTMODEL code, using the same distance metric as @Ksharp:

proc optmodel;
   set <str> CUSTOMERS;
   num sales {CUSTOMERS};
   num x_c {CUSTOMERS};
   num y_c {CUSTOMERS};
   read data have1 into CUSTOMERS=[customers] sales x_c=x y_c=y;

   set <str> PORTFOLIO_MANAGERS;
   str position {PORTFOLIO_MANAGERS};
   num x_p {PORTFOLIO_MANAGERS};
   num y_p {PORTFOLIO_MANAGERS};
   read data have2 into PORTFOLIO_MANAGERS=[pm] position x_p=x y_p=y;

   set CP_PAIRS = 
      {c in CUSTOMERS, p in PORTFOLIO_MANAGERS: 
      (sales[c] > 50 and position[p] = 'Manager') or (sales[c] <= 50 and position[p] = 'Analyst')};
   num dist {<c,p> in CP_PAIRS} = abs(x_c[c] - x_p[p]) + abs(y_c[c] - y_p[p]);

   var Assign {CP_PAIRS} binary;
   min TotalDistance = sum {<c,p> in CP_PAIRS} dist[c,p] * Assign[c,p];
   con OneManagerPerCustomer {c in CUSTOMERS}:
      sum {<(c),p> in CP_PAIRS} Assign[c,p] = 1;
   con Cardinality {p in PORTFOLIO_MANAGERS}:
      sum {<c,(p)> in CP_PAIRS} Assign[c,p] <= (if position[p] = 'Manager' then 3 else 5);

   solve;
   print Assign;
quit;

SAS Output

Solution Summary
Solver MILP
Algorithm Branch and Cut
Objective Function TotalDistance
Solution Status Optimal
Objective Value 196
   
Relative Gap 0
Absolute Gap 0
Primal Infeasibility 0
Bound Infeasibility 0
Integer Infeasibility 0
   
Best Bound 196
Nodes 0
Iterations 0
Presolve Time 0.00
Solution Time 0.00

Assign
  A B C D
David 0   1  
Predy   0   1
Raj   0   1
Riya 1   0  
Sam 0   1  
Sompa   1   0
RobPratt
SAS Super FREQ

I should also point out that your problem decomposes into two completely disjoint problems, according to sales and position.  A simple way to exploit that structure is to use the decomposition algorithm to detect and solve the two subproblems in parallel:

   solve with MILP / decomp=(method=concomp);

 

Ujjawal
Quartz | Level 8

Thanks a ton @RobPratt for your solution. At the same time, i apologize for so late reply. Your solution works like a charm on small dataset. I have learnt a lot from your solution as i'm new to PROC OPTMODEL. In my dataset, i have around 30k customers and approx. 700 PMs (including both 'Analyst' and 'Manager'). It throws an error 'Out of memory during solution resolution'. I understand it creates a cartesian product of variables which is quite high. It's using 4 threads during processing. Is there any workaround to fine tune the performance? I can comprimise the solution for making it run. Would you advise any performance tips? To workaround this issue, i have split data into 4 parts based on their region and checking distance within each region separately and later combining all the datasets. But it's not an optimal solution for some of the customers whose location is in the border areas of the region. They might have a nearest distance in the other region. Thanks in anticipation! Any help would be highly appreciated!

RobPratt
SAS Super FREQ
The first two things to try are:
1. Use the SAS memsize option to increase the amount of memory available.
2. Split your data into two disjoint parts according to sales and position. Because those two problems are independent, such a split loses no optimality.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

Multiple Linear Regression in SAS

Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 20 replies
  • 1748 views
  • 7 likes
  • 4 in conversation