Hi Everyone,
Im fairly new to optmodel, so this may be newbie question, but i've already spent alot of time on this and would appreciate greatly if you could help me out.
The objective:
I have two pools of clients that i am trying to pair based on some discreet criteria and their spend in a number of common weeks.
The objective is to find pairs that fulfill the criteria and minimize the sum of weekly differences between the pairs.
One client from each pool can be assigned to no more then one client from another pool.
The problem:
The code i wrote works fine as long as we have enough clients from the second pool to find a match for every client from the first pool. When that's not true, either due to the criteria, or simply not enough clients in pool 2 to find a match for every client in the pool1, the problems becomes infeasible.
This is not the behavior i am trying to achieve - what i would like to see in a situtation when its impossible to find a match for all the clients from pool1 is matching the best possible ones(so the ones minimizing the sum of differnces) and leaving out the infeasible ones.
An example:
I have prepared a simplified version of the code im using, that i think clearly showcases the problem:
/*Creating Datasets*/
data weeks;
do week = 201801 to 201810;
output;
end;
run;
/*5 clients in the first pool*/
data clients1;
do client_id = 1 to 5;
output;
end;
run;
/*only 4 clients in the second pool*/
data clients2;
do client_id = 101 to 104;
output;
end;
run;
data sales;
do client_id = 1 to 5;
do week = 201801 to 201810;
flag="clients1";
spends=1000*ranuni(1000);
output;
end;
end;
do client_id = 101 to 104;
do week = 201801 to 201810;
flag="clients2";
spends=1000*ranuni(1000);
output;
end;
end;
run;
proc optmodel;
set weeks;
read data weeks into weeks = [week];
/*clients 1*/
set clients1;
read data clients1 into clients1 = [client_id];
/*clients 2 spends per week */
num spends1 {clients1, weeks} init 0;
read data sales(where=(flag="clients1")) nomiss into [client_id week] spends1=spends;
/*clients2*/
set clients2;
read data clients2 into clients2 = [client_id];
/*clients 2 spends per week */
num spends2 {clients2, weeks} init 0;
read data sales(where=(flag="clients2")) nomiss into [client_id week] spends2=spends;
/*clients 1,2 matrix*/
set clientpairs = {c1 in clients1, c2 in clients2};
/*calculating the sum of absolute differences for each possible c1,c2 combination*/
impvar diff {<c1,c2> in clientpairs} = sum{w in weeks} abs(spends1[c1,w] - spends2[c2,w]);
/*finding the combination that minimizes the total sum of differences*/
var match {clientpairs} binary;
min total_diff = sum{<c1,c2> in clientpairs} diff[c1,c2]*match[c1,c2];
/*each client1 is present once(here is the problem, but removing this contraint causes no pair to match - minimizes differences)*/
con one_c1_max {c1 in clients1}:
sum {<(c1),c2> in clientpairs} match[c1,c2] = 1;
/*each client2 can not be matched to more than 1 client1*/
con one_c2_max {c2 in clients2}:
sum {<c1,(c2)> in clientpairs} match[c1,c2] <= 1;
solve;
create data res(where =(match = 1)) from [client1 client2] = {clients1, clients2} match diff;
quit;
proc sql;
select * from res;
quit;
Let me know if everything is clear and how you would approach this problem.
Regards,
Jakub
EDIT: I see that the code doesnt display properly, ive attached the original version.
One way to get what you want is to interchange the <= 1 and = 1 in your two constraints, so that each client1 is matched at most once and each client2 is matched exactly once. The resulting optimal objective value is then 10197.831456.
Another approach is to instead use the network solver with the linear assignment algorithm (http://go.documentation.sas.com/?docsetId=ormpug&docsetTarget=ormpug_networksolver_details18.htm&doc...😞
set <num,num> ASSIGNMENTS;
solve with network / lap direction=directed links=(weight=diff) out=(assignments=ASSIGNMENTS);
create data res from [client1 client2] = ASSIGNMENTS diff;
In that case, you can just omit the VAR, MIN, and CON statements.
Also, it is more efficient to use NUM instead of IMPVAR when the right-hand side does not depend on variables:
/* impvar diff {<c1,c2> in clientpairs} = sum{w in weeks} abs(spends1[c1,w] - spends2[c2,w]);*/
num diff {<c1,c2> in clientpairs} = sum{w in weeks} abs(spends1[c1,w] - spends2[c2,w]);
Finally, the WHERE condition match = 1 is dangerous because binary variables can be slightly fractional (up to INTTOL):
It is safer to use a loose comparison:
create data res(where =(match > 0.5)) from [client1 client2] = {clients1, clients2} match diff;
One way to get what you want is to interchange the <= 1 and = 1 in your two constraints, so that each client1 is matched at most once and each client2 is matched exactly once. The resulting optimal objective value is then 10197.831456.
Another approach is to instead use the network solver with the linear assignment algorithm (http://go.documentation.sas.com/?docsetId=ormpug&docsetTarget=ormpug_networksolver_details18.htm&doc...😞
set <num,num> ASSIGNMENTS;
solve with network / lap direction=directed links=(weight=diff) out=(assignments=ASSIGNMENTS);
create data res from [client1 client2] = ASSIGNMENTS diff;
In that case, you can just omit the VAR, MIN, and CON statements.
Also, it is more efficient to use NUM instead of IMPVAR when the right-hand side does not depend on variables:
/* impvar diff {<c1,c2> in clientpairs} = sum{w in weeks} abs(spends1[c1,w] - spends2[c2,w]);*/
num diff {<c1,c2> in clientpairs} = sum{w in weeks} abs(spends1[c1,w] - spends2[c2,w]);
Finally, the WHERE condition match = 1 is dangerous because binary variables can be slightly fractional (up to INTTOL):
It is safer to use a loose comparison:
create data res(where =(match > 0.5)) from [client1 client2] = {clients1, clients2} match diff;
Hi Rob,
Thanks for a quick reply!
The first solution indeed should solve it. This however was just an example, so i need to write a code that works in all possible cases. But still, I could check which set, c1 or c2, has more observations and then conditionally define constraints, I suppose, so thanks for the tip!
The second solution, at a first glance seems like a default approach to this kind of problems. I'll dig deeper, thank you.
In terms of efficiency, is there any argument to stick to a specific approach?
Regards,
Jakub
Yes, you can decide which constraint should be <= 1 by comparing the cardinalities of the two sets. In fact, the linear assignment algorithm does this for you automatically, as described in the documentation link I put in my first reply.
The network solver with the linear assignment algorithm is more efficient (and requires less PROC OPTMODEL code) than the MILP solver for this type of problem.
Thanks Rob, I know what to do now.
Regards,
Jakub
Hi again,
I have one more question, if you don't mind:
So i have been trying to implement the network solver solution you've suggested and i've ran into another problem.
When looking for the client pairs, I would like to only choose from pairs in which the difference of sum of spends from all the weeks is no bigger than 10%.
I can see that there are quite a few pairs like that, however, when im trying to solve it using modified "diff" as links i get an error that the linear assignment problem is infeasible (104 rows are unassigned).
When i tried using subgraph option for links the effect was the same.
Can you help me understand the reason behind it?
Below you can find the new code. I 've modified it slightly, to get more observations and to calculate the total difference.
%macro lap;
%let cl1=200;
%let cl2=700;
/*Creating Datasets*/
data weeks;
do week = 201801 to 201810;
output;
end;
run;
data clients1;
do client_id = 1 to &cl1;
output;
end;
run;
data clients2;
do client_id = 1001 to %eval(1000+&cl2);
output;
end;
run;
data sales;
do client_id = 1 to &cl1;
do week = 201801 to 201810;
flag="clients1";
spends=1000*ranuni(1000);
output;
end;
end;
do client_id = 1001 to %eval(1000+&cl2);
do week = 201801 to 201810;
flag="clients2";
spends=1000*ranuni(1000)*ranuni(2000);
output;
end;
end;
run;
proc optmodel;
set weeks;
read data weeks into weeks = [week];
/*clients 1*/
set clients1;
read data clients1 into clients1 = [client_id];
num spends1 {clients1, weeks} init 0;
read data sales(where=(flag="clients1")) nomiss into [client_id week] spends1=spends;
/*clients2*/
set clients2;
read data clients2 into clients2 = [client_id];
/*clients 2 spends per week */
num spends2 {clients2, weeks} init 0;
read data sales(where=(flag="clients2")) nomiss into [client_id week] spends2=spends;
/*clients 1,2 matrix*/
set clientpairs = {c1 in clients1, c2 in clients2};
/*NEW!! Calculating the total difference between each store */
num total_diff {<c1,c2> in clientpairs} = abs((sum {w in weeks} spends1[c1,w]) - (sum{w in weeks} spends2[c2,w]))/(sum{w in weeks} spends1[c1,w]);
/*calculating the sum of absolute differences for each possible c1,c2 combination*/
/*NEW!! Limited to pairs within 10% total sales difference*/
num diff {<c1,c2> in clientpairs : total_diff[c1,c2] <= 0.1} = sum{w in weeks} abs(spends1[c1,w] - spends2[c2,w]);
set <num,num> ASSIGNMENTS;
solve with network / lap direction=directed links=(weight=diff) out=(assignments=ASSIGNMENTS);
create data res from [client1 client2] = ASSIGNMENTS diff;
quit;
proc sql;
select * from res;
quit;
%mend;
%lap
Regards,
Jakub
Here, clients1 is the smaller set, so each c1 in clients1 is supposed to be matched exactly once. But many such clients have no outgoing links, as you can verify by running the following statements:
num out_degree {c1 in clients1} = card({<(c1),c2> in clientpairs: total_diff[c1,c2] <= 0.1});
put ({c1 in clients1: out_degree[c1] = 0});
Indeed, there a quite a few clients1 with 0 links and these are the ones i wanted to exclude from the start, as they have no potential matches within the 10% difference.
I am not sure how to make the solver try to find matches only for the remaining clients.
The problem is infeasible, even after removing those c1 with no links. It sounds like you really have two objectives:
The following code calls the network solver twice (once per objective) on a network that includes an additional source node and sink node:
/* maximize flow out of source */
set CLIENTS = clients1 union clients2;
num source = min {c in CLIENTS} c - 1;
num sink = max {c in CLIENTS} c + 1;
set NODES = CLIENTS union {source,sink};
num supply_lower {i in NODES} init (if i = sink then -card(CLIENTS) else 0);
num supply_upper {i in NODES} init (if i = source then card(CLIENTS) else 0);
set LINKS = ({source} cross clients1) union clientpairs2 union (clients2 cross {sink});
num cost {<i,j> in LINKS} init (if i = source then -1 else 0);
num link_upper {LINKS} = 1;
solve with network / mcf direction=directed
links=(weight=cost upper=link_upper) nodes=(lower=supply_lower upper=supply_upper);
/* minimize cost to send maximum flow from source */
supply_lower[source] = -_OROPTMODEL_NUM_['OBJECTIVE'];
supply_upper[source] = -_OROPTMODEL_NUM_['OBJECTIVE'];
for {<i,j> in LINKS} cost[i,j] = (if <i,j> in clientpairs2 then diff[i,j] else 0);
num flow {LINKS};
solve with network / mcf direction=directed
links=(weight=cost upper=link_upper) nodes=(lower=supply_lower upper=supply_upper) out=(flow=flow);
create data res from [client1 client2] = {<i,j> in clientpairs2: flow[i,j] > 0.5} diff;
The maximum number of matches is 55, and the resulting minimum cost is 97847.800403.
Thanks, that makes sense.
I was thinking that indeed it might require two objectives.
Thanks again for your help!
Regards,
Jakub
Rob, sorry to bother you again, but im getting an error, when trying to run your code:
Also, whats clientpairs2 in your code?
I assume its the client pairs for which the total difference is not bigger than 10%?
Regards,
Jakub
Sorry that I omitted this part of the code:
set clientpairs2 = {<c1,c2> in clientpairs : total_diff[c1,c2] <= 0.1};
num diff {<c1,c2> in clientpairs2} = sum{w in weeks} abs(spends1[c1,w] - spends2[c2,w]);
It looks like you are using an older release of SAS/OR in which the NODES= option LOWER was called WEIGHT and UPPER was called WEIGHT2. Please try again using those old option names instead.
This formulation uses the constraint solver with the ELEMENT and ALLDIFFERENT predicates to solve the original problem; the newer requirements haven't been addressed here, but it might be worth considering.
Great, thanks, i'll have a look!
As im not very experienced with the procedure, its good to see a few approaches.
Regards,
Jakub
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.
Find more tutorials on the SAS Users YouTube channel.