Solved: Re: Proc OPTMODEL: finding optimal pairs - handling infeasibility

J_Grabowski · Posted 08-02-2018 12:39 PM

Hi Everyone,

Im fairly new to optmodel, so this may be newbie question, but i've already spent alot of time on this and would appreciate greatly if you could help me out.

The objective:

I have two pools of clients that i am trying to pair based on some discreet criteria and their spend in a number of common weeks.

The objective is to find pairs that fulfill the criteria and minimize the sum of weekly differences between the pairs.

One client from each pool can be assigned to no more then one client from another pool.

The problem:

The code i wrote works fine as long as we have enough clients from the second pool to find a match for every client from the first pool. When that's not true, either due to the criteria, or simply not enough clients in pool 2 to find a match for every client in the pool1, the problems becomes infeasible.

This is not the behavior i am trying to achieve - what i would like to see in a situtation when its impossible to find a match for all the clients from pool1 is matching the best possible ones(so the ones minimizing the sum of differnces) and leaving out the infeasible ones.

An example:

I have prepared a simplified version of the code im using, that i think clearly showcases the problem:

/*Creating Datasets*/

data weeks;
	do week = 201801 to 201810;
		output;
	end;
run;

/*5 clients in the first pool*/
data clients1;
	do client_id = 1 to 5;
		output;
	end;
run;

/*only 4 clients in the second pool*/
data clients2;
	do client_id = 101 to 104;
		output;
	end;
run;

data sales;
	do client_id = 1 to 5;
		do week = 201801 to 201810;
			flag="clients1";
			spends=1000*ranuni(1000);
			output;
		end;
	end;
	do client_id = 101 to 104;
		do week = 201801 to 201810;
			flag="clients2";
			spends=1000*ranuni(1000);
			output;
		end;
	end;
run;


proc optmodel;
	set weeks;
	read data weeks into weeks = [week];
		
	/*clients 1*/
	set clients1;
	read data clients1 into clients1 = [client_id];
	
	/*clients 2 spends per week */
	num spends1 {clients1, weeks} init 0;
	read data sales(where=(flag="clients1")) nomiss into [client_id week] spends1=spends;
	
	/*clients2*/
	set clients2;
	read data clients2 into clients2 = [client_id];

	/*clients 2 spends per week */
	num spends2 {clients2, weeks} init 0;
	read data sales(where=(flag="clients2")) nomiss into [client_id week] spends2=spends;
	
	/*clients 1,2 matrix*/
	set clientpairs = {c1 in clients1, c2 in clients2};
	
	/*calculating the sum of absolute differences for each possible c1,c2 combination*/
	impvar diff {<c1,c2> in clientpairs} = sum{w in weeks} abs(spends1[c1,w] - spends2[c2,w]);
	
	/*finding the combination that minimizes the total sum of differences*/	
	var match {clientpairs} binary;
	min total_diff = sum{<c1,c2> in clientpairs} diff[c1,c2]*match[c1,c2];
	/*each client1 is present once(here is the problem, but removing this contraint causes no pair to match - minimizes differences)*/
	con one_c1_max {c1 in clients1}:
		sum {<(c1),c2> in clientpairs} match[c1,c2] = 1;

	/*each client2 can not be matched to more than 1 client1*/
	con one_c2_max {c2 in clients2}:
		sum {<c1,(c2)> in clientpairs} match[c1,c2] <= 1;
	
	solve;

	create data res(where =(match = 1)) from [client1 client2] = {clients1, clients2} match diff;
	
quit;


proc sql;
select * from res;
quit;

Let me know if everything is clear and how you would approach this problem.

Regards,

Jakub

EDIT: I see that the code doesnt display properly, ive attached the original version.

RobPratt · Posted 08-02-2018 02:11 PM

One way to get what you want is to interchange the <= 1 and = 1 in your two constraints, so that each client1 is matched at most once and each client2 is matched exactly once. The resulting optimal objective value is then 10197.831456.

Another approach is to instead use the network solver with the linear assignment algorithm (http://go.documentation.sas.com/?docsetId=ormpug&docsetTarget=ormpug_networksolver_details18.htm&doc...😞

set <num,num> ASSIGNMENTS;
solve with network / lap direction=directed links=(weight=diff) out=(assignments=ASSIGNMENTS);
create data res from [client1 client2] = ASSIGNMENTS diff;

In that case, you can just omit the VAR, MIN, and CON statements.

Also, it is more efficient to use NUM instead of IMPVAR when the right-hand side does not depend on variables:

/* impvar diff {<c1,c2> in clientpairs} = sum{w in weeks} abs(spends1[c1,w] - spends2[c2,w]);*/
num diff {<c1,c2> in clientpairs} = sum{w in weeks} abs(spends1[c1,w] - spends2[c2,w]);

Finally, the WHERE condition match = 1 is dangerous because binary variables can be slightly fractional (up to INTTOL):

http://go.documentation.sas.com/?docsetId=ormpug&docsetTarget=ormpug_milpsolver_syntax02.htm&docsetV...

It is safer to use a loose comparison:

create data res(where =(match > 0.5)) from [client1 client2] = {clients1, clients2} match diff;

View solution in original post

RobPratt · Posted 08-02-2018 02:11 PM

One way to get what you want is to interchange the <= 1 and = 1 in your two constraints, so that each client1 is matched at most once and each client2 is matched exactly once. The resulting optimal objective value is then 10197.831456.

Another approach is to instead use the network solver with the linear assignment algorithm (http://go.documentation.sas.com/?docsetId=ormpug&docsetTarget=ormpug_networksolver_details18.htm&doc...😞

set <num,num> ASSIGNMENTS;
solve with network / lap direction=directed links=(weight=diff) out=(assignments=ASSIGNMENTS);
create data res from [client1 client2] = ASSIGNMENTS diff;

In that case, you can just omit the VAR, MIN, and CON statements.

Also, it is more efficient to use NUM instead of IMPVAR when the right-hand side does not depend on variables:

/* impvar diff {<c1,c2> in clientpairs} = sum{w in weeks} abs(spends1[c1,w] - spends2[c2,w]);*/
num diff {<c1,c2> in clientpairs} = sum{w in weeks} abs(spends1[c1,w] - spends2[c2,w]);

Finally, the WHERE condition match = 1 is dangerous because binary variables can be slightly fractional (up to INTTOL):

http://go.documentation.sas.com/?docsetId=ormpug&docsetTarget=ormpug_milpsolver_syntax02.htm&docsetV...

It is safer to use a loose comparison:

create data res(where =(match > 0.5)) from [client1 client2] = {clients1, clients2} match diff;

J_Grabowski · Posted 08-03-2018 06:22 AM

Hi Rob,

Thanks for a quick reply!

The first solution indeed should solve it. This however was just an example, so i need to write a code that works in all possible cases. But still, I could check which set, c1 or c2, has more observations and then conditionally define constraints, I suppose, so thanks for the tip!

The second solution, at a first glance seems like a default approach to this kind of problems. I'll dig deeper, thank you.

In terms of efficiency, is there any argument to stick to a specific approach?

Regards,

Jakub

RobPratt · Posted 08-03-2018 10:16 AM

Yes, you can decide which constraint should be <= 1 by comparing the cardinalities of the two sets. In fact, the linear assignment algorithm does this for you automatically, as described in the documentation link I put in my first reply.

The network solver with the linear assignment algorithm is more efficient (and requires less PROC OPTMODEL code) than the MILP solver for this type of problem.

J_Grabowski · Posted 08-03-2018 11:11 AM

Thanks Rob, I know what to do now.

Regards,

Jakub

J_Grabowski · Posted 08-05-2018 12:47 PM

Hi again,

I have one more question, if you don't mind:

So i have been trying to implement the network solver solution you've suggested and i've ran into another problem.

When looking for the client pairs, I would like to only choose from pairs in which the difference of sum of spends from all the weeks is no bigger than 10%.

I can see that there are quite a few pairs like that, however, when im trying to solve it using modified "diff" as links i get an error that the linear assignment problem is infeasible (104 rows are unassigned).

When i tried using subgraph option for links the effect was the same.

Can you help me understand the reason behind it?

Below you can find the new code. I 've modified it slightly, to get more observations and to calculate the total difference.

%macro lap;
%let cl1=200;
%let cl2=700;

/*Creating Datasets*/

data weeks;
	do week = 201801 to 201810;
		output;
	end;
run;

data clients1;
	do client_id = 1 to &cl1;
		output;
	end;
run;

data clients2;
	do client_id = 1001 to %eval(1000+&cl2);
		output;
	end;
run;

data sales;
	do client_id = 1 to &cl1;
		do week = 201801 to 201810;
			flag="clients1";
			spends=1000*ranuni(1000);
			output;
		end;
	end;
	do client_id = 1001 to %eval(1000+&cl2);
		do week = 201801 to 201810;
			flag="clients2";
			spends=1000*ranuni(1000)*ranuni(2000);
			output;
		end;
	end;
run;


proc optmodel;
	set weeks;
	read data weeks into weeks = [week];
		
	/*clients 1*/
	set clients1;
	read data clients1 into clients1 = [client_id];
	

	num spends1 {clients1, weeks} init 0;
	read data sales(where=(flag="clients1")) nomiss into [client_id week] spends1=spends;
	
	/*clients2*/
	set clients2;
	read data clients2 into clients2 = [client_id];

	/*clients 2 spends per week */
	num spends2 {clients2, weeks} init 0;
	read data sales(where=(flag="clients2")) nomiss into [client_id week] spends2=spends;
	
	/*clients 1,2 matrix*/
	set clientpairs = {c1 in clients1, c2 in clients2};
	
	/*NEW!! Calculating the total difference between each store	*/
	num total_diff {<c1,c2> in clientpairs} = abs((sum {w in weeks} spends1[c1,w]) - (sum{w in weeks} spends2[c2,w]))/(sum{w in weeks} spends1[c1,w]);
	
	/*calculating the sum of absolute differences for each possible c1,c2 combination*/
	/*NEW!! Limited to pairs within 10% total sales difference*/
	num diff {<c1,c2> in clientpairs : total_diff[c1,c2] <= 0.1} = sum{w in weeks} abs(spends1[c1,w] - spends2[c2,w]);
	
	set <num,num> ASSIGNMENTS;
	solve with network / lap direction=directed links=(weight=diff) out=(assignments=ASSIGNMENTS);
	create data res from [client1 client2] = ASSIGNMENTS diff;

quit;


proc sql;
select * from res;
quit;

%mend;
%lap

Regards,

Jakub

RobPratt · Posted 08-05-2018 02:39 PM

Here, clients1 is the smaller set, so each c1 in clients1 is supposed to be matched exactly once. But many such clients have no outgoing links, as you can verify by running the following statements:

   num out_degree {c1 in clients1} = card({<(c1),c2> in clientpairs: total_diff[c1,c2] <= 0.1});
   put ({c1 in clients1: out_degree[c1] = 0});

J_Grabowski · Posted 08-05-2018 04:45 PM

Indeed, there a quite a few clients1 with 0 links and these are the ones i wanted to exclude from the start, as they have no potential matches within the 10% difference.

I am not sure how to make the solver try to find matches only for the remaining clients.

RobPratt · Posted 08-05-2018 06:35 PM

The problem is infeasible, even after removing those c1 with no links. It sounds like you really have two objectives:

maximize the number of matches
minimize the cost of making that many matches

The following code calls the network solver twice (once per objective) on a network that includes an additional source node and sink node:

   /* maximize flow out of source */
   set CLIENTS = clients1 union clients2;
   num source = min {c in CLIENTS} c - 1;
   num sink   = max {c in CLIENTS} c + 1;
   set NODES = CLIENTS union {source,sink};
   num supply_lower {i in NODES} init (if i = sink   then -card(CLIENTS) else 0);
   num supply_upper {i in NODES} init (if i = source then  card(CLIENTS) else 0);
   set LINKS = ({source} cross clients1) union clientpairs2 union (clients2 cross {sink});
   num cost {<i,j> in LINKS} init (if i = source then -1 else 0);
   num link_upper {LINKS} = 1;
   solve with network / mcf direction=directed 
      links=(weight=cost upper=link_upper) nodes=(lower=supply_lower upper=supply_upper);

   /* minimize cost to send maximum flow from source */
   supply_lower[source] = -_OROPTMODEL_NUM_['OBJECTIVE'];
   supply_upper[source] = -_OROPTMODEL_NUM_['OBJECTIVE'];
   for {<i,j> in LINKS} cost[i,j] = (if <i,j> in clientpairs2 then diff[i,j] else 0);
   num flow {LINKS};
   solve with network / mcf direction=directed 
      links=(weight=cost upper=link_upper) nodes=(lower=supply_lower upper=supply_upper) out=(flow=flow);
   create data res from [client1 client2] = {<i,j> in clientpairs2: flow[i,j] > 0.5} diff;

The maximum number of matches is 55, and the resulting minimum cost is 97847.800403.

J_Grabowski · Posted 08-06-2018 05:53 AM

Thanks, that makes sense.

I was thinking that indeed it might require two objectives.

Thanks again for your help!

Regards,

Jakub

J_Grabowski · Posted 08-06-2018 10:23 AM

Rob, sorry to bother you again, but im getting an error, when trying to run your code:

Also, whats clientpairs2 in your code?

I assume its the client pairs for which the total difference is not bigger than 10%?

Regards,

Jakub

RobPratt · Posted 08-06-2018 10:31 AM

Sorry that I omitted this part of the code:

	set clientpairs2 = {<c1,c2> in clientpairs : total_diff[c1,c2] <= 0.1};
	num diff {<c1,c2> in clientpairs2} = sum{w in weeks} abs(spends1[c1,w] - spends2[c2,w]);

It looks like you are using an older release of SAS/OR in which the NODES= option LOWER was called WEIGHT and UPPER was called WEIGHT2. Please try again using those old option names instead.

J_Grabowski · Posted 08-06-2018 10:45 AM

It works now, thanks!

lipury · Posted 08-07-2018 12:21 PM

This formulation uses the constraint solver with the ELEMENT and ALLDIFFERENT predicates to solve the original problem; the newer requirements haven't been addressed here, but it might be worth considering.

J_Grabowski · Posted 08-08-2018 05:41 AM

Great, thanks, i'll have a look!

As im not very experienced with the procedure, its good to see a few approaches.

Regards,

Jakub

Catch up on SAS Innovate 2026