BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
J_Grabowski
Calcite | Level 5

Hi Everyone,

Im fairly new to optmodel, so this may be newbie question, but i've already spent alot of time on this and would appreciate greatly if you could help me out.

 

The objective:

I have two pools of clients that i am trying to pair based on some discreet criteria and their spend in a number of common weeks.

The objective is to find pairs that fulfill the criteria and minimize the sum of weekly differences between the pairs.

One client from each pool can  be assigned to no more then one client from another pool.

 

The problem:

The code i wrote works fine as long as we have enough clients from the second pool to find a match for every client from the first pool. When that's not true, either due to the criteria, or simply not enough clients in pool 2 to find a match for every client in the pool1, the problems becomes infeasible.

This is not the behavior i am trying to achieve - what i would like to see in a situtation when its impossible to find a match for all the clients from pool1 is matching the best possible ones(so the ones minimizing the sum of differnces) and leaving out the infeasible ones.

 

An example:

I have prepared a simplified version of the code im using, that i think clearly showcases the problem:

 

/*Creating Datasets*/

data weeks;
	do week = 201801 to 201810;
		output;
	end;
run;

/*5 clients in the first pool*/
data clients1;
	do client_id = 1 to 5;
		output;
	end;
run;

/*only 4 clients in the second pool*/
data clients2;
	do client_id = 101 to 104;
		output;
	end;
run;

data sales;
	do client_id = 1 to 5;
		do week = 201801 to 201810;
			flag="clients1";
			spends=1000*ranuni(1000);
			output;
		end;
	end;
	do client_id = 101 to 104;
		do week = 201801 to 201810;
			flag="clients2";
			spends=1000*ranuni(1000);
			output;
		end;
	end;
run;


proc optmodel;
	set weeks;
	read data weeks into weeks = [week];
		
	/*clients 1*/
	set clients1;
	read data clients1 into clients1 = [client_id];
	
	/*clients 2 spends per week */
	num spends1 {clients1, weeks} init 0;
	read data sales(where=(flag="clients1")) nomiss into [client_id week] spends1=spends;
	
	/*clients2*/
	set clients2;
	read data clients2 into clients2 = [client_id];

	/*clients 2 spends per week */
	num spends2 {clients2, weeks} init 0;
	read data sales(where=(flag="clients2")) nomiss into [client_id week] spends2=spends;
	
	/*clients 1,2 matrix*/
	set clientpairs = {c1 in clients1, c2 in clients2};
	
	/*calculating the sum of absolute differences for each possible c1,c2 combination*/
	impvar diff {<c1,c2> in clientpairs} = sum{w in weeks} abs(spends1[c1,w] - spends2[c2,w]);
	
	/*finding the combination that minimizes the total sum of differences*/	
	var match {clientpairs} binary;
	min total_diff = sum{<c1,c2> in clientpairs} diff[c1,c2]*match[c1,c2];
	/*each client1 is present once(here is the problem, but removing this contraint causes no pair to match - minimizes differences)*/
	con one_c1_max {c1 in clients1}:
		sum {<(c1),c2> in clientpairs} match[c1,c2] = 1;

	/*each client2 can not be matched to more than 1 client1*/
	con one_c2_max {c2 in clients2}:
		sum {<c1,(c2)> in clientpairs} match[c1,c2] <= 1;
	
	solve;

	create data res(where =(match = 1)) from [client1 client2] = {clients1, clients2} match diff;
	
quit;


proc sql;
select * from res;
quit;

 

 

 Let me know if everything is clear and how you would approach this problem. 

 

Regards,

Jakub

 

EDIT: I see that the code doesnt display properly, ive attached the original version.

1 ACCEPTED SOLUTION

Accepted Solutions
RobPratt
SAS Super FREQ

One way to get what you want is to interchange the <= 1 and = 1 in your two constraints, so that each client1 is matched at most once and each client2 is matched exactly once.  The resulting optimal objective value is then 10197.831456.

 

Another approach is to instead use the network solver with the linear assignment algorithm (http://go.documentation.sas.com/?docsetId=ormpug&docsetTarget=ormpug_networksolver_details18.htm&doc...😞

set <num,num> ASSIGNMENTS;
solve with network / lap direction=directed links=(weight=diff) out=(assignments=ASSIGNMENTS);
create data res from [client1 client2] = ASSIGNMENTS diff;

In that case, you can just omit the VAR, MIN, and CON statements.

 

Also, it is more efficient to use NUM instead of IMPVAR when the right-hand side does not depend on variables:

/* impvar diff {<c1,c2> in clientpairs} = sum{w in weeks} abs(spends1[c1,w] - spends2[c2,w]);*/
num diff {<c1,c2> in clientpairs} = sum{w in weeks} abs(spends1[c1,w] - spends2[c2,w]);

Finally, the WHERE condition match = 1 is dangerous because binary variables can be slightly fractional (up to INTTOL):

http://go.documentation.sas.com/?docsetId=ormpug&docsetTarget=ormpug_milpsolver_syntax02.htm&docsetV...

 

It is safer to use a loose comparison:

create data res(where =(match > 0.5)) from [client1 client2] = {clients1, clients2} match diff;

View solution in original post

14 REPLIES 14
RobPratt
SAS Super FREQ

One way to get what you want is to interchange the <= 1 and = 1 in your two constraints, so that each client1 is matched at most once and each client2 is matched exactly once.  The resulting optimal objective value is then 10197.831456.

 

Another approach is to instead use the network solver with the linear assignment algorithm (http://go.documentation.sas.com/?docsetId=ormpug&docsetTarget=ormpug_networksolver_details18.htm&doc...😞

set <num,num> ASSIGNMENTS;
solve with network / lap direction=directed links=(weight=diff) out=(assignments=ASSIGNMENTS);
create data res from [client1 client2] = ASSIGNMENTS diff;

In that case, you can just omit the VAR, MIN, and CON statements.

 

Also, it is more efficient to use NUM instead of IMPVAR when the right-hand side does not depend on variables:

/* impvar diff {<c1,c2> in clientpairs} = sum{w in weeks} abs(spends1[c1,w] - spends2[c2,w]);*/
num diff {<c1,c2> in clientpairs} = sum{w in weeks} abs(spends1[c1,w] - spends2[c2,w]);

Finally, the WHERE condition match = 1 is dangerous because binary variables can be slightly fractional (up to INTTOL):

http://go.documentation.sas.com/?docsetId=ormpug&docsetTarget=ormpug_milpsolver_syntax02.htm&docsetV...

 

It is safer to use a loose comparison:

create data res(where =(match > 0.5)) from [client1 client2] = {clients1, clients2} match diff;
J_Grabowski
Calcite | Level 5

Hi Rob,

Thanks for a quick reply!

 

The first solution indeed should solve it. This however was just an example, so i need to write a code that works in all possible cases. But still, I could check which set, c1 or c2, has more observations and then conditionally define constraints, I suppose, so thanks for the tip!

 

The second solution, at a first glance seems like a default approach to this kind of problems. I'll dig deeper, thank you.

In terms of efficiency, is there any argument to stick to a specific approach?

 

Regards,

Jakub

RobPratt
SAS Super FREQ

Yes, you can decide which constraint should be <= 1 by comparing the cardinalities of the two sets.  In fact, the linear assignment algorithm does this for you automatically, as described in the documentation link I put in my first reply.

 

The network solver with the linear assignment algorithm is more efficient (and requires less PROC OPTMODEL code) than the MILP solver for this type of problem.

J_Grabowski
Calcite | Level 5

Thanks Rob, I know what to do now.

 

Regards,

Jakub

J_Grabowski
Calcite | Level 5

Hi again,

 

I have one more question, if you don't mind:

So i have been trying to implement the network solver solution you've suggested and i've ran into another problem.

When looking for the client pairs, I would like to only choose from pairs in which the difference of sum of spends from all the weeks is no bigger than 10%.

I can see that there are quite a few pairs like that, however, when im trying to solve it using modified "diff" as links i get an error that the linear assignment problem is infeasible (104 rows are unassigned).

When i tried using subgraph option for links the effect was the same.

Can you help me understand the reason behind it?

 

Below you can find the new code. I 've modified it slightly, to get more observations and to calculate the total difference.

%macro lap;
%let cl1=200;
%let cl2=700;

/*Creating Datasets*/

data weeks;
	do week = 201801 to 201810;
		output;
	end;
run;

data clients1;
	do client_id = 1 to &cl1;
		output;
	end;
run;

data clients2;
	do client_id = 1001 to %eval(1000+&cl2);
		output;
	end;
run;

data sales;
	do client_id = 1 to &cl1;
		do week = 201801 to 201810;
			flag="clients1";
			spends=1000*ranuni(1000);
			output;
		end;
	end;
	do client_id = 1001 to %eval(1000+&cl2);
		do week = 201801 to 201810;
			flag="clients2";
			spends=1000*ranuni(1000)*ranuni(2000);
			output;
		end;
	end;
run;


proc optmodel;
	set weeks;
	read data weeks into weeks = [week];
		
	/*clients 1*/
	set clients1;
	read data clients1 into clients1 = [client_id];
	

	num spends1 {clients1, weeks} init 0;
	read data sales(where=(flag="clients1")) nomiss into [client_id week] spends1=spends;
	
	/*clients2*/
	set clients2;
	read data clients2 into clients2 = [client_id];

	/*clients 2 spends per week */
	num spends2 {clients2, weeks} init 0;
	read data sales(where=(flag="clients2")) nomiss into [client_id week] spends2=spends;
	
	/*clients 1,2 matrix*/
	set clientpairs = {c1 in clients1, c2 in clients2};
	
	/*NEW!! Calculating the total difference between each store	*/
	num total_diff {<c1,c2> in clientpairs} = abs((sum {w in weeks} spends1[c1,w]) - (sum{w in weeks} spends2[c2,w]))/(sum{w in weeks} spends1[c1,w]);
	
	/*calculating the sum of absolute differences for each possible c1,c2 combination*/
	/*NEW!! Limited to pairs within 10% total sales difference*/
	num diff {<c1,c2> in clientpairs : total_diff[c1,c2] <= 0.1} = sum{w in weeks} abs(spends1[c1,w] - spends2[c2,w]);
	
	set <num,num> ASSIGNMENTS;
	solve with network / lap direction=directed links=(weight=diff) out=(assignments=ASSIGNMENTS);
	create data res from [client1 client2] = ASSIGNMENTS diff;

quit;


proc sql;
select * from res;
quit;

%mend;
%lap

Regards,

Jakub

RobPratt
SAS Super FREQ

Here, clients1 is the smaller set, so each c1 in clients1 is supposed to be matched exactly once.  But many such clients have no outgoing links, as you can verify by running the following statements:

   num out_degree {c1 in clients1} = card({<(c1),c2> in clientpairs: total_diff[c1,c2] <= 0.1});
   put ({c1 in clients1: out_degree[c1] = 0});
J_Grabowski
Calcite | Level 5

Indeed, there a quite a few clients1 with 0 links and these are the ones i wanted to exclude from the start, as they have no potential matches within the 10% difference.

I am not sure how to make the solver try to find matches only for the remaining clients.

 

RobPratt
SAS Super FREQ

The problem is infeasible, even after removing those c1 with no links.  It sounds like you really have two objectives:

  1. maximize the number of matches
  2. minimize the cost of making that many matches

The following code calls the network solver twice (once per objective) on a network that includes an additional source node and sink node:

   /* maximize flow out of source */
   set CLIENTS = clients1 union clients2;
   num source = min {c in CLIENTS} c - 1;
   num sink   = max {c in CLIENTS} c + 1;
   set NODES = CLIENTS union {source,sink};
   num supply_lower {i in NODES} init (if i = sink   then -card(CLIENTS) else 0);
   num supply_upper {i in NODES} init (if i = source then  card(CLIENTS) else 0);
   set LINKS = ({source} cross clients1) union clientpairs2 union (clients2 cross {sink});
   num cost {<i,j> in LINKS} init (if i = source then -1 else 0);
   num link_upper {LINKS} = 1;
   solve with network / mcf direction=directed 
      links=(weight=cost upper=link_upper) nodes=(lower=supply_lower upper=supply_upper);

   /* minimize cost to send maximum flow from source */
   supply_lower[source] = -_OROPTMODEL_NUM_['OBJECTIVE'];
   supply_upper[source] = -_OROPTMODEL_NUM_['OBJECTIVE'];
   for {<i,j> in LINKS} cost[i,j] = (if <i,j> in clientpairs2 then diff[i,j] else 0);
   num flow {LINKS};
   solve with network / mcf direction=directed 
      links=(weight=cost upper=link_upper) nodes=(lower=supply_lower upper=supply_upper) out=(flow=flow);
   create data res from [client1 client2] = {<i,j> in clientpairs2: flow[i,j] > 0.5} diff;

The maximum number of matches is 55, and the resulting minimum cost is 97847.800403.

J_Grabowski
Calcite | Level 5

Thanks, that makes sense.

I was thinking that indeed it might require two objectives.

Thanks again for your help!

 

Regards,

Jakub

J_Grabowski
Calcite | Level 5

Rob, sorry to bother you again, but im getting an error, when trying to run your code:

image.png

Also, whats clientpairs2 in your code?

I assume its the client pairs for which the total difference is not bigger than 10%?

 

Regards,

Jakub

RobPratt
SAS Super FREQ

Sorry that I omitted this part of the code:

	set clientpairs2 = {<c1,c2> in clientpairs : total_diff[c1,c2] <= 0.1};
	num diff {<c1,c2> in clientpairs2} = sum{w in weeks} abs(spends1[c1,w] - spends2[c2,w]);

It looks like you are using an older release of SAS/OR in which the NODES= option LOWER was called WEIGHT and UPPER was called WEIGHT2.  Please try again using those old option names instead.

lipury
SAS Employee

This formulation uses the constraint solver with the ELEMENT and ALLDIFFERENT predicates to solve the original problem; the newer requirements haven't been addressed here, but it might be worth considering. 

J_Grabowski
Calcite | Level 5

Great, thanks, i'll have a look!

As im not very experienced with the procedure, its good to see a few approaches.

 

Regards,

Jakub

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Multiple Linear Regression in SAS

Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 14 replies
  • 1994 views
  • 1 like
  • 3 in conversation