Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

🔒 This topic is **solved** and **locked**.
Need further help from the community? Please
sign in and ask a **new** question.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Posted 08-02-2018 12:39 PM
(1548 views)

Hi Everyone,

Im fairly new to optmodel, so this may be newbie question, but i've already spent alot of time on this and would appreciate greatly if you could help me out.

**The objective:**

I have two pools of clients that i am trying to pair based on some discreet criteria and their spend in a number of common weeks.

The objective is to find pairs that fulfill the criteria and minimize the sum of weekly differences between the pairs.

One client from each pool can be assigned to no more then one client from another pool.

**The problem:**

The code i wrote works fine as long as we have enough clients from the second pool to find a match for every client from the first pool. When that's not true, either due to the criteria, or simply not enough clients in pool 2 to find a match for every client in the pool1, the problems becomes infeasible.

This is not the behavior i am trying to achieve - what i would like to see in a situtation when its impossible to find a match for all the clients from pool1 is matching the best possible ones(so the ones minimizing the sum of differnces) and leaving out the infeasible ones.

**An example:**

I have prepared a simplified version of the code im using, that i think clearly showcases the problem:

```
/*Creating Datasets*/
data weeks;
do week = 201801 to 201810;
output;
end;
run;
/*5 clients in the first pool*/
data clients1;
do client_id = 1 to 5;
output;
end;
run;
/*only 4 clients in the second pool*/
data clients2;
do client_id = 101 to 104;
output;
end;
run;
data sales;
do client_id = 1 to 5;
do week = 201801 to 201810;
flag="clients1";
spends=1000*ranuni(1000);
output;
end;
end;
do client_id = 101 to 104;
do week = 201801 to 201810;
flag="clients2";
spends=1000*ranuni(1000);
output;
end;
end;
run;
proc optmodel;
set weeks;
read data weeks into weeks = [week];
/*clients 1*/
set clients1;
read data clients1 into clients1 = [client_id];
/*clients 2 spends per week */
num spends1 {clients1, weeks} init 0;
read data sales(where=(flag="clients1")) nomiss into [client_id week] spends1=spends;
/*clients2*/
set clients2;
read data clients2 into clients2 = [client_id];
/*clients 2 spends per week */
num spends2 {clients2, weeks} init 0;
read data sales(where=(flag="clients2")) nomiss into [client_id week] spends2=spends;
/*clients 1,2 matrix*/
set clientpairs = {c1 in clients1, c2 in clients2};
/*calculating the sum of absolute differences for each possible c1,c2 combination*/
impvar diff {<c1,c2> in clientpairs} = sum{w in weeks} abs(spends1[c1,w] - spends2[c2,w]);
/*finding the combination that minimizes the total sum of differences*/
var match {clientpairs} binary;
min total_diff = sum{<c1,c2> in clientpairs} diff[c1,c2]*match[c1,c2];
/*each client1 is present once(here is the problem, but removing this contraint causes no pair to match - minimizes differences)*/
con one_c1_max {c1 in clients1}:
sum {<(c1),c2> in clientpairs} match[c1,c2] = 1;
/*each client2 can not be matched to more than 1 client1*/
con one_c2_max {c2 in clients2}:
sum {<c1,(c2)> in clientpairs} match[c1,c2] <= 1;
solve;
create data res(where =(match = 1)) from [client1 client2] = {clients1, clients2} match diff;
quit;
proc sql;
select * from res;
quit;
```

Let me know if everything is clear and how you would approach this problem.

Regards,

Jakub

EDIT: I see that the code doesnt display properly, ive attached the original version.

- Tags:
- optmodel

1 ACCEPTED SOLUTION

Accepted Solutions

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

One way to get what you want is to interchange the <= 1 and = 1 in your two constraints, so that each client1 is matched at most once and each client2 is matched exactly once. The resulting optimal objective value is then 10197.831456.

Another approach is to instead use the network solver with the linear assignment algorithm (http://go.documentation.sas.com/?docsetId=ormpug&docsetTarget=ormpug_networksolver_details18.htm&doc...😞

```
set <num,num> ASSIGNMENTS;
solve with network / lap direction=directed links=(weight=diff) out=(assignments=ASSIGNMENTS);
create data res from [client1 client2] = ASSIGNMENTS diff;
```

In that case, you can just omit the VAR, MIN, and CON statements.

Also, it is more efficient to use NUM instead of IMPVAR when the right-hand side does not depend on variables:

```
/* impvar diff {<c1,c2> in clientpairs} = sum{w in weeks} abs(spends1[c1,w] - spends2[c2,w]);*/
num diff {<c1,c2> in clientpairs} = sum{w in weeks} abs(spends1[c1,w] - spends2[c2,w]);
```

Finally, the WHERE condition match = 1 is dangerous because binary variables can be slightly fractional (up to INTTOL):

It is safer to use a loose comparison:

```
create data res(where =(match > 0.5)) from [client1 client2] = {clients1, clients2} match diff;
```

14 REPLIES 14

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

One way to get what you want is to interchange the <= 1 and = 1 in your two constraints, so that each client1 is matched at most once and each client2 is matched exactly once. The resulting optimal objective value is then 10197.831456.

Another approach is to instead use the network solver with the linear assignment algorithm (http://go.documentation.sas.com/?docsetId=ormpug&docsetTarget=ormpug_networksolver_details18.htm&doc...😞

```
set <num,num> ASSIGNMENTS;
solve with network / lap direction=directed links=(weight=diff) out=(assignments=ASSIGNMENTS);
create data res from [client1 client2] = ASSIGNMENTS diff;
```

In that case, you can just omit the VAR, MIN, and CON statements.

Also, it is more efficient to use NUM instead of IMPVAR when the right-hand side does not depend on variables:

```
/* impvar diff {<c1,c2> in clientpairs} = sum{w in weeks} abs(spends1[c1,w] - spends2[c2,w]);*/
num diff {<c1,c2> in clientpairs} = sum{w in weeks} abs(spends1[c1,w] - spends2[c2,w]);
```

Finally, the WHERE condition match = 1 is dangerous because binary variables can be slightly fractional (up to INTTOL):

It is safer to use a loose comparison:

```
create data res(where =(match > 0.5)) from [client1 client2] = {clients1, clients2} match diff;
```

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Hi Rob,

Thanks for a quick reply!

The first solution indeed should solve it. This however was just an example, so i need to write a code that works in all possible cases. But still, I could check which set, c1 or c2, has more observations and then conditionally define constraints, I suppose, so thanks for the tip!

The second solution, at a first glance seems like a default approach to this kind of problems. I'll dig deeper, thank you.

In terms of efficiency, is there any argument to stick to a specific approach?

Regards,

Jakub

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Yes, you can decide which constraint should be <= 1 by comparing the cardinalities of the two sets. In fact, the linear assignment algorithm does this for you automatically, as described in the documentation link I put in my first reply.

The network solver with the linear assignment algorithm is more efficient (and requires less PROC OPTMODEL code) than the MILP solver for this type of problem.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Thanks Rob, I know what to do now.

Regards,

Jakub

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Hi again,

I have one more question, if you don't mind:

So i have been trying to implement the network solver solution you've suggested and i've ran into another problem.

When looking for the client pairs, I would like to only choose from pairs in which the difference of sum of spends from all the weeks is no bigger than 10%.

I can see that there are quite a few pairs like that, however, when im trying to solve it using modified "diff" as links i get an error that the linear assignment problem is infeasible (104 rows are unassigned).

When i tried using subgraph option for links the effect was the same.

Can you help me understand the reason behind it?

Below you can find the new code. I 've modified it slightly, to get more observations and to calculate the total difference.

```
%macro lap;
%let cl1=200;
%let cl2=700;
/*Creating Datasets*/
data weeks;
do week = 201801 to 201810;
output;
end;
run;
data clients1;
do client_id = 1 to &cl1;
output;
end;
run;
data clients2;
do client_id = 1001 to %eval(1000+&cl2);
output;
end;
run;
data sales;
do client_id = 1 to &cl1;
do week = 201801 to 201810;
flag="clients1";
spends=1000*ranuni(1000);
output;
end;
end;
do client_id = 1001 to %eval(1000+&cl2);
do week = 201801 to 201810;
flag="clients2";
spends=1000*ranuni(1000)*ranuni(2000);
output;
end;
end;
run;
proc optmodel;
set weeks;
read data weeks into weeks = [week];
/*clients 1*/
set clients1;
read data clients1 into clients1 = [client_id];
num spends1 {clients1, weeks} init 0;
read data sales(where=(flag="clients1")) nomiss into [client_id week] spends1=spends;
/*clients2*/
set clients2;
read data clients2 into clients2 = [client_id];
/*clients 2 spends per week */
num spends2 {clients2, weeks} init 0;
read data sales(where=(flag="clients2")) nomiss into [client_id week] spends2=spends;
/*clients 1,2 matrix*/
set clientpairs = {c1 in clients1, c2 in clients2};
/*NEW!! Calculating the total difference between each store */
num total_diff {<c1,c2> in clientpairs} = abs((sum {w in weeks} spends1[c1,w]) - (sum{w in weeks} spends2[c2,w]))/(sum{w in weeks} spends1[c1,w]);
/*calculating the sum of absolute differences for each possible c1,c2 combination*/
/*NEW!! Limited to pairs within 10% total sales difference*/
num diff {<c1,c2> in clientpairs : total_diff[c1,c2] <= 0.1} = sum{w in weeks} abs(spends1[c1,w] - spends2[c2,w]);
set <num,num> ASSIGNMENTS;
solve with network / lap direction=directed links=(weight=diff) out=(assignments=ASSIGNMENTS);
create data res from [client1 client2] = ASSIGNMENTS diff;
quit;
proc sql;
select * from res;
quit;
%mend;
%lap
```

Regards,

Jakub

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Here, clients1 is the smaller set, so each c1 in clients1 is supposed to be matched exactly once. But many such clients have no outgoing links, as you can verify by running the following statements:

```
num out_degree {c1 in clients1} = card({<(c1),c2> in clientpairs: total_diff[c1,c2] <= 0.1});
put ({c1 in clients1: out_degree[c1] = 0});
```

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Indeed, there a quite a few clients1 with 0 links and these are the ones i wanted to exclude from the start, as they have no potential matches within the 10% difference.

I am not sure how to make the solver try to find matches only for the remaining clients.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

The problem is infeasible, even after removing those c1 with no links. It sounds like you really have two objectives:

- maximize the number of matches
- minimize the cost of making that many matches

The following code calls the network solver twice (once per objective) on a network that includes an additional source node and sink node:

```
/* maximize flow out of source */
set CLIENTS = clients1 union clients2;
num source = min {c in CLIENTS} c - 1;
num sink = max {c in CLIENTS} c + 1;
set NODES = CLIENTS union {source,sink};
num supply_lower {i in NODES} init (if i = sink then -card(CLIENTS) else 0);
num supply_upper {i in NODES} init (if i = source then card(CLIENTS) else 0);
set LINKS = ({source} cross clients1) union clientpairs2 union (clients2 cross {sink});
num cost {<i,j> in LINKS} init (if i = source then -1 else 0);
num link_upper {LINKS} = 1;
solve with network / mcf direction=directed
links=(weight=cost upper=link_upper) nodes=(lower=supply_lower upper=supply_upper);
/* minimize cost to send maximum flow from source */
supply_lower[source] = -_OROPTMODEL_NUM_['OBJECTIVE'];
supply_upper[source] = -_OROPTMODEL_NUM_['OBJECTIVE'];
for {<i,j> in LINKS} cost[i,j] = (if <i,j> in clientpairs2 then diff[i,j] else 0);
num flow {LINKS};
solve with network / mcf direction=directed
links=(weight=cost upper=link_upper) nodes=(lower=supply_lower upper=supply_upper) out=(flow=flow);
create data res from [client1 client2] = {<i,j> in clientpairs2: flow[i,j] > 0.5} diff;
```

The maximum number of matches is 55, and the resulting minimum cost is 97847.800403.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Thanks, that makes sense.

I was thinking that indeed it might require two objectives.

Thanks again for your help!

Regards,

Jakub

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Rob, sorry to bother you again, but im getting an error, when trying to run your code:

Also, whats clientpairs2 in your code?

I assume its the client pairs for which the total difference is not bigger than 10%?

Regards,

Jakub

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Sorry that I omitted this part of the code:

```
set clientpairs2 = {<c1,c2> in clientpairs : total_diff[c1,c2] <= 0.1};
num diff {<c1,c2> in clientpairs2} = sum{w in weeks} abs(spends1[c1,w] - spends2[c2,w]);
```

It looks like you are using an older release of SAS/OR in which the NODES= option LOWER was called WEIGHT and UPPER was called WEIGHT2. Please try again using those old option names instead.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

It works now, thanks!

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Great, thanks, i'll have a look!

As im not very experienced with the procedure, its good to see a few approaches.

Regards,

Jakub

SAS is headed **back** to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team.

**Interested in speaking?** Content from our attendees is one of the reasons that makes SAS Innovate such a special event!

Multiple Linear Regression in SAS

Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.

Find more tutorials on the SAS Users YouTube channel.