BookmarkSubscribeRSS Feed
Bob291032
Calcite | Level 5
I have a set of cases and a set of controls. For each possible case-control pair, there is a cost associated with the pairing. Sometimes the pair is not allowed (e.g. cost is too high) so the cost is set to missing. The dataset then has one observation per case, with one variable per control, with the value of each variable either missing or set to the cost of matching the case to that particular control. i.e. it's an N x M matrix of N cases (observations) and M controls (variables).

Because this is a large dataset with missing costs, PROC ASSIGN errors out and reports that the problem is infeasible, though it does provide a reasonable solution. I would like to reformulate this as a linear program so I can solve this optimization problem using PROC OPTMODEL and get the optimal solution. I'm reading the PROC OPTMODEL docs and looking at SAS white papers, but this stuff gets pretty complicated to a newbie like me and my problem is probably one of the easiest types of problems you can solve with PROC OPTMODEL so I was wondering if anyone on this forum can help me reformulate the problem. I do know how to manipulate/transform/restructure datasets in SAS, write macros, etc. but the theoretical component of OPTMODEL is a over my head right now.

Please advise.

Thanks!
-Bob
31 REPLIES 31
Philipp_SAS
SAS Employee
Hello Bob,

Fortunately there is a very elegant and efficient way to solve your problem using OPTMODEL. It is possible to have a sparse set of pairs which only includes pairs for which the cost is not missing. I put together a small example that shows this. Note that the () around i in the constraint mean that we use the i from the outside, in other words, we want those pairs where i is fixed and j is changing.

I only added the constraint that each case has to be assigned to one control but many cases can use the same control. You might need some more constraints depending on what the logic of you problem is. When adding more constraints you might have to declare Assign to be binary.

Heres the example:

/*****************************/
data cases_and_controls;
length case $5;
input case control1-control5;
datalines;
case1 10 20 . 4 23
case2 4 33 34 . 23
case3 21 . 21 12 10
run;

proc optmodel;
/* define sets for cases and controls */
set <str> CASES;
set <str> CONTROLS init (union{i in 1..5} {"control"||i});

/* define cost parameter */
num cost{CASES, CONTROLS};

/* read in data including missing values */
read data cases_and_controls into CASES=[case] {j in CONTROLS} < cost[case, j] = col(j)>;

/* create a set of pairs which are possible, i.e. cost is not missing */
set <string, string> POSSIBLE = {i in CASES, j in CONTROLS: cost[i,j] ne .};

/* we only need variables for possible pairs */
var Assign {POSSIBLE} >= 0;

min z = sum {<i,j> in POSSIBLE} cost[i,j]*Assign[i,j];

con case_assign{i in CASES}: sum{<(i),j> in POSSIBLE} Assign[i,j] = 1;

/* expand the model for debugging */
*expand;

/* solve the model */
solve;

/* print the result */
print Assign;
quit;
/*****************************/

Don't hesitate to ask more questions. You can also contact SAS Tech Support if you have any problems.

Have fun learning (mixed integer) linear programming,

Philipp Corrected the code, got messed up because of < signs in it.


Message was edited by: Philipp@SAS
Philipp_SAS
SAS Employee
Sorry, something got messed up, the read data line should look like this:

read data cases_and_controls into CASES=[case] {j in CONTROLS} < cost[case, j] = col(j) >;
Bob291032
Calcite | Level 5
Wow, Philipp@SAS that's a fantastic start. I've already got it working with my data. How would I add in a constraint that each control can only be matched to one case?

I am also trying to figure out how to about the solution in a dataset, ideally:

Case | Ctrl
case2 | ctrl4
case1 | ctrl24
case19| ctrl1 Message was edited by: Bob3000
Philipp_SAS
SAS Employee
Great that this is working for you. Since you mentioned that you have a different number of controls and cases I guess you want each control assigned to at most 1 case but some controls might not be used. The constraint for that would be like this:

con control_limit{j in CONTROLS}: sum{<i,(j)> in POSSIBLE} Assign[i,j] <= 1;

The classical assignment problem requires two sets of equal size and that all elements of one set have to be assigned to exactly one of the other. For that you would use = instead of <=, but the problem would be infeasible as soon as you have a different number of cases and controls.

Writing the data as you want it is actually really easy, though I have to admit it took a little while to figure it out. This should do the job:

create data result from [case control]={<i,j> in POSSIBLE: Assign[i,j] > 0.5};

Using > 0.5 should avoid trouble if you get slight numerical inaccuracies because of floating point arithmetic.

Have fun
Philipp
Bob291032
Calcite | Level 5
Almost there, Philipp. The only problem remaining is the feasibility issue, which ironically is what brought me to OPTMODEL in the first place. When there's no way to match each case with a control (again, due to large percentage of case-control distances exceeding the maximum distance), OPTMODEL simply quits. What I'd like is to have OPTMODEL make as many case-control matches as possible while minimizing distance, but it's fine if some cases and some controls aren't matched.

For instance,

data cases_and_controls;
length case $5;
input case control1-control3
datalines;
case1 . . 2
case2 1 5 .
case3 . . 1
run;

Case 1 and Case 3 can only potentially match control 3, but since they can't both match control 3, OPTMODEL determines the problem is infeasible and quits. An optimal solution in this case could be matching case2-control1 and case3-control3, and not matching case1 (or control 2).

I am not sure how to formulate a constraint for this as there's no minimum number of cases I need matched. In fact, I want to maximize the number of cases matched, while minimizing the overall cost of matching.
Philipp_SAS
SAS Employee
Ok, so you actually have two objectives:

1. Match as many pairs as possible
2. Minimize the cost of the matchings

There is many different ways to deal with two objectives depending on how the relation between the two is. As I understand in your case we can get away with something fairly simple.

First we change the case_assign constraint to <=. If we solve this model then the result will be 0, because if you do not have to assign anything and the cost is positive the cheapest thing is to do nothing.

Now we have to enforce the first objective. One way to do this is to modify the objective function. My suggestion is something like this:

num maxCost = max{<i,j> in POSSIBLE} cost[i,j];

min z = sum {<i,j> in POSSIBLE} cost[i,j]*Assign[i,j] - sum {<i,j> in POSSIBLE} (maxCost+1)*Assign[i,j];

This should force as many Assign variables to 1 as possible while still minimizing cost as a secondary objective.

I do not know if this really is what you want and if this is really the best way to handle your specific situation. Multi-objective obtimization can get quite tricky. Another possibility would be to introduce penalty variables for each case and control, then you can penalize with exact numbers how much it "costs" to not assign a case or a control. This might be needed if you want to avoid assignment to really expensive controls if they are the only possibility left.

I hope this helps
Philipp
Bob291032
Calcite | Level 5
I tried your suggestions and checked it against my old algorithm. Nearly identical results but (a) no errors and (b) runs in minutes instead of hours. PROC OPTMODEL gets five stars from me, as do you Peter. Thanks!

-Bob
Turkeyboy
Calcite | Level 5

Sorry for digging up this old thread, but this is exactly the issue i'm trying to solve, but i can't seem to get the code above to work.

First, when i use

var Assign >=0;

I get an error on the objective function line as follows:

min z = sum {<i,j> in POSSIBLE} cost[i,j]*Assign[i,j];

                                                           -

                                                           647

ERROR 647-782: The name 'Assign' must be an array.

I can fix this error by using the following:

var Assign{CASES,CONTROLS} >= 0;

Is this reasonable?

..

I am having a couple of additional problems. If I use this code, which I believe follows from reading all the posts in this thread, I end up with Case1 and Case3 both matching to Control3. Is there a way to prevent multiple cases matching to a single control?

data cases_and_controls;

    length case $5;

    input case control1-control3;

    datalines;

case1   . . 2

case2   1 5 .

case3   . . 1

run;

proc optmodel;

/* define sets for cases and controls */

set <str> CASES;

set <str> CONTROLS init (union{i in 1..3} {"control"||i});

/* define cost parameter */

num cost{CASES, CONTROLS};

/* read in data including missing values */

read data cases_and_controls into CASES=[case] {j in CONTROLS} < cost[case, j] = col(j) >;

/* create a set of pairs which are possible, i.e. cost is not missing */

set <string, string> POSSIBLE = {i in CASES, j in CONTROLS: cost[i,j] > 0};

/* we only need variables for possible pairs */

var Assign{CASES,CONTROLS} >= 0;

num maxCost = max{<i,j> in POSSIBLE} cost[i,j];

min z = sum {<i,j> in POSSIBLE} cost[i,j]*Assign[i,j] - sum {<i,j> in POSSIBLE} (maxCost+1)*Assign[i,j];

con case_assign{i in CASES}: sum{<(i),j> in POSSIBLE}  Assign[i,j] <= 1;

 

/* solve the model */

solve;

/* print the result */

print Assign;

create data assign from [i j] = {i in CASES, j in CONTROLS: Assign[i,j] >= 0.5} cost;

quit;

Philipp_SAS
SAS Employee

Hi,

Your modification to "var ASSIGN" looks right.If you want at most one case per control you can add this constraint:

con

control_assign{j in CONTROLS}: sum{<i,(j)> in POSSIBLE} Assign[i,j] <= 1;


Philipp

Turkeyboy
Calcite | Level 5

That works perfectly!

Thanks a million.

Giamma14
Fluorite | Level 6

Dear All,

sorry for opening again this old threat. I'm analysing a match control study and I have ~ 3,000 cases to match to ~12,000 controls, with a maximum of 4 controls per case and the constraint of a control to be used only once. I then have a matrix dataset; cases are rows, controls are columns, and the cells contains the distance. Need to find the matching strategy that minimises the total cost.

I've used the code provided by I get an error message 'out of memorey': do you have any idea of how to circumvent this (maybe this is not the right approach)? And do you know what are the limits in term of dataset size proc optmodel can deal with?

Many thanks in advance,

Gianmario

RobPratt
SAS Super FREQ

As of SAS/OR 12.1 (released in August 2012), the recommended approach is to use PROC OPTNET:

   proc optnet data_matrix=mymatrix;

      linear_assignment out=out;

   run;

See this SAS Usage Note for more details:

35026 - Solving a Linear Assignment Problem with SAS/OR

Giamma14
Fluorite | Level 6

Many thanks Rob!

We are installing SAS 9.3 M2 in these days so I'll be able to try it soon. I've just had a quick look at the documentation, maybe you already know whether what I need is available in proc optnet.

In my example I need to match a case with up to 4 controls (if available): is it possible to do this?

Thanks,

Gianmario

RobPratt
SAS Super FREQ

PROC OPTNET does not do one-to-many matching directly, but you can transform a one-to-many matching problem to an equivalent one-to-one matching problem, as done in the popular Mayo clinic macro %vmatch, which uses the (legacy) PROC ASSIGN:

http://www.mayo.edu/research/documents/vmatchsas/DOC-10027394

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

Multiple Linear Regression in SAS

Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 31 replies
  • 4856 views
  • 0 likes
  • 5 in conversation