SAS Optimization, and SAS Simulation Studio

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

a week ago

I am trying to write a model that selects items for an assessment. We have a measure of lexical similarity between item pairs that we would like to minimize. The intent is to select a set of items that are as diverse as possible. The similarity index is essentially a correlation matrix.

The code below seems to do what I want for simple problems (e.g., selecting 15 items from a pool of 100), but it does not scale well when either the pool or the number of items selected increase. Are there any suggestions on how to make this model more efficient? A large percentage of the elements in the correlation matrix are 0, if that affects any suggestions.

Thank you.

```
proc optmodel;
set QuestionIDs;
read data ItemData into QuestionIDs = [questionid];
num CSI {i in QuestionIDs,j in QuestionIDs} init 0;
read data tall into [Q1 Q2] CSI CSI[Q2,Q1]=CSI;
var x{QuestionIDs} BINARY;
constraint TotItems: sum{i in QuestionIDs}x[i]=15;
var Ynew{i in questionIDs, j in questionIDs: j>i} BINARY;
constraint lin1{i in QuestionIDs, j in QuestionIDs: j>i}: Ynew[i,j] <= x[i];
constraint lin2{i in QuestionIDs, j in QuestionIDs: j>i}: Ynew[i,j] <= x[j];
constraint lin3{i in QuestionIDs, j in QuestionIDs: j>i}: Ynew[i,j] >= x[i]+x[j]-1;
min totCSI = sum{i in questionIDs, j in questionIDs: j>i}Ynew[i,j]*CSI[i,j];
solve with milp;
```

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to CaseyCodd

a week ago

Assuming CSI[i,j] >= 0, your constraints lin1 and lin2 will naturally be satisfied because of the objective, so you could reduce the problem size by omitting them. If that doesn't help, please share the data.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to RobPratt

a week ago

The selection variable X determines whether an individual item will appear on the test. Because CSI is about correlations (i.e., relationships between* item pairs*), I created Ynew to determine whether the pair X[i] and X[j] were both selected. Logically, Ynew[i,j] = X[i]*X[j]. In words, if both i and j are selected, then the item pair comprised of i and j is also selected. If either i or j is not selected, then the pair [i,j] is also not selected. The problem with Ynew[i,j]=X[i]*X[j] is that it is nonlinear. Therefore, I used the constraints lin1, lin2, and lin3 to linearize that relationship. Is there a better way to do this part?

I added some data to the code below. I reduced the example to just 5 items for simplicity.

```
data Itemdata;
input questionid;
datalines;
227447
227450
227451
227452
227481
;
data CSI;
input Q1 Q2 CSI;
datalines;
227447 227447 1
227450 227450 1
227451 227447 0.084515426
227451 227451 1
227452 227447 0.07100716
227452 227450 0.019418391
227452 227452 1
227481 227447 0.135526185
227481 227481 1
;
proc optmodel;
set QuestionIDs;
read data ItemData into QuestionIDs = [questionid];
num CSI {i in QuestionIDs,j in QuestionIDs} init 0;
read data tall into [Q1 Q2] CSI CSI[Q2,Q1]=CSI;
var x{QuestionIDs} BINARY;
constraint TotItems: sum{i in QuestionIDs}x[i]=3;
var Ynew{i in questionIDs, j in questionIDs: j>i} BINARY;
constraint lin1{i in QuestionIDs, j in QuestionIDs: j>i}: Ynew[i,j] <= x[i];
constraint lin2{i in QuestionIDs, j in QuestionIDs: j>i}: Ynew[i,j] <= x[j];
constraint lin3{i in QuestionIDs, j in QuestionIDs: j>i}: Ynew[i,j] >= x[i]+x[j]-1;
min totCSI = sum{i in questionIDs, j in questionIDs: j>i}Ynew[i,j]*CSI[i,j];
solve with milp;
```

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to CaseyCodd

a week ago

Yes, that is the standard way to model the product of two binary variables. But because of your minimization objective, the solver naturally "wants" to make Ynew[i,j] small, so without loss of optimality you can omit constraints lin1 and lin2 and enforce the nonlinear relationship Ynew[i,j] >= x[i]*x[j] via the linear constraint lin3. You will get the same optimal objective value.

If CSI[i,j] = 0, the solver might return Ynew[i,j] = 1 when x[i] = 0 and x[j] = 0. If it is important that Ynew[i,j] = x[i]*x[j], you can postprocess as follows:

```
proc optmodel;
set QuestionIDs;
read data ItemData into QuestionIDs = [questionid];
num CSI {i in QuestionIDs,j in QuestionIDs} init 0;
read data CSI into [Q1 Q2] CSI CSI[Q2,Q1]=CSI;
var x{QuestionIDs} BINARY;
constraint TotItems: sum{i in QuestionIDs}x[i]=3;
var Ynew{i in questionIDs, j in questionIDs: j>i} BINARY;
/* constraint lin1{i in QuestionIDs, j in QuestionIDs: j>i}: Ynew[i,j] <= x[i];*/
/* constraint lin2{i in QuestionIDs, j in QuestionIDs: j>i}: Ynew[i,j] <= x[j];*/
constraint lin3{i in QuestionIDs, j in QuestionIDs: j>i}: Ynew[i,j] >= x[i]+x[j]-1;
min totCSI = sum{i in questionIDs, j in questionIDs: j>i}Ynew[i,j]*CSI[i,j];
solve with milp;
print X Ynew;
/* postprocess */
for {i in QuestionIDs, j in QuestionIDs: j>i} Ynew[i,j] = x[i]*x[j];
print X Ynew;
print totCSI;
quit;
```

Omitting constraints lin1 and lin2 reduces the problem size and will likely speed up the solver. Please try it on your larger instance and see if it helps. If it is still too slow, I have a couple of further ideas.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to RobPratt

a week ago

Thank you for that suggestion. I was previously getting out of memory errors almost instantly, and with this modification it ran for more than an hour before giving an out of memory error. If you have other suggestions, I would appreciate them. Thank you.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to CaseyCodd

a week ago

Please share your data, and I'll take a look.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to RobPratt

a week ago

The attached data files are more representative of the larger problem. Change the selection in the previous code to 150 instead of 3.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to CaseyCodd

Friday - last edited Saturday

Here are three additional ideas you can try individually or in combination:

1. Relax the Ynew variables to >= 0 instead of binary. They will automatically take {0,1} values at optimality.

2. Add the following valid constraint, which can be derived by multiplying both sides of TotItems by x[j]:

```
constraint Cut {j in QuestionIDs}:
sum {i in QuestionIDs: j>i} Ynew[i,j]
+ sum {i in QuestionIDs: j<i} Ynew[j,i]
= (150-1)*x[j];
```

3. Solve the (nonconvex) NLP relaxation of the original quadratic problem and round the resulting solution to get an integer feasible solution, which you can use as a warm start for the MILP solver:

```
proc optmodel;
set QuestionIDs;
read data ItemData into QuestionIDs = [questionid];
num CSI {i in QuestionIDs,j in QuestionIDs} init 0;
read data tall into [Q1 Q2] CSI CSI[Q2,Q1]=CSI;
var x{QuestionIDs} BINARY;
constraint TotItems: sum{i in QuestionIDs}x[i]=150;
min totCSI_quadratic = sum{i in questionIDs, j in questionIDs: j>i} CSI[i,j]*x[i]*x[j];
solve with nlp relaxint / ms;
print {i in QuestionIDs: x[i].sol > 1e-6} x;
/* var Ynew{i in questionIDs, j in questionIDs: j>i} BINARY;*/
var Ynew{i in questionIDs, j in questionIDs: j>i} >= 0;
/* constraint lin1{i in QuestionIDs, j in QuestionIDs: j>i}: Ynew[i,j] <= x[i];*/
/* constraint lin2{i in QuestionIDs, j in QuestionIDs: j>i}: Ynew[i,j] <= x[j];*/
constraint lin3{i in QuestionIDs, j in QuestionIDs: j>i}: Ynew[i,j] >= x[i]+x[j]-1;
for {i in QuestionIDs} x[i] = round(x[i]);
for {i in QuestionIDs, j in QuestionIDs: j>i} Ynew[i,j] = x[i]*x[j];
min totCSI = sum{i in questionIDs, j in questionIDs: j>i}Ynew[i,j]*CSI[i,j];
/* optional */
* constraint Cut {j in QuestionIDs}:
sum {i in QuestionIDs: j>i} Ynew[i,j]
+ sum {i in QuestionIDs: j<i} Ynew[j,i]
= (150-1)*x[j];
solve with milp / primalin;
create data outdata from [i]={i in QuestionIDs: x[i].sol > 0.5} x;
quit;
```

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to RobPratt

Tuesday

Thank you for your help, Rob.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to CaseyCodd

Tuesday

Glad to help. In case you want to do a literature search, your problem is equivalent to the maximum edge-weight clique problem (MEWCP), which is NP-hard. To see the correspondence, just negate your objective coefficients and change min to max.