BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Germs
Fluorite | Level 6

 

The data has three columns(450 raws): itemID,

                                                                Pvalue(the probability of each item happens),

                                                                rptbis (the point-biserial correlation).

e.g. variable names+ first three rows (full data set is in excel document in attachment):

ItemID pvalue rptbis
00001 0.295 0.437
00002 0.033 0.869
00003 0.988 0.964

 

The goal is to pick up 50 items from the pool to meet a few requirements using MIP

 

Objective function

                           Min y;

Subject to:

                    sum(x[i]*Pvalue[i])>32-y;  /* sum of "pvalue" of the 50 items should be as close as 32*/

                    sum(x[i]*Pvalue[i])<32+y;

                    sum(x[i])=50;                   /*n=50 to pick up from the pool*/

                    x in (0, 1);                        /*x is a binary value in (0, 1) */

                    y>=0;                                /* I am not sure how to define thIS*/

                    13>=sum{ (pvalue[i]/(1-pvalue[i]))^(1/2)*rptbis[i])}>=12; /* the SD of the items picked should be between 12 and 13*/

 

 

I never get my code running, please help me with fixing the errors. Thank you!

 

proc optmodel;
set <str> pool;
num ItemID {pool};
num pvalue {pool};
num rptbis {pool};

read data itembank into pool=[ItemID]
pvalue rptbis;

 

var x{i in pool} binary; /*x in (0,1);*/

min y=sum{i in pool}(x[i]*pvalue[i])-32;  /*issue #1, how can I define the objective value >0 ?*/

                                                                                      


impvar sd=sum{i in pool}(pvalue[i]/(1-pvalue[i])^(1/2)*rptbis[i]);
con 13<=sd<=12;                                          /*when I had this constraint, it said "infeasible", sad!*/

con sum{i in pool}(x[i]*pvalue[i])>32-y;      /*when I had this 2 con, I wrote objective function as "min y", it said "no objective function"*/
con sum{i in pool}(x[i]*pvalue[i])<32+y;
con sum{i in pool}x[i]=50;
solve with MILP;
print x pvalue;

1 ACCEPTED SOLUTION

Accepted Solutions
RobPratt
SAS Super FREQ

For question 1, I omitted sd.  And for this instance, the DECOMP option speeds up the MILP solve.

proc optmodel;
   set <str> pool;
   num ItemID {pool};
   num pvalue {pool};
   num rptbis {pool};
   read data itembank into pool=[ItemID] pvalue rptbis;
   set groups = 1..5;

   var x{i in pool, j in groups} binary;
   var y{groups} >= 0;

   min z = sum{j in groups} y[j];
   
/*   impvar sd=sum{i in pool}(pvalue[i]/(1-pvalue[i])^(1/2)*rptbis[i])*x[i];*/
/*   con 12<=sd<=13;*/
   impvar psum{j in groups} = sum{i in pool} pvalue[i]*x[i,j];
   con TargetGE{j in groups}: psum[j] >= 32 - y[j];
   con TargetLE{j in groups}: psum[j] <= 32 + y[j];
   con CountPerGroup{j in groups}: sum{i in pool} x[i,j] = 50;
   con AssignAtMostOnce{i in pool}: sum{j in groups} x[i,j] <= 1;

   solve with milp / decomp;

/*   print x pvalue;*/
   print psum;

   create data soldata from [i]=pool {j in groups} <col('x'||j)=x[i,j]>;
quit;

 

For question 2, the answer is yes.

View solution in original post

15 REPLIES 15
RobPratt
SAS Super FREQ

Please share your itembank data.

RobPratt
SAS Super FREQ

I made several changes here.  I'm not sure what you intended with sd, but your original IMPVAR had no variables on the right-hand side, so sd was a constant.

 

proc optmodel;
   set <str> pool;
   num ItemID {pool};
   num pvalue {pool};
   num rptbis {pool};
   read data itembank into pool=[ItemID]
      pvalue rptbis;

   var x{i in pool} binary; /*x in (0,1);*/

   var y >= 0;
   min z = y;*=sum{i in pool}(x[i]*pvalue[i])-32;  /*issue #1, how can I define the objective value >0 ?*/

/*   impvar sd=sum{i in pool}(pvalue[i]/(1-pvalue[i])^(1/2)*rptbis[i]);*/
   impvar sd=sum{i in pool}(pvalue[i]/(1-pvalue[i])^(1/2)*rptbis[i])*x[i];
*   con 13<=sd<=12;                                          /*when I had this constraint, it said "infeasible", sad!*/
   con 12<=sd<=13;

   con sum{i in pool}(x[i]*pvalue[i])>=32-y;      /*when I had this 2 con, I wrote objective function as "min y", it said "no objective function"*/
   con sum{i in pool}(x[i]*pvalue[i])<=32+y;
   con sum{i in pool}x[i]=50;
   solve;
   print x pvalue;
   print (sum{i in pool}(x[i]*pvalue[i]));
   print sd;
quit;
Germs
Fluorite | Level 6

Thank you soooooo much!!!

 

After last post, I have figured out the mistakes I made, and corrected the computation of SD term. But what you wrote is much better than mine.

 

Here I have 2 extra questions:

Question 1. For a further modification of this program, e.g. to add a dimension, the output will be like this:

[1]       x1   x2  x3  ....  x5;
00001  0     0   1  ....    0
00002  0    1   0   ....    0
00003  1    0   0   ....    0  

..................

..................

00450  0  1   0   ......    0;

which means I wish to select multiple groups x[i,j ] require:

for each group( in 1 to j) , choose 50 item from the pool:sum(x[ i ])=50;

each item can only goes into one group: sum(x)<=1, no overlap;

 

I created another dataset named "group" with only 1 variable (x1-x5), but I could not read x1-x5 into the dataset [pool] to make another dimension, I also don't know how to modify the program in "proc optmodel";

 

Question 2: can "proc optmodel" be used in SAS Macro? e.g.

%macro forms (group= );

data xxx;

set xxx;

.........;

run;

 

proc optmodel;

.......;

.......;

.......;

 

%mend;

%forms (group=5); 

 

Thank you!!! Appreciate!!!

RobPratt
SAS Super FREQ

For question 1, I omitted sd.  And for this instance, the DECOMP option speeds up the MILP solve.

proc optmodel;
   set <str> pool;
   num ItemID {pool};
   num pvalue {pool};
   num rptbis {pool};
   read data itembank into pool=[ItemID] pvalue rptbis;
   set groups = 1..5;

   var x{i in pool, j in groups} binary;
   var y{groups} >= 0;

   min z = sum{j in groups} y[j];
   
/*   impvar sd=sum{i in pool}(pvalue[i]/(1-pvalue[i])^(1/2)*rptbis[i])*x[i];*/
/*   con 12<=sd<=13;*/
   impvar psum{j in groups} = sum{i in pool} pvalue[i]*x[i,j];
   con TargetGE{j in groups}: psum[j] >= 32 - y[j];
   con TargetLE{j in groups}: psum[j] <= 32 + y[j];
   con CountPerGroup{j in groups}: sum{i in pool} x[i,j] = 50;
   con AssignAtMostOnce{i in pool}: sum{j in groups} x[i,j] <= 1;

   solve with milp / decomp;

/*   print x pvalue;*/
   print psum;

   create data soldata from [i]=pool {j in groups} <col('x'||j)=x[i,j]>;
quit;

 

For question 2, the answer is yes.

Germs
Fluorite | Level 6

Thank you so much @RobPratt

I will continue developing the code for my project. I will update how it goes. 

Germs
Fluorite | Level 6

Hello RobPratt. I have tested your code, it worked greatl. Thank you so much.

For my question 2 using SAS macro, I tried using %let group=5, group 1..&group.; it worked.

For my question 1: I still have some details to work on, should I consider the objective function you wrote, means to minimize each y-value in group 1..5?

 

 

 

 

     min z = sum{j in groups} y[j];<code></code>

 

 

I tried both using and not using "decomp" option, it made a difference in speed.

For the last statement using "decomp", can I also have “iis=on”?Are they comparable?

I wish to use MILP/branch and bound/indicate IIS for my project (is that possible?)

 

 

 

   solve with milp /decomp iis=on;<code></code>

 

 

 

 

 I tried to replace “decomp” with "IIS=on"(line 384), it gave me the following error: 

384 solve with milp /decomp  iis=on;

---
688
22
ERROR 688-782: The option IIS is unrecognized.

ERROR 22-322: Expecting one of the following: ABSOBJGAP, ALLCUTS, CONFLICTSEARCH, CUTCLIQUE,
CUTFLOWCOVER, CUTFLOWPATH, CUTGOMORY, CUTGUB, CUTIMPLIED, CUTKNAPSACK, CUTLAP,
CUTMILIFTED, CUTMIR, CUTOFF, CUTS, CUTSFACTOR, CUTSTRATEGY, CUTZEROHALF, DECOMP,
DECOMP_MASTER, DECOMP_MASTER_IP, DECOMP_SUBPROB, EMPHASIS, FEASTOL, HEURISTICS,
INTTOL, LOGFREQ, LOGLEVEL, MAXNODES, MAXSOLS, MAXTIME, NODESEL, NOPRIMALIN, OPTTOL,
PRESOLVER, PRIMALIN, PRINTFREQ, PRINTLEVEL2, PRIORITY, PROBE, RELOBJGAP, RESTARTS,
SCALE, SEED, STRONGITER, STRONGLEN, SYMMETRY, TARGET, TIMETYPE, VARSEL.

 

when I switched to  LP, it relax all the constraints and did not pick any x[i] for me.

  solve with lp / iis=on;

 

WARNING: The problem contains integer variables. The relaxed LP will be solved. Specify the
RELAXINT keyword on the SOLVE statement to avoid this warning.
NOTE: The IIS= option is enabled.
Objective
Phase Iteration Value Time
P 1 1 1.774000E+03 0
P 1 138 1.121796E+02 0
P 1 233 8.137021E+00 0
P 1 328 3.910344E-01 0
P 1 358 0.000000E+00 0
P 1 359 0.000000E+00 0
NOTE: The IIS= option found this problem to be feasible.
NOTE: The IIS solve time is 0.02 seconds.

RobPratt
SAS Super FREQ

The IIS option for the LP solver (not the MILP solver) is to identify causes of infeasibility.  Your problem is feasible, so this option is not applicable.

 

The y[j] variable measures the absolute error between psum[j] and the target value of 32.  The objective z minimizes the sum of these errors.  For your data, the optimal objective value turns out to be 0, which means that each group j meets the target exactly.  You can also see this from the PRINT statement that yields:

 

[1] psum
1 32
2 32
3 32
4 32
5 32
Germs
Fluorite | Level 6

I have chosen one of your previous solutions (2 dimension's)  as the best solution.Thank you so much!

 

My initial interest for SAS/OR is detecting IIS, remember you replied my first post here:

https://communities.sas.com/t5/Mathematical-Optimization/SAS-OR-in-LP-MIP-which-method-is-used-for-d...

I am a quantitative researcher but have limit knowledge in OR(this is the first time project), So I didn't really know IIS is for LP (not MILP). Thank you for telling this. I have several addition categorical constraints to add to the model, which I know will cause infeasibility. In addition to your code MILP version, I figured out for LP:

proc optmodel presolver=none;

....................
the code your developed;
....................
solve with lp relaxint;/ iis=on;

It seems to stop relaxing constraints; 

 

There is another problem, how many nodes are normal? when I add constraint SD, for group 1..3, it worked fine; for group=4, it slowed down a little bit; when group=5, it ran with thousands of nodes and could not give a solution (maybe there wasn't one). I had to stop the program manually. Is there anyway to avoid this? 

 

RobPratt
SAS Super FREQ
Please post the full code that is taking too long.
Germs
Fluorite | Level 6

Rob, I noticed for this example (in 2-dimension), almost every constraint used "SUM", but if I want to define every single value in a linear program, e.g., pvalue[i]<0.2, how should I create this constraint?  Thank you!

RobPratt
SAS Super FREQ

I don't quite understand your question.  In your program, pvalue[i] is a numeric constant, not a variable.

Germs
Fluorite | Level 6

To say if I want to create a quantitative constraint "choose items from the dataset with pvalue>=0.1" , how should I express this constraint in SAS/OR? This time the question should be clear. Thank you!

All the constraints in this example used "sum", tried a few times but failed. 

RobPratt
SAS Super FREQ

OK, this sounds like the same situation that is addressed in this documentation example.  The first approach uses an explicit constraint:

   /* do not assign customer to site more than 30 units away */
   con distance_at_most_30 {i in CUSTOMERS, j in SITES: dist[i,j] > 30}:
      Assign[i,j] = 0;

The second approach (more efficient) instead modifies the index set to avoid creating those variables at all.

Germs
Fluorite | Level 6

I will try it later today to see if I can work it out. SAS/OR is difficult for beginner.

 

Thanks for the example. What is the second approach in details?

 

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

Multiple Linear Regression in SAS

Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 15 replies
  • 1954 views
  • 4 likes
  • 2 in conversation