The selection variable X determines whether an individual item appears on the test. Because CSI is about correlations (i.e., relationships between item pairs), I created Ynew to indicate whether items i and j are both selected. Logically, Ynew[i,j] = X[i]*X[j]: if both i and j are selected, then the item pair composed of i and j is also selected; if either i or j is not selected, then the pair [i,j] is not selected either.

The problem with Ynew[i,j] = X[i]*X[j] is that it is nonlinear, so I used the constraints lin1, lin2, and lin3 to linearize that relationship. Is there a better way to do this part?

I added some data to the code below and reduced the example to just 5 items for simplicity.

data Itemdata;
input questionid;
datalines;
227447
227450
227451
227452
227481
;
data CSI;
input Q1 Q2 CSI;
datalines;
227447 227447 1
227450 227450 1
227451 227447 0.084515426
227451 227451 1
227452 227447 0.07100716
227452 227450 0.019418391
227452 227452 1
227481 227447 0.135526185
227481 227481 1
;
proc optmodel;
set QuestionIDs;
read data ItemData into QuestionIDs = [questionid];
num CSI {i in QuestionIDs,j in QuestionIDs} init 0;
read data CSI into [Q1 Q2] CSI CSI[Q2,Q1]=CSI;
var x{QuestionIDs} BINARY;
constraint TotItems: sum{i in QuestionIDs}x[i]=3;
var Ynew{i in questionIDs, j in questionIDs: j>i} BINARY;
constraint lin1{i in QuestionIDs, j in QuestionIDs: j>i}: Ynew[i,j] <= x[i];
constraint lin2{i in QuestionIDs, j in QuestionIDs: j>i}: Ynew[i,j] <= x[j];
constraint lin3{i in QuestionIDs, j in QuestionIDs: j>i}: Ynew[i,j] >= x[i]+x[j]-1;
min totCSI = sum{i in questionIDs, j in questionIDs: j>i}Ynew[i,j]*CSI[i,j];
solve with milp;
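As a sanity check on the linearization, the three constraints lin1, lin2, and lin3 can be enumerated over the four possible values of a binary pair. The sketch below writes y_ij for Ynew[i,j] and x_i, x_j for x[i], x[j]; this shorthand is introduced here for illustration and is not part of the model code.

```latex
\begin{aligned}
&\text{lin1: } y_{ij} \le x_i, \qquad
 \text{lin2: } y_{ij} \le x_j, \qquad
 \text{lin3: } y_{ij} \ge x_i + x_j - 1 \\[4pt]
&(x_i, x_j) = (0,0):\quad y_{ij} \le 0,\; y_{ij} \ge -1
   \;\Rightarrow\; y_{ij} = 0 = x_i x_j \\
&(x_i, x_j) = (1,0):\quad y_{ij} \le 0,\; y_{ij} \ge 0
   \;\Rightarrow\; y_{ij} = 0 = x_i x_j \\
&(x_i, x_j) = (0,1):\quad y_{ij} \le 0,\; y_{ij} \ge 0
   \;\Rightarrow\; y_{ij} = 0 = x_i x_j \\
&(x_i, x_j) = (1,1):\quad y_{ij} \le 1,\; y_{ij} \ge 1
   \;\Rightarrow\; y_{ij} = 1 = x_i x_j
\end{aligned}
```

In every case the constraints force y_ij to equal the product, so the linearization is exact for binary x. One possible simplification, assuming every CSI value is nonnegative as in the sample data: because the objective is minimized, Ynew[i,j] is already pushed toward 0, so only lin3 should be needed to force it up to 1 when both items are selected, and lin1 and lin2 would not change the optimal objective value.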