Hello, sorry to come back to ask for help again, but I'm stuck and can't make any more progress by myself. I'm running OR version 15.2 btw.
An overview of my problem is that I need to generate 5 test forms of 30 items from a pool of items (136), items can appear at most 3 times across forms, and share only 25% of items among each pair of forms. There have to be a specific amount of items from categories A,B,C. The output is a matrix of 0/1 values (not assigned/assigned) of size items*forms [136,5].
Each item has a content, time, and probability attached value.
I need to select items based on the constraints above for 3 problems to minimize deviation from target values:
I verified my code works for problem 1 for all 5 forms. When I add in the second target of time I run out of memory. I was able to verify that this code works for problems 2 and 3 when there are only 2 forms, and it solves relatively quickly. This makes me think I've just poorly formulated the problem and that there is a better way to go about doing this.
I've attached the code and dataset that populates everything. If you run the program, the datasets check_sumATA_form&cond are summary data for each problem that should match values from sum_target_formclean.
I've read through a lot of posts and articles about decomposition of problems, but I just don't know enough about mathematical programming structure to understand which applies to my case and how I'd implement it. I know it's a lot to ask, but I was hoping someone could either provide some code to get me started, or point me in the right direction of what method I should use (or if it's possible) to divide and conquer the problem into a master+subproblem. The closest thing I could find was reference to bender's decomposition here (also have the 1962 article but it is beyond me), a santa's gift (linear assignment) problem here, and a traveling baseball fan here which didn't seem to fit since I don't know how many solutions there will be based on problem 1 before adding time.
From what I read, it seems like you have to find where you might be able to split your problem by finding properties that apply to only some parts of the problem. I'm confused if this works for me because all of my constraints are based on a single binary variable of if the item is assigned or not. The only 'decomposition' I could think of would be to split the constraints into those that apply to the single forms, and then those that apply to constraints for the multiple forms. So essentially solve for individual forms and then loop through and see if it works for when 5 of those are put together.
I am more than happy to try to write some code and return with a more specific SAS question if someone points me in the right direction with an example that will work. I've looked through quite a few examples/papers/videos, and since I don't know if they really apply, I can't differentiate between me doing something wrong, and it just not being applicable to my problem. I was originally told to use MILP, but at this point, if that's not the right fit I can bring up another suggestion (network flow?) Sorry if my understanding is not correct and some of my suggestions or references are ridiculous.
Thanks for any help you guys can provide
I have four suggestions for you to try, alone or in combination, in increasing order of complication:
1. Relax the z variable from binary to >= 0. It will automatically take a binary value if x is binary.
2. Add the following optional Reformulation-Linearization Technique (RLT) constraint, which is derived by multiplying both sides of max_three by x[i,b]:
con max_three_rlt {i in items, b in FORMS}:
sum{<j,(b)> in FORM_PAIRS} z[i,j,b] + sum{<(b),j> in FORM_PAIRS} z[i,b,j] <= 2 * x[i,b];
3. Replace the IMPVAR declarations for total_p and total_time with explicit variables and constraints. Doing so reduces the total number of nonzero coefficients in the constraint matrix, and that usually speeds up the underlying LP solves.
4. Instead of the inequality-based formulation for absolute value (L1 norm), use the equality-based formulation, with Surplus and Slack variables, as shown in this example.
None of these suggestions will universally improve the solve time. Sometimes they help, and sometimes they hurt.
I have four suggestions for you to try, alone or in combination, in increasing order of complication:
1. Relax the z variable from binary to >= 0. It will automatically take a binary value if x is binary.
2. Add the following optional Reformulation-Linearization Technique (RLT) constraint, which is derived by multiplying both sides of max_three by x[i,b]:
con max_three_rlt {i in items, b in FORMS}:
sum{<j,(b)> in FORM_PAIRS} z[i,j,b] + sum{<(b),j> in FORM_PAIRS} z[i,b,j] <= 2 * x[i,b];
3. Replace the IMPVAR declarations for total_p and total_time with explicit variables and constraints. Doing so reduces the total number of nonzero coefficients in the constraint matrix, and that usually speeds up the underlying LP solves.
4. Instead of the inequality-based formulation for absolute value (L1 norm), use the equality-based formulation, with Surplus and Slack variables, as shown in this example.
None of these suggestions will universally improve the solve time. Sometimes they help, and sometimes they hurt.
Thank you so much for the suggestions. After making these changes and trying other variations of our problems, we landed on a single objective with time as a constraint rather than aiming for 0 deviation from targets for both as optimal. We also used absolute objective gap settings to limit processing time and still reach a decent target.
I will be marking your response as the solution since they are valid solutions even though we ended up taking a different approach in the end. And as a note for other readers: I couldn't modify the settings for SAS in the config file b/c I didn't have permissions, so I just had to use the command line to open an instance of SAS with "-memsize" option as shown below. In case you can't edit the actual SAS files themselves and this might help someone in the future.
C: cd Program Files\SASHome\SASFoundation\9.4\ sas.exe -memsize 16G
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.
Find more tutorials on the SAS Users YouTube channel.