Hello,
I am relatively new to SAS and am working on an optimization problem using PROC OPTMODEL. The problem is the following: different products have different annual sales (e.g. $10,000 for product x, $15,000 for product y, etc). In addition to the actual sales, each product is assigned *estimated* sales in order to to guide things like marketing budgets, etc. The estimated sales are grouped by product category, meaning that, within a product category, different individual products can either exceed, or fall short of, this estimated sales.
Currently, the majority of products have actual sales that exceed estimates sales. I would like to reset the estimated sales targets in such a way that minimizes the number of times this happens. Essentially, I want to perform an optimization over the set of products and find the new level of estimated sales that minimizes the number of times that true sales exceed estimated sales. The obvious answer is to reset the estimate sales level to something very high, but I need to constrain the new estimated sales to be relatively close to the previous estimate sales.
As I wrote above, I think that PROC OPTMODEL might work for this. I have coded a toy example which I believe captures the salient features of this problem. The problem includes actual sales for 8 (imaginary) products, and has as the constraint that the optimal level of estimates sales can not exceed 20 when summed over all 8 products. I'm modeling this using the sign() function.
The issue, however, is that the code, below, doesn't give me the correct answer! The optimal solution ("thresh") should be anything between 2 and 2.5. This would lead actual sales to exceed estimated sales for 6 of the 8 products without violating the constraint. As is, the code hits the maximum number of iterations at 5000, and returns a solution of .9999.
Question 1: why is this not returning a correct answer? What am I doing wrong?
data sales;
input spend @@;
datalines;
1 5 6 10 12 13 3 2
;
proc optmodel;
set SPENDING;
num spend {SPENDING};
read data sales into SPENDING = [_N_] spend;
print spend;
var thresh>=0;
var signfunction{i in SPENDING};
min p = sum{i in SPENDING} signfunction[i];
con sf {i in SPENDING}: signfunction[i]= sign(spend[i] - thresh);
con bc: (sum{i in SPENDING} thresh) <= 20;
solve;
print thresh; /*this SHOULD be something from 2 to 2.5...*/
quit;
Question 2: When I impose the constraint sf in the optimization problem, this gives me an entirely different (and wrong) answer of an optimal threshold of .01. See below. Why is this happening?
proc optmodel;
set SPENDING;
num spend {SPENDING};
read data sales into SPENDING = [_N_] spend;
print spend;
var thresh>=0;
min p = sum{i in SPENDING} sign(spend[i] - thresh);
con bc: (sum{i in SPENDING} thresh) <= 20;
solve;
print thresh; /*this SHOULD be something from 2 to 2.5...*/
quit;
Thank you in advance.
Note that you can replace the bc constraint with a simple upper bound on the thresh variable:
thresh.ub = 20 / card(SPENDING);
Alternatively, you can impose the upper bound during variable declaration:
var thresh >= 0 <= 20 / card(SPENDING);
The NLP solver can get stuck in a local minimum for your formulation. To increase the likelihood of finding a global minimum, use the multistart option:
solve with nlp / ms;
Another idea is to linearize the problem and use the MILP solver:
proc optmodel;
set SPENDING;
num spend {SPENDING};
read data sales into SPENDING = [_N_] spend;
print spend;
var thresh >= 0 <= 20 / card(SPENDING);
var signbinary{SPENDING} binary;
impvar signfunction{i in SPENDING} = 2*signbinary[i] - 1;
min p = sum{i in SPENDING} signfunction[i];
/* if spend[i] > thresh then signbinary[i] = 1 */
/* equivalently, if signbinary[i] = 0 then spend[i] <= thresh */
con signcon1{i in SPENDING}:
spend[i] - thresh <= (spend[i] - thresh.lb) * signbinary[i];
/* if spend[i] < thresh then signbinary[i] = 0 */
/* equivalently, if signbinary[i] = 1 then spend[i] >= thresh */
con signcon2{i in SPENDING}:
thresh - spend[i] <= (thresh.ub - spend[i]) * (1 - signbinary[i]);
solve;
print thresh; /*this SHOULD be something from 2 to 2.5...*/
print signbinary signfunction;
quit;
Note that you can replace the bc constraint with a simple upper bound on the thresh variable:
thresh.ub = 20 / card(SPENDING);
Alternatively, you can impose the upper bound during variable declaration:
var thresh >= 0 <= 20 / card(SPENDING);
The NLP solver can get stuck in a local minimum for your formulation. To increase the likelihood of finding a global minimum, use the multistart option:
solve with nlp / ms;
Another idea is to linearize the problem and use the MILP solver:
proc optmodel;
set SPENDING;
num spend {SPENDING};
read data sales into SPENDING = [_N_] spend;
print spend;
var thresh >= 0 <= 20 / card(SPENDING);
var signbinary{SPENDING} binary;
impvar signfunction{i in SPENDING} = 2*signbinary[i] - 1;
min p = sum{i in SPENDING} signfunction[i];
/* if spend[i] > thresh then signbinary[i] = 1 */
/* equivalently, if signbinary[i] = 0 then spend[i] <= thresh */
con signcon1{i in SPENDING}:
spend[i] - thresh <= (spend[i] - thresh.lb) * signbinary[i];
/* if spend[i] < thresh then signbinary[i] = 0 */
/* equivalently, if signbinary[i] = 1 then spend[i] >= thresh */
con signcon2{i in SPENDING}:
thresh - spend[i] <= (thresh.ub - spend[i]) * (1 - signbinary[i]);
solve;
print thresh; /*this SHOULD be something from 2 to 2.5...*/
print signbinary signfunction;
quit;
Thank you very much! I hadn't know about the ms option - this works as it should now.
Sorry about this - one more question. I'm trying a simple extension of the same problem, but now across two product groups. Now, the task is to find the thresholds (thresh1 and thresh2) which minimize the total number of times the actual sales exceeds total sales, capped to an overall constraint (in the example below, that total estimated sales can't exceed 9).
Here is the code I'm using. The optimal thresholds given the constraints in this toy example are 3 for threshold 1, and 0 for threshold 2. It seems to work (solutions are very close to 3 and 0), but it's slow and, as the number of product groups grow, will be very inefficient to write out. I can't quite nail the indexing (I think). Is there a way to make the code more efficient? Any advice would be much appreciated.
Many thanks.
/*here is product group 1*/
data sales1;
input spend1 @@;
datalines;
1 2 3
;
/*here is product group 2*/
data sales2;
input spend2 @@;
datalines;
11 12 13
;
proc optmodel;
/*read in spending for product group 1 */
set SPENDING1;
num spend1 {SPENDING1};
read data sales1 into SPENDING1 = [_N_] spend1;
print spend1;
/*read in spending for product group 2*/
set SPENDING2;
num spend2 {SPENDING2};
read data sales2 into SPENDING2 = [_N_] spend2;
print spend2;
/*declare two threshold variables */
var thresh1>=0;
var thresh2>=0;
/*declare the problem - minimize both exceptions at the same time*/
min p = sum{i in SPENDING1} sign(spend1[i] - thresh1) + sum{j in SPENDING2} sign(spend2[j] - thresh2);
/*constraint - total estimated spending, across ALL products, has to be less than 9*/
con bc: (sum{i in SPENDING1} thresh1 + sum{j in SPENDING2} thresh2) <= 9;
solve with nlp / ms;
print thresh1 thresh2; /*this SHOULD be 3 for thresh1 and 0 for thresh2...*/
quit;
Here's one way to write the model more compactly:
proc optmodel;
set GROUPS = 1..2;
set SPENDING{GROUPS};
num spend {g in GROUPS, SPENDING[g]};
/*read in spending for product group 1 */
read data sales1 into SPENDING[1] = [_N_] spend[1,_N_]=spend1;
/*read in spending for product group 2*/
read data sales2 into SPENDING[2] = [_N_] spend[2,_N_]=spend2;
print spend;
/*declare two threshold variables */
var thresh{GROUPS} >= 0;
/*declare the problem - minimize both exceptions at the same time*/
min p = sum{g in GROUPS, i in SPENDING[g]} sign(spend[g,i] - thresh[g]);
/*constraint - total estimated spending, across ALL products, has to be less than 9*/
con bc: sum{g in GROUPS} card(SPENDING[g]) * thresh[g] <= 9;
solve with nlp / ms;
print thresh; /*this SHOULD be 3 for thresh1 and 0 for thresh2...*/
quit;
But I suspect that the linear approach will scale better.
You could also make the code more data driven (instead of hard-coding to two groups) by merging your input into one data set with two columns: group and spend.
That works! Thank you, once again.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.
Find more tutorials on the SAS Users YouTube channel.