Hello, I am starting to work with proc optmodel and have a recurrent problem easy for solver but I want to do it in SAS. I have the daily temperatures during one month for 4 years and I need to create new binary variables CU1--CU4 (one for each year) defined as 1 if the temperature is in a specific range. However, I need to find the limits of that range that minimize the standard deviation between the four sum of CUs, I was trying something that probably is totally wrong because I do not know how to define the limits of the range and also if I have to put some constrains or impvar.
data temp;
input Log y2013 y2014 y2015 y2016;
cards;
1 28.2 28.3 18.2 27.8
2 29.6 29.7 19.9 29.2 . . .
proc optmodel;
set temp;
num y2013{temp};
num y2014{temp};
num y2015{temp};
num y2016{temp};
var a,b;
impvar CU1{p in temp}=if a<= y2013[p] <= b then 1 else 0;
impvar CU2{p in temp}=if a<= y2014[p] <= b then 1 else 0;
impvar CU3{p in temp}=if a<= y2015[p] <= b then 1 else 0;
impvar CU4{p in temp}=if a<= y2016[p] <= b then 1 else 0;
impvar S1=sum{p in temp} CU1[p]; impvar S2=sum{p in temp} CU2[p]; impvar S3=sum{p in temp} CU3[p]; impvar S4=sum{p in temp} CU4[p];
min SSE=sqrt(.25 * ((S1-(S1+S2+S3+S4)/4)**2+(S2-(S1+S2+S3+S4)/4)**2+(S3-(S1+S2+S3+S4)/4)**2+(S4-(S1+S2+S3+S4)/4)**2));
solve;
print a b;
quit;
run;
Here is a more compact way to write your code:
proc optmodel;
set YEARS = 2013..2016;
set DAYS;
num temp{DAYS, YEARS};
read data temp into DAYS=[Log] {y in YEARS} <temp[Log,y]=col('y'||y)>;
var a,b;
impvar CU{p in DAYS, y in YEARS} = (if a <= temp[p,y] <= b then 1 else 0); /* even shorter: (a <= temp[p,y] <= b) */
impvar S{y in YEARS} = sum{p in DAYS} CU[p,y];
impvar mean = (sum{y in YEARS} S[y])/card(YEARS);
min SSE = sum{y in YEARS}(S[y]-mean)**2;
solve;
/* display solution */
print a b;
print CU S SSE;
quit;
But the optimal objective value is trivially 0. For example, if you take a to be the smallest data value and b to be the largest data value, then CU[p,y] is always 1, S[y] is always card(DAYS), and SSE = 0. Another optimal solution is to take a = b = an arbitrary value that does not appear in your data, say -999. In that case, CU[p,y] = 0, S[y] = 0, and again SSE = 0.
Thanks for your help. However, I think something is still wrong, the point is that the length of the variable is not the same every year so if I use a as the smallest data value and b as the largest data value, the variation won't be 0. Anyways, I am now minimizing the coefficient of variation to avoid trivial solutions but I wanted to constrain that the mean is different to 0 and that b is greater than a but I get this response "The problem contains strict inequality or predicate constraints that reference non-integer variables".
Then I tried to approximately address the values of a and b as shown in the script (a GT 3 and b GT 16, in this case) but the response is just showing me the same values with some decimals according to 0 iterations.
proc optmodel;
set YEARS = 2013..2016;
set DAYS;
num temp{DAYS, YEARS};
read data temp into DAYS=[Log] {y in YEARS} <temp[Log,y]=col('y'||y)>;
var a>=3,b>=16;
impvar CU{p in DAYS, y in YEARS} = (if a <= temp[p,y] <= b then 1 else 0);
impvar S{y in YEARS} = sum{p in DAYS} CU[p,y];
impvar mean = (sum{y in YEARS} S[y])/card(YEARS);
minimize Coefv = (sqrt(sum{y in YEARS}((S[y]-mean)**2)/card(YEARS)))/mean;
solve;
/* display solution */
print a b;
print S mean coefv;
quit;
You cannot use strict inequalities with the NLP solver. But you could do this instead to force a positive mean:
con PositiveMean: mean >= 1e-6;
Similarly, you can force A < B like this:
con A_LT_B: A + 1e-6 <= B;
With or without these constraints, your new objective does not avoid all trivial solutions. For example, the following code yields an optimal solution with mean = 2 and objective value = 0:
proc optmodel;
set YEARS = 2013..2016;
set DAYS;
num temp{DAYS, YEARS};
read data temp into DAYS=[Log] {y in YEARS} <temp[Log,y]=col('y'||y)>;
var a,b;
impvar CU{p in DAYS, y in YEARS} = (if a <= temp[p,y] <= b then 1 else 0);
impvar S{y in YEARS} = sum{p in DAYS} CU[p,y];
impvar mean = (sum{y in YEARS} S[y])/card(YEARS);
minimize CoefvSquared = (sum{y in YEARS}((S[y]-mean)**2)/card(YEARS))/mean^2;
a = min {p in DAYS, y in YEARS} temp[p,y];
b = max {p in DAYS, y in YEARS} temp[p,y];
solve;
/* display solution */
print a b;
print S mean CoefvSquared;
quit;
Note that I squared the objective to avoid the SQRT function, which is nondifferentiable at 0. Also, the a and b values supplied as a starting solution are computed as min and max, respectively, across all observations.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.
Find more tutorials on the SAS Users YouTube channel.