Calcite | Level 5

Minimizing the standard deviation between the sum of different variables

Hello, I am starting to work with proc optmodel and have a recurrent problem easy for solver but I want to do it in SAS. I have the daily temperatures during one month for 4 years and I need to create new binary variables CU1--CU4 (one for each year) defined as 1 if the temperature is in a specific range. However, I need to find the limits of that range that minimize the standard deviation between the four sum of CUs, I was trying something that probably is totally wrong because I do not know how to define the limits of the range and also if I have to put some constrains or impvar.

``````data temp;
input Log y2013 y2014 y2015 y2016;
cards;
1 28.2 28.3 18.2 27.8
2 29.6 29.7 19.9 29.2 . . .

proc optmodel;
set temp;
num y2013{temp};
num y2014{temp};
num y2015{temp};
num y2016{temp};
var a,b;
impvar CU1{p in temp}=if a<= y2013[p] <= b then 1 else 0;
impvar CU2{p in temp}=if a<= y2014[p] <= b then 1 else 0;
impvar CU3{p in temp}=if a<= y2015[p] <= b then 1 else 0;
impvar CU4{p in temp}=if a<= y2016[p] <= b then 1 else 0;
impvar S1=sum{p in temp} CU1[p]; impvar S2=sum{p in temp} CU2[p]; impvar S3=sum{p in temp} CU3[p]; impvar S4=sum{p in temp} CU4[p];
min SSE=sqrt(.25 * ((S1-(S1+S2+S3+S4)/4)**2+(S2-(S1+S2+S3+S4)/4)**2+(S3-(S1+S2+S3+S4)/4)**2+(S4-(S1+S2+S3+S4)/4)**2));
solve;
print a b;
quit;
run;``````
3 REPLIES 3
SAS Super FREQ

Re: Minimizing the standard deviation between the sum of different variables

Here is a more compact way to write your code:

``````proc optmodel;
set YEARS = 2013..2016;
set DAYS;
num temp{DAYS, YEARS};
read data temp into DAYS=[Log] {y in YEARS} <temp[Log,y]=col('y'||y)>;
var a,b;
impvar CU{p in DAYS, y in YEARS} = (if a <= temp[p,y] <= b then 1 else 0); /* even shorter: (a <= temp[p,y] <= b) */
impvar S{y in YEARS} = sum{p in DAYS} CU[p,y];
impvar mean = (sum{y in YEARS} S[y])/card(YEARS);
min SSE = sum{y in YEARS}(S[y]-mean)**2;
solve;
/* display solution */
print a b;
print CU S SSE;
quit;
``````

But the optimal objective value is trivially 0.  For example, if you take a to be the smallest data value and b to be the largest data value, then CU[p,y] is always 1, S[y] is always card(DAYS), and SSE = 0.  Another optimal solution is to take a = b = an arbitrary value that does not appear in your data, say -999.  In that case, CU[p,y] = 0, S[y] = 0, and again SSE = 0.

Calcite | Level 5

Re: Minimizing the standard deviation between the sum of different variables

Thanks for your help. However, I think something is still wrong, the point is that the length of the variable is not the same every year so if I use a as the smallest data value and b as the largest data value, the variation won't be 0. Anyways, I am now minimizing the coefficient of variation to avoid trivial solutions but I wanted to constrain that the mean is different to 0 and that b is greater than a but I get this response "The problem contains strict inequality or predicate constraints that reference non-integer variables".

Then I tried to approximately address the values of a and b as shown in the script (a GT 3 and b GT 16, in this case) but the response is just showing me the same values with some decimals according to 0 iterations.

``````proc optmodel;
set YEARS = 2013..2016;
set DAYS;
num temp{DAYS, YEARS};
read data temp into DAYS=[Log] {y in YEARS} <temp[Log,y]=col('y'||y)>;
var a>=3,b>=16;
impvar CU{p in DAYS, y in YEARS} = (if a <= temp[p,y] <= b then 1 else 0);
impvar S{y in YEARS} = sum{p in DAYS} CU[p,y];
impvar mean = (sum{y in YEARS} S[y])/card(YEARS);
minimize Coefv = (sqrt(sum{y in YEARS}((S[y]-mean)**2)/card(YEARS)))/mean;
solve;
/* display solution */
print a b;
print S mean coefv;
quit;``````
SAS Super FREQ

Re: Minimizing the standard deviation between the sum of different variables

You cannot use strict inequalities with the NLP solver.  But you could do this instead to force a positive mean:

``````   con PositiveMean: mean >= 1e-6;
``````

Similarly, you can force A < B like this:

``````   con A_LT_B: A + 1e-6 <= B;
``````

With or without these constraints, your new objective does not avoid all trivial solutions.  For example, the following code yields an optimal solution with mean = 2 and objective value = 0:

``````proc optmodel;
set YEARS = 2013..2016;
set DAYS;
num temp{DAYS, YEARS};
read data temp into DAYS=[Log] {y in YEARS} <temp[Log,y]=col('y'||y)>;
var a,b;
impvar CU{p in DAYS, y in YEARS} = (if a <= temp[p,y] <= b then 1 else 0);
impvar S{y in YEARS} = sum{p in DAYS} CU[p,y];
impvar mean = (sum{y in YEARS} S[y])/card(YEARS);
minimize CoefvSquared = (sum{y in YEARS}((S[y]-mean)**2)/card(YEARS))/mean^2;
a = min {p in DAYS, y in YEARS} temp[p,y];
b = max {p in DAYS, y in YEARS} temp[p,y];
solve;
/* display solution */
print a b;
print S mean CoefvSquared;
quit;``````

Note that I squared the objective to avoid the SQRT function, which is nondifferentiable at 0.  Also, the a and b values supplied as a starting solution are computed as min and max, respectively, across all observations.

Discussion stats
• 3 replies
• 841 views
• 0 likes
• 2 in conversation