Minimizing the standard deviation between the sum of different variabl...

hacamargoa · Posted 06-27-2018 12:44 AM

Hello, I am starting to work with proc optmodel and have a recurrent problem easy for solver but I want to do it in SAS. I have the daily temperatures during one month for 4 years and I need to create new binary variables CU1--CU4 (one for each year) defined as 1 if the temperature is in a specific range. However, I need to find the limits of that range that minimize the standard deviation between the four sum of CUs, I was trying something that probably is totally wrong because I do not know how to define the limits of the range and also if I have to put some constrains or impvar.

data temp; 
input Log y2013 y2014 y2015 y2016; 
cards; 
1 28.2 28.3 18.2 27.8 
2 29.6 29.7 19.9 29.2 . . . 

proc optmodel; 
set temp; 
num y2013{temp}; 
num y2014{temp}; 
num y2015{temp}; 
num y2016{temp}; 
var a,b; 
impvar CU1{p in temp}=if a<= y2013[p] <= b then 1 else 0; 
impvar CU2{p in temp}=if a<= y2014[p] <= b then 1 else 0;
impvar CU3{p in temp}=if a<= y2015[p] <= b then 1 else 0; 
impvar CU4{p in temp}=if a<= y2016[p] <= b then 1 else 0; 
impvar S1=sum{p in temp} CU1[p]; impvar S2=sum{p in temp} CU2[p]; impvar S3=sum{p in temp} CU3[p]; impvar S4=sum{p in temp} CU4[p]; 
min SSE=sqrt(.25 * ((S1-(S1+S2+S3+S4)/4)**2+(S2-(S1+S2+S3+S4)/4)**2+(S3-(S1+S2+S3+S4)/4)**2+(S4-(S1+S2+S3+S4)/4)**2)); 
solve; 
print a b; 
quit; 
run;

RobPratt · Posted 06-27-2018 04:39 PM

Here is a more compact way to write your code:

proc optmodel;
   set YEARS = 2013..2016;
   set DAYS;
   num temp{DAYS, YEARS};
   read data temp into DAYS=[Log] {y in YEARS} <temp[Log,y]=col('y'||y)>;
   var a,b;
   impvar CU{p in DAYS, y in YEARS} = (if a <= temp[p,y] <= b then 1 else 0); /* even shorter: (a <= temp[p,y] <= b) */
   impvar S{y in YEARS} = sum{p in DAYS} CU[p,y];
   impvar mean = (sum{y in YEARS} S[y])/card(YEARS);
   min SSE = sum{y in YEARS}(S[y]-mean)**2;
   solve;
   /* display solution */
   print a b;
   print CU S SSE;
quit;

But the optimal objective value is trivially 0. For example, if you take a to be the smallest data value and b to be the largest data value, then CU[p,y] is always 1, S[y] is always card(DAYS), and SSE = 0. Another optimal solution is to take a = b = an arbitrary value that does not appear in your data, say -999. In that case, CU[p,y] = 0, S[y] = 0, and again SSE = 0.

hacamargoa · Posted 06-28-2018 10:43 PM

Thanks for your help. However, I think something is still wrong, the point is that the length of the variable is not the same every year so if I use a as the smallest data value and b as the largest data value, the variation won't be 0. Anyways, I am now minimizing the coefficient of variation to avoid trivial solutions but I wanted to constrain that the mean is different to 0 and that b is greater than a but I get this response "The problem contains strict inequality or predicate constraints that reference non-integer variables".

Then I tried to approximately address the values of a and b as shown in the script (a GT 3 and b GT 16, in this case) but the response is just showing me the same values with some decimals according to 0 iterations.

proc optmodel;
   set YEARS = 2013..2016;
   set DAYS;
   num temp{DAYS, YEARS};
   read data temp into DAYS=[Log] {y in YEARS} <temp[Log,y]=col('y'||y)>;
   var a>=3,b>=16;
   impvar CU{p in DAYS, y in YEARS} = (if a <= temp[p,y] <= b then 1 else 0);
   impvar S{y in YEARS} = sum{p in DAYS} CU[p,y];
   impvar mean = (sum{y in YEARS} S[y])/card(YEARS);
   minimize Coefv = (sqrt(sum{y in YEARS}((S[y]-mean)**2)/card(YEARS)))/mean;
   solve;
   /* display solution */
   print a b;
   print S mean coefv;
quit;

RobPratt · Posted 06-29-2018 02:03 PM

You cannot use strict inequalities with the NLP solver. But you could do this instead to force a positive mean:

   con PositiveMean: mean >= 1e-6;

Similarly, you can force A < B like this:

   con A_LT_B: A + 1e-6 <= B;

With or without these constraints, your new objective does not avoid all trivial solutions. For example, the following code yields an optimal solution with mean = 2 and objective value = 0:

proc optmodel;
   set YEARS = 2013..2016;
   set DAYS;
   num temp{DAYS, YEARS};
   read data temp into DAYS=[Log] {y in YEARS} <temp[Log,y]=col('y'||y)>;
   var a,b;
   impvar CU{p in DAYS, y in YEARS} = (if a <= temp[p,y] <= b then 1 else 0);
   impvar S{y in YEARS} = sum{p in DAYS} CU[p,y];
   impvar mean = (sum{y in YEARS} S[y])/card(YEARS);
   minimize CoefvSquared = (sum{y in YEARS}((S[y]-mean)**2)/card(YEARS))/mean^2;
   a = min {p in DAYS, y in YEARS} temp[p,y];
   b = max {p in DAYS, y in YEARS} temp[p,y];
   solve;
   /* display solution */
   print a b;
   print S mean CoefvSquared;
quit;

Note that I squared the objective to avoid the SQRT function, which is nondifferentiable at 0. Also, the a and b values supplied as a starting solution are computed as min and max, respectively, across all observations.

Minimizing the standard deviation between the sum of different variables

Re: Minimizing the standard deviation between the sum of different variables

Re: Minimizing the standard deviation between the sum of different variables

Re: Minimizing the standard deviation between the sum of different variables

Minimizing the standard deviation between the sum of different variables

Re: Minimizing the standard deviation between the sum of different variables

Re: Minimizing the standard deviation between the sum of different variables

Re: Minimizing the standard deviation between the sum of different variables

Registration is open