BookmarkSubscribeRSS Feed
hacamargoa
Calcite | Level 5

Hello, I am starting to work with proc optmodel and have a recurrent problem easy for solver but I want to do it in SAS. I have the daily temperatures during one month for 4 years and I need to create new binary variables CU1--CU4 (one for each year) defined as 1 if the temperature is in a specific range. However, I need to find the limits of that range that minimize the standard deviation between the four sum of CUs, I was trying something that probably is totally wrong because I do not know how to define the limits of the range and also if I have to put some constrains or impvar. 

data temp; 
input Log y2013 y2014 y2015 y2016; 
cards; 
1 28.2 28.3 18.2 27.8 
2 29.6 29.7 19.9 29.2 . . . 

proc optmodel; 
set temp; 
num y2013{temp}; 
num y2014{temp}; 
num y2015{temp}; 
num y2016{temp}; 
var a,b; 
impvar CU1{p in temp}=if a<= y2013[p] <= b then 1 else 0; 
impvar CU2{p in temp}=if a<= y2014[p] <= b then 1 else 0;
impvar CU3{p in temp}=if a<= y2015[p] <= b then 1 else 0; 
impvar CU4{p in temp}=if a<= y2016[p] <= b then 1 else 0; 
impvar S1=sum{p in temp} CU1[p]; impvar S2=sum{p in temp} CU2[p]; impvar S3=sum{p in temp} CU3[p]; impvar S4=sum{p in temp} CU4[p]; 
min SSE=sqrt(.25 * ((S1-(S1+S2+S3+S4)/4)**2+(S2-(S1+S2+S3+S4)/4)**2+(S3-(S1+S2+S3+S4)/4)**2+(S4-(S1+S2+S3+S4)/4)**2)); 
solve; 
print a b; 
quit; 
run;
3 REPLIES 3
RobPratt
SAS Super FREQ

Here is a more compact way to write your code:

 

proc optmodel;
   set YEARS = 2013..2016;
   set DAYS;
   num temp{DAYS, YEARS};
   read data temp into DAYS=[Log] {y in YEARS} <temp[Log,y]=col('y'||y)>;
   var a,b;
   impvar CU{p in DAYS, y in YEARS} = (if a <= temp[p,y] <= b then 1 else 0); /* even shorter: (a <= temp[p,y] <= b) */
   impvar S{y in YEARS} = sum{p in DAYS} CU[p,y];
   impvar mean = (sum{y in YEARS} S[y])/card(YEARS);
   min SSE = sum{y in YEARS}(S[y]-mean)**2;
   solve;
   /* display solution */
   print a b;
   print CU S SSE;
quit;

 

 

But the optimal objective value is trivially 0.  For example, if you take a to be the smallest data value and b to be the largest data value, then CU[p,y] is always 1, S[y] is always card(DAYS), and SSE = 0.  Another optimal solution is to take a = b = an arbitrary value that does not appear in your data, say -999.  In that case, CU[p,y] = 0, S[y] = 0, and again SSE = 0.

hacamargoa
Calcite | Level 5

Thanks for your help. However, I think something is still wrong, the point is that the length of the variable is not the same every year so if I use a as the smallest data value and b as the largest data value, the variation won't be 0. Anyways, I am now minimizing the coefficient of variation to avoid trivial solutions but I wanted to constrain that the mean is different to 0 and that b is greater than a but I get this response "The problem contains strict inequality or predicate constraints that reference non-integer variables". 

Then I tried to approximately address the values of a and b as shown in the script (a GT 3 and b GT 16, in this case) but the response is just showing me the same values with some decimals according to 0 iterations.

proc optmodel;
   set YEARS = 2013..2016;
   set DAYS;
   num temp{DAYS, YEARS};
   read data temp into DAYS=[Log] {y in YEARS} <temp[Log,y]=col('y'||y)>;
   var a>=3,b>=16;
   impvar CU{p in DAYS, y in YEARS} = (if a <= temp[p,y] <= b then 1 else 0);
   impvar S{y in YEARS} = sum{p in DAYS} CU[p,y];
   impvar mean = (sum{y in YEARS} S[y])/card(YEARS);
   minimize Coefv = (sqrt(sum{y in YEARS}((S[y]-mean)**2)/card(YEARS)))/mean;
   solve;
   /* display solution */
   print a b;
   print S mean coefv;
quit;
RobPratt
SAS Super FREQ

You cannot use strict inequalities with the NLP solver.  But you could do this instead to force a positive mean:

   con PositiveMean: mean >= 1e-6;

Similarly, you can force A < B like this:

   con A_LT_B: A + 1e-6 <= B;

With or without these constraints, your new objective does not avoid all trivial solutions.  For example, the following code yields an optimal solution with mean = 2 and objective value = 0:

proc optmodel;
   set YEARS = 2013..2016;
   set DAYS;
   num temp{DAYS, YEARS};
   read data temp into DAYS=[Log] {y in YEARS} <temp[Log,y]=col('y'||y)>;
   var a,b;
   impvar CU{p in DAYS, y in YEARS} = (if a <= temp[p,y] <= b then 1 else 0);
   impvar S{y in YEARS} = sum{p in DAYS} CU[p,y];
   impvar mean = (sum{y in YEARS} S[y])/card(YEARS);
   minimize CoefvSquared = (sum{y in YEARS}((S[y]-mean)**2)/card(YEARS))/mean^2;
   a = min {p in DAYS, y in YEARS} temp[p,y];
   b = max {p in DAYS, y in YEARS} temp[p,y];
   solve;
   /* display solution */
   print a b;
   print S mean CoefvSquared;
quit;

Note that I squared the objective to avoid the SQRT function, which is nondifferentiable at 0.  Also, the a and b values supplied as a starting solution are computed as min and max, respectively, across all observations.

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

Discussion stats
  • 3 replies
  • 1618 views
  • 0 likes
  • 2 in conversation