Hello community,
I am starting using Proc Opt model, and I would like to minimize a function like this one:
min|y|=B1*type + B2*b + B3*c + B4*d + B5*e + B6
Where values are taken from a dataset, here the values:
Y should be an integer number and this is the distribution
B1, B2, B3, B4, B5 and B6 are constants
'type' should be a string with values "white", "black"
Should I have different results for each string?
'b' should be with values 2 and 12
'c' should be in a float value in this distribution
'd' should be in an interval 1 to 48 integer values
'e' should be with values 0, 2 or 12
Thank you in advance
@RobPratt @Ksharp
Yes but I would like to use the read data after I imported my xlsx, is that possible?
I don't quite understand your problem. Do you want to solve a regression problem where b, c, d, and e are regression coefficients? If B1 is a constant and type is a string, what does B1*type mean? Are you able to share example data and also an example (not necessarily optimal) solution?
Yes,
For example, I would like to minimize y in this dataset. I was planning to set a lower limit of y to 1 because I don’t want it to be zero, and I also need to include constraints that are not explicitly visible in my data. For instance, the values in column 'a' should be within the integer range from 6 to 48.
Could you provide an example of how to minimize a regression function for y using this dataset or by applying the constraints I want, instead of using one of my columns?
Type is an example of a string column where I should perform minimization for each string value, but for now, we can ignore it to simplify the problem.
The following PROC OPTMODEL code solves an ordinary least squares regression:
proc optmodel;
/* declare sets and input parameters */
set OBS;
num y {OBS};
num a {OBS};
num b {OBS};
num c {OBS};
num d {OBS};
num e {OBS};
/* read input data set */
read data input into OBS=[_N_] y a b c d e;
/* declare decision variables */
var Beta {0..5};
var Error {OBS};
impvar Estimate {i in OBS} = Beta[0] + Beta[1]*a[i] + Beta[2]*b[i] + Beta[3]*c[i] + Beta[4]*d[i] + Beta[5]*e[i];
/* declare objective to minimize sum of squared errors */
min SSE = sum {i in OBS} Error[i]^2;
/* declare constraints */
con ErrorCon {i in OBS}:
Error[i] = Estimate[i] - y[i];
/* call quadratic programming (QP) solver */
solve;
/* print regression coefficients */
print Beta;
/* create output data sets */
create data BetaData from [j] Beta;
create data EstimateData from [i] a b c d e y Estimate Error;
quit;
The resulting regression coefficients are:
[1] | Beta |
---|---|
0 | 416.9191 |
1 | 19.3854 |
2 | 18.5124 |
3 | -13.0827 |
4 | -22.7077 |
5 | -1.4499 |
proc reg data=input;
model y = a b c d e;
quit;
I don't want to find the betas, I already have the beta values. I would like to find the a, b, c , d , e values in an interval from my dataset or an interval that I will determine to minimize the y value. It's just an equation where my betas can be positive or negatives.
If I understood what you mean.
proc optmodel;
num a{1..2}=[2 12];
num b{1..3}=[2 12 24];
num d{i in 1..10}=i;
num e{i in 1..47}=i;
var var_a{1..2} binary;
var var_b{1..3} binary;
var var_c >=5 <=15;
var var_d{1..10} binary;
var var_e{1..47} binary;
impvar sum=-8*sum{i in 1..2} a[i]*var_a[i] +
-2*sum{i in 1..3} b[i]*var_b[i] +
-2*var_c +
-4*sum{i in 1..10}d[i]*var_d[i] +
2*sum{i in 1..47}e[i]*var_e[i] ;
min obj=sum;
con con_total:sum>=1;
con con_a:sum{i in 1..2} var_a[i]=1;
con con_b:sum{i in 1..3} var_b[i]=1;
con con_d:sum{i in 1..10}var_d[i]=1;
con con_e:sum{i in 1..47}var_e[i]=1;
solve ;
print a var_a ;
print b var_b ;
print var_c ;
print d var_d ;
print e var_e ;
quit;
Thanks for posting this reply. +1
A slight improvement is that for variables d and e with consecutive integer domains, you can avoid num d, num e, con_d, and con_e by using integer variables:
var var_d >= 1 <= 10 integer;
var var_e >= 1 <= 47 integer;
Thank you, I will try by today and tomorrow if it is work. If I would like to use steps of 4 within the interval? How can i change it?
Here's one way to enforce that x is in {2,6,10,...,42}:
var x;
set MYSET = 2..42 by 4;
var y {MYSET} binary;
con Mycon1: sum {i in MYSET} y[i] = 1;
con Mycon2: sum {i in MYSET} i*y[i] = x;
Here's another way that uses fewer variables and constraints:
var x >= 2 <= 42;
var y integer;
con Mycon: x = 2 + 4*y;
See also https://or.stackexchange.com/questions/6545/how-to-linearize-membership-in-a-finite-set
How can I integrate that in the code of the proc optmodel above? @RobPratt
How can I insert these con
var x >= 2 <= 42;
var y integer;
con Mycon: x = 2 + 4*y;
to this proc opt model
proc optmodel;
num a{1..2}=[2 12];
num b{1..3}=[2 12 24];
num d{i in 1..10}=i;
num e{i in 1..47}=i;
var var_a{1..2} binary;
var var_b{1..3} binary;
var var_c >=5 <=15;
var var_d{1..10} binary;
var var_e{1..47} binary;
impvar sum=-8*sum{i in 1..2} a[i]*var_a[i] +
-2*sum{i in 1..3} b[i]*var_b[i] +
-2*var_c +
-4*sum{i in 1..10}d[i]*var_d[i] +
2*sum{i in 1..47}e[i]*var_e[i] ;
min obj=sum;
con con_total:sum>=1;
con con_a:sum{i in 1..2} var_a[i]=1;
con con_b:sum{i in 1..3} var_b[i]=1;
con con_d:sum{i in 1..10}var_d[i]=1;
con con_e:sum{i in 1..47}var_e[i]=1;
solve ;
print a var_a ;
print b var_b ;
print var_c ;
print d var_d ;
print e var_e ;
quit;
proc optmodel; num a{1..2}=[2 12]; num b{1..3}=[2 12 24]; num d{i in 1..10}=i; num e{i in 1..47}=i; var var_a{1..2} binary; var var_b{1..3} binary; var var_c >=2 <=42 ; var var_d integer; impvar sum= 4*sum{i in 1..2} a[i]*var_a[i] + -2*sum{i in 1..3} b[i]*var_b[i] + -2*var_c + 4*var_d ; min obj=sum; con con_total:sum>=1; con con_a:sum{i in 1..2} var_a[i]=1; con con_b:sum{i in 1..3} var_b[i]=1; con con_d:var_c=2+4*var_d; solve ; print a var_a ; print b var_b ; print var_c ; print var_d ; quit;
If I would like to read the data from my dataset?
I don't understand. If you want to import these interval of variable from XLSX, you could make a macro variable (assuming the levels of these variable was not too many).
proc import datafile="C:\Users\xiakeshan\Downloads\input.xlsx" out=have dbms=xlsx replace;
run;
proc sql noprint;
select distinct a into :a separated by ' ' from have;
select count(distinct a) into :_a separated by ' ' from have;
select distinct b into :b separated by ' ' from have;
select count(distinct b) into :_b separated by ' ' from have;
select floor(min(c)) into :min_c from have;
select ceil(max(c)) into :max_c from have;
select distinct d into :d separated by ' ' from have;
select count(distinct d) into :_d separated by ' ' from have;
select distinct e into :e separated by ' ' from have;
select count(distinct e) into :_e separated by ' ' from have;
quit;
proc optmodel;
num a{1..&_a.}=[&a.];
num b{1..&_b.}=[&b.];
num d{1..&_d.}=[&d.];
num e{1..&_e.}=[&e.];
var var_a{1..&_a.} binary;
var var_b{1..&_b.} binary;
var var_c >=&min_c. <=&max_c.;
var var_d{1..&_d.} binary;
var var_e{1..&_e.} binary;
impvar sum=-8*sum{i in 1..&_a.} a[i]*var_a[i] +
-2*sum{i in 1..&_b.} b[i]*var_b[i] +
-2*var_c +
-4*sum{i in 1..&_d.}d[i]*var_d[i] +
2*sum{i in 1..&_e.}e[i]*var_e[i] ;
min obj=sum;
con con_total:sum>=1;
con con_a:sum{i in 1..&_a.} var_a[i]=1;
con con_b:sum{i in 1..&_b.} var_b[i]=1;
con con_d:sum{i in 1..&_d.} var_d[i]=1;
con con_e:sum{i in 1..&_e.} var_e[i]=1;
solve ;
print a var_a ;
print b var_b ;
print var_c ;
print d var_d ;
print e var_e ;
quit;