Hello,
I have a dataset like the following:
data have;
id=1; x1=10; x2=3; x3=5; y=220;
y1g=12; y2g=6; y3g=10;
output;
id=2; x1=7; x2=0; x3=3; y=100;
y1g=12; y2g=0; y3g=10;
output;
run;
for each record I want to find values for y1,y2,y3
such that
x1*y1+x2*y2+x3*y3=y
0=<y1<=12
0=<y2<=6
0=<y3<=10
I am not sure of which can be the objective function, but to fix the ideas
I might want that the solutions are as similar as possible to some initial guess, y1g,y2g,y3g.
so for instance the objective might be:
min (y1-y1g)**2+(y2-y2g)**2+(y3-y3g)**2
I am using SAS9 M5 and have to apply the program to a dataset of about 100000 records.
I am very new to sas/or but wishing to learn it at least a little.
thank you very much in advance for any help or suggestions
This might help get you started with the syntax.
The first row is infeasible due to the constraint. If all three decision variables are at their upper bounds, 10*y1 + 3*y2 + 5*y3 equals 188, less than the y value of 220. The solution status column in the output data set captures this information for all rows.
proc optmodel;
/* declare an index set */
set <num> ROWS;
/* declare parameters for each row in the set */
num x1{ROWS};
num x2{ROWS};
num x3{ROWS};
num y{ROWS};
num y1g{ROWS};
num y2g{ROWS};
num y3g{ROWS};
/* read the have data set and parameters into OPTMODEL to populate */
read data have into ROWS=[id] x1 x2 x3 y y1g y2g y3g;
/* iteration scalar to loop through and solve each row */
num i;
/* decision variables for each row */
var y1{ROWS} >= 0 <= 12;
var y2{ROWS} >= 0 <= 6;
var y3{ROWS} >= 0 <= 10;
/* declare and later populate dec variables with optimal values */
num y1sol{ROWS};
num y2sol{ROWS};
num y3sol{ROWS};
/* declare and later populate solution status of the run */
str solstatus {ROWS};
con Equality: x1[i]*y1[i] + x2[i]*y2[i] + x3[i]*y3[i] = y[i];
min Obj = (y1[i]-y1g[i])**2+(y2[i]-y2g[i])**2+(y3[i]-y3g[i])**2;
/* loop to solve for each row */
do i = 1 to card(ROWS);
solve;
*print y1[i] y2[i] y3[i];
y1sol[i] = y1[i].sol;
y2sol[i] = y2[i].sol;
y3sol[i] = y3[i].sol;
solstatus[i] = _solution_status_;
end;
/* create output data set */
create data want from [id]=ROWS
x1 x2 x3 y y1g y2g y3g y1sol y2sol y3sol solstatus;
quit;
Sounds like you want to solve a least-squares problem, which is formulated as a quadratic objective function subject to a linear constraint and boundary constraints. See the documentation for the quadratic programming solver in PROC OPTMODEL, which contains several examples of how to encode the problem in PROC OPTMODEL.
This might help get you started with the syntax.
The first row is infeasible due to the constraint. If all three decision variables are at their upper bounds, 10*y1 + 3*y2 + 5*y3 equals 188, less than the y value of 220. The solution status column in the output data set captures this information for all rows.
proc optmodel;
/* declare an index set */
set <num> ROWS;
/* declare parameters for each row in the set */
num x1{ROWS};
num x2{ROWS};
num x3{ROWS};
num y{ROWS};
num y1g{ROWS};
num y2g{ROWS};
num y3g{ROWS};
/* read the have data set and parameters into OPTMODEL to populate */
read data have into ROWS=[id] x1 x2 x3 y y1g y2g y3g;
/* iteration scalar to loop through and solve each row */
num i;
/* decision variables for each row */
var y1{ROWS} >= 0 <= 12;
var y2{ROWS} >= 0 <= 6;
var y3{ROWS} >= 0 <= 10;
/* declare and later populate dec variables with optimal values */
num y1sol{ROWS};
num y2sol{ROWS};
num y3sol{ROWS};
/* declare and later populate solution status of the run */
str solstatus {ROWS};
con Equality: x1[i]*y1[i] + x2[i]*y2[i] + x3[i]*y3[i] = y[i];
min Obj = (y1[i]-y1g[i])**2+(y2[i]-y2g[i])**2+(y3[i]-y3g[i])**2;
/* loop to solve for each row */
do i = 1 to card(ROWS);
solve;
*print y1[i] y2[i] y3[i];
y1sol[i] = y1[i].sol;
y2sol[i] = y2[i].sol;
y3sol[i] = y3[i].sol;
solstatus[i] = _solution_status_;
end;
/* create output data set */
create data want from [id]=ROWS
x1 x2 x3 y y1g y2g y3g y1sol y2sol y3sol solstatus;
quit;
The original code works correctly but has many unused variables (three times the number of rows when you need only three). You will get much better performance by replacing the three VAR statements as shown below. When calling the solver in a loop, I often use PRINTLEVEL=0 and OPTION NONOTES and also a PUT statement to show the progress, as shown.
proc optmodel printlevel=0;
/* declare an index set */
set <num> ROWS;
/* declare parameters for each row in the set */
num x1{ROWS};
num x2{ROWS};
num x3{ROWS};
num y{ROWS};
num y1g{ROWS};
num y2g{ROWS};
num y3g{ROWS};
/* read the have data set and parameters into OPTMODEL to populate */
read data have into ROWS=[id] x1 x2 x3 y y1g y2g y3g;
/* iteration scalar to loop through and solve each row */
num i;
/* decision variables for each row */
/* var y1{ROWS} >= 0 <= 12;*/
/* var y2{ROWS} >= 0 <= 6;*/
/* var y3{ROWS} >= 0 <= 10;*/
var y1 >= 0 <= 12;
var y2 >= 0 <= 6;
var y3 >= 0 <= 10;
/* declare and later populate dec variables with optimal values */
num y1sol{ROWS};
num y2sol{ROWS};
num y3sol{ROWS};
/* declare and later populate solution status of the run */
str solstatus {ROWS};
/* con Equality: x1[i]*y1[i] + x2[i]*y2[i] + x3[i]*y3[i] = y[i];*/
con Equality: x1[i]*y1 + x2[i]*y2 + x3[i]*y3 = y[i];
/* min Obj = (y1[i]-y1g[i])**2+(y2[i]-y2g[i])**2+(y3[i]-y3g[i])**2;*/
min Obj = (y1-y1g[i])**2+(y2-y2g[i])**2+(y3-y3g[i])**2;
/* loop to solve for each row */
option nonotes;
do i = 1 to card(ROWS);
if mod(i,1000) = 0 then put i=;
solve;
*print y1[i] y2[i] y3[i];
/* y1sol[i] = y1[i].sol;*/
/* y2sol[i] = y2[i].sol;*/
/* y3sol[i] = y3[i].sol;*/
y1sol[i] = y1.sol;
y2sol[i] = y2.sol;
y3sol[i] = y3.sol;
solstatus[i] = _solution_status_;
end;
option notes;
/* create output data set */
create data want from [id]=ROWS
x1 x2 x3 y y1g y2g y3g y1sol y2sol y3sol solstatus;
quit;
And in SAS Viya, you can use the runOptmodel action with BY-group processing to solve the independent instances concurrently, without having to write any explicit loops:
data sascas1.have;
set have;
run;
proc cas;
action setsessopt / messageLevel="warning";
run;
source pgm;
/* declare parameters for each row in the set */
num x1;
num x2;
num x3;
num y;
num y1g;
num y2g;
num y3g;
/* read the have data set and parameters into OPTMODEL to populate */
read data have into x1 x2 x3 y y1g y2g y3g;
/* decision variables for each row */
var y1 >= 0 <= 12;
var y2 >= 0 <= 6;
var y3 >= 0 <= 10;
con Equality: x1*y1 + x2*y2 + x3*y3 = y;
min Obj = (y1-y1g)**2+(y2-y2g)**2+(y3-y3g)**2;
/* solve one row */
solve;
/* create output data set */
create data want from
x1 x2 x3 y y1g y2g y3g y1sol=y1 y2sol=y2 y3sol=y3 solstatus=_solution_status_;
endsource;
action optimization.runOptmodel / code=pgm groupBy={"id"} printlevel=0;
run;
quit;
Thank you very much Rob. the code on the entire dataset run in only about 10 minutes!
The biggest improvement is probably the reduction of the number of variables, However a role is likely to be played also by printlevel=0.
I realized that without it Sas kept on printing on the html output making it grow continuously and congesting the system.
The strange thing is that I had on my code: ods html close; and ods html exclude all. So there is something that I misunderstand about these statements.
However with printlevel=0, sas stopped to print the output.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Lock in the best rate now before the price increases on April 1.