BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
ciro
Quartz | Level 8

Hello,

I have a dataset like the following:

data have;
id=1; x1=10; x2=3; x3=5; y=220;
y1g=12; y2g=6; y3g=10;
output;
id=2; x1=7; x2=0; x3=3; y=100;
y1g=12; y2g=0; y3g=10;
output;
run;

 

for each record I want to find values for y1,y2,y3

such that

x1*y1+x2*y2+x3*y3=y

0=<y1<=12

0=<y2<=6

0=<y3<=10

 

I am not sure of which can be the objective function, but to fix the ideas

I might want that the solutions are as similar as possible to some initial guess, y1g,y2g,y3g.

so for instance the objective might be:

min (y1-y1g)**2+(y2-y2g)**2+(y3-y3g)**2

 

I am using SAS9 M5 and have to apply the program to a dataset of about 100000 records.

I am very new to sas/or but wishing to learn it at least a little.

 

thank you very much in advance for any help or suggestions

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
ChanceTGardener
SAS Employee

This might help get you started with the syntax. 

 

The first row is infeasible due to the constraint. If all three decision variables are at their upper bounds, 10*y1 + 3*y2 + 5*y3 equals 188, less than the y value of 220. The solution status column in the output data set captures this information for all rows. 

 

proc optmodel;
/* declare an index set */
set <num> ROWS;

/* declare parameters for each row in the set */
num x1{ROWS};
num x2{ROWS};
num x3{ROWS};
num y{ROWS};
num y1g{ROWS};
num y2g{ROWS};
num y3g{ROWS};

/* read the have data set and parameters into OPTMODEL to populate */
read data have into ROWS=[id] x1 x2 x3 y y1g y2g y3g;

/* iteration scalar to loop through and solve each row */
num i;

/* decision variables for each row */
var y1{ROWS} >= 0 <= 12;
var y2{ROWS} >= 0 <= 6;
var y3{ROWS} >= 0 <= 10;

/* declare and later populate dec variables with optimal values */
num y1sol{ROWS};
num y2sol{ROWS};
num y3sol{ROWS};

/* declare and later populate solution status of the run */
str solstatus {ROWS};

con Equality: x1[i]*y1[i] + x2[i]*y2[i] + x3[i]*y3[i] = y[i];

min Obj = (y1[i]-y1g[i])**2+(y2[i]-y2g[i])**2+(y3[i]-y3g[i])**2;

/* loop to solve for each row */
do i = 1 to card(ROWS);
 solve;
  *print y1[i] y2[i] y3[i];
   y1sol[i] = y1[i].sol;
   y2sol[i] = y2[i].sol;
   y3sol[i] = y3[i].sol;
   solstatus[i] = _solution_status_;
end;

/* create output data set */
create data want from [id]=ROWS 
   x1 x2 x3 y y1g y2g y3g y1sol y2sol y3sol solstatus;
quit; 

View solution in original post

8 REPLIES 8
Rick_SAS
SAS Super FREQ

Sounds like you want to solve a least-squares problem, which is formulated as a quadratic objective function subject to a linear constraint and boundary constraints. See the documentation for the quadratic programming solver in PROC OPTMODEL, which contains several examples of how to encode the problem in PROC OPTMODEL.

 

 

ciro
Quartz | Level 8
Thank you very much Rick.
I tried to learn by my self, but a bit in a hurry.
I would appreciate if you could suggest me the code.
thank you in advance
ChanceTGardener
SAS Employee

This might help get you started with the syntax. 

 

The first row is infeasible due to the constraint. If all three decision variables are at their upper bounds, 10*y1 + 3*y2 + 5*y3 equals 188, less than the y value of 220. The solution status column in the output data set captures this information for all rows. 

 

proc optmodel;
/* declare an index set */
set <num> ROWS;

/* declare parameters for each row in the set */
num x1{ROWS};
num x2{ROWS};
num x3{ROWS};
num y{ROWS};
num y1g{ROWS};
num y2g{ROWS};
num y3g{ROWS};

/* read the have data set and parameters into OPTMODEL to populate */
read data have into ROWS=[id] x1 x2 x3 y y1g y2g y3g;

/* iteration scalar to loop through and solve each row */
num i;

/* decision variables for each row */
var y1{ROWS} >= 0 <= 12;
var y2{ROWS} >= 0 <= 6;
var y3{ROWS} >= 0 <= 10;

/* declare and later populate dec variables with optimal values */
num y1sol{ROWS};
num y2sol{ROWS};
num y3sol{ROWS};

/* declare and later populate solution status of the run */
str solstatus {ROWS};

con Equality: x1[i]*y1[i] + x2[i]*y2[i] + x3[i]*y3[i] = y[i];

min Obj = (y1[i]-y1g[i])**2+(y2[i]-y2g[i])**2+(y3[i]-y3g[i])**2;

/* loop to solve for each row */
do i = 1 to card(ROWS);
 solve;
  *print y1[i] y2[i] y3[i];
   y1sol[i] = y1[i].sol;
   y2sol[i] = y2[i].sol;
   y3sol[i] = y3[i].sol;
   solstatus[i] = _solution_status_;
end;

/* create output data set */
create data want from [id]=ROWS 
   x1 x2 x3 y y1g y2g y3g y1sol y2sol y3sol solstatus;
quit; 
ciro
Quartz | Level 8
Thank you very much ChanceTGardener.
I appreciated it a lot and especially the comments that helps me to understand the code.
Ciro
ciro
Quartz | Level 8
I have a problem when applying the procedure to my dataset of about 100k observations.
when I tried with a subset of 10k observations it took about 6 minutes.
so I imagined that 1-2 hours was a sufficient time to process the whole dataset.
Instead after a night the procedure was not finished.
I closed all output destination and routed the log with proc printto.
I also tried to suppress the the notes to log, but with no success.
Any idea of how can I complete the task? Should I partition the problem, dividing the dataset, or is there a smarter solution?
Thank you very much in advance
RobPratt
SAS Super FREQ

The original code works correctly but has many unused variables (three times the number of rows when you need only three).  You will get much better performance by replacing the three VAR statements as shown below.  When calling the solver in a loop, I often use PRINTLEVEL=0 and OPTION NONOTES and also a PUT statement to show the progress, as shown.

 

proc optmodel printlevel=0;
   /* declare an index set */
   set <num> ROWS;

   /* declare parameters for each row in the set */
   num x1{ROWS};
   num x2{ROWS};
   num x3{ROWS};
   num y{ROWS};
   num y1g{ROWS};
   num y2g{ROWS};
   num y3g{ROWS};

   /* read the have data set and parameters into OPTMODEL to populate */
   read data have into ROWS=[id] x1 x2 x3 y y1g y2g y3g;

   /* iteration scalar to loop through and solve each row */
   num i;

   /* decision variables for each row */
/*   var y1{ROWS} >= 0 <= 12;*/
/*   var y2{ROWS} >= 0 <= 6;*/
/*   var y3{ROWS} >= 0 <= 10;*/
   var y1 >= 0 <= 12;
   var y2 >= 0 <= 6;
   var y3 >= 0 <= 10;

   /* declare and later populate dec variables with optimal values */
   num y1sol{ROWS};
   num y2sol{ROWS};
   num y3sol{ROWS};

   /* declare and later populate solution status of the run */
   str solstatus {ROWS};

/*   con Equality: x1[i]*y1[i] + x2[i]*y2[i] + x3[i]*y3[i] = y[i];*/
   con Equality: x1[i]*y1 + x2[i]*y2 + x3[i]*y3 = y[i];

/*   min Obj = (y1[i]-y1g[i])**2+(y2[i]-y2g[i])**2+(y3[i]-y3g[i])**2;*/
   min Obj = (y1-y1g[i])**2+(y2-y2g[i])**2+(y3-y3g[i])**2;

   /* loop to solve for each row */
   option nonotes;
   do i = 1 to card(ROWS);
      if mod(i,1000) = 0 then put i=;
      solve;
      *print y1[i] y2[i] y3[i];
/*      y1sol[i] = y1[i].sol;*/
/*      y2sol[i] = y2[i].sol;*/
/*      y3sol[i] = y3[i].sol;*/
      y1sol[i] = y1.sol;
      y2sol[i] = y2.sol;
      y3sol[i] = y3.sol;
      solstatus[i] = _solution_status_;
   end;
   option notes;

   /* create output data set */
   create data want from [id]=ROWS 
      x1 x2 x3 y y1g y2g y3g y1sol y2sol y3sol solstatus;
quit;
RobPratt
SAS Super FREQ

And in SAS Viya, you can use the runOptmodel action with BY-group processing to solve the independent instances concurrently, without having to write any explicit loops:

data sascas1.have;
   set have;
run;

proc cas;
   action setsessopt / messageLevel="warning";
   run;

   source pgm;
   /* declare parameters for each row in the set */
   num x1;
   num x2;
   num x3;
   num y;
   num y1g;
   num y2g;
   num y3g;

   /* read the have data set and parameters into OPTMODEL to populate */
   read data have into x1 x2 x3 y y1g y2g y3g;

   /* decision variables for each row */
   var y1 >= 0 <= 12;
   var y2 >= 0 <= 6;
   var y3 >= 0 <= 10;

   con Equality: x1*y1 + x2*y2 + x3*y3 = y;

   min Obj = (y1-y1g)**2+(y2-y2g)**2+(y3-y3g)**2;

   /* solve one row */
   solve;

   /* create output data set */
   create data want from 
      x1 x2 x3 y y1g y2g y3g y1sol=y1 y2sol=y2 y3sol=y3 solstatus=_solution_status_;
   endsource;
   action optimization.runOptmodel / code=pgm groupBy={"id"} printlevel=0;
   run;
quit;
ciro
Quartz | Level 8

Thank you very much Rob. the code on the entire dataset run in only about 10 minutes!

The biggest improvement is probably the reduction of the number of variables, However a role is likely to be played also by printlevel=0.

I realized that without it Sas kept on printing on the html output making it grow continuously and congesting the system.

The strange thing is that I had on my code: ods html close; and ods html exclude all. So there is something that I misunderstand about these statements.

However with printlevel=0, sas stopped to print the output.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

Multiple Linear Regression in SAS

Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 8 replies
  • 1194 views
  • 8 likes
  • 4 in conversation