Solved: Re: proc optmodel by-group / groups not independent

Alain38 · Posted 09-20-2019 02:50 AM

Hello,

I noticed something weird: let say I run a proc optmodel for group 1 to 100, and that I obtain OPTIMAL solutions for all, except BEST_FEASIBLE for groups 5 and 25.

Now if I launch again the exact same proc optmodel with only the groups 5 and 25, I obtain OPTIMAL results for them!

Sometimes it's even more strange since I will obtain for example an OPTIMAL result for group 5, only if I run it after let say group 2...

So to obtain optimal results for every group, I need to relaunch the optimization process for a subset of groups, sometimes several times, with trial and error.

Why the result of a by-group is sometimes dependent on the result of the previous by-group? What options could solve this?

josgri · Posted 09-20-2019 02:07 PM

Short answer: you can get the optimal solution by adding soltype=0. By default it is 1. Bestfeasible status only exists if an optimal solution has been found but a point was found earlier that is also feasible for the provided tolerance but its dual-optimality measure was not yet within tolerance.

Long answer why:

Bestfeasible versus optimal is a hard one to describe but was highly requested by users as the default. Basically this option lives in the gap between analytic and numerical optimization.

Suppose we minimize f(x) subject to c(x) = 0.

Let g(x) be the gradient of f(x) and J(x) the Jacobian of c(x).

The algorithm iteratively creates a list of points {(x1,y1), (x2, y2) , ... (xk, yk)}. Loosely speaking we stop adding points to this list when

c(xk) = 0
g(xk) - J(xk)'*yk = 0

(at least to user provided tolerances) and declare optimal. Together these conditions are called the first-order optimality conditions. In exact arithmetic on convex problems we should have that

f(xk) < min_{1<= j < k} {f(xj) : c(xj) = 0}

for all feasible points encountered in this list. I.e. the last point has the best objective. However, in numerical optimization it is not uncommon to find a point f(xj) with j < k where

f(xj) < f(xk)
c(xj) = 0
but g(xj) - J(xj)'*yj is not yet 0

I.e. we could have stopped at xj if a yj was known that satisfied the dual optimality condition (2) above as well as condition (1).

So which would the user want?

(a) A feasible point that has a better objective but non-optimal dual multipliers

(b) A fully optimal point whose objective is worse.

I understand that in analytic convex settings this makes no sense and seems like it shouldn't happen. But in the numerical nonlinear world, it is pretty easy to create small examples where this happens with arbitrarily large gaps between f(xk) and f(xj). And it is one of those cases where any fix has the property that the cure will be worse than the illness. I.e. for a given problem you can fix it at the cost of making other problems that work wonderfully, break. Not sure if that makes sense? It is a bit hard to describe without going into pdf/latex.

View solution in original post

Rick_SAS · Posted 09-20-2019 07:50 AM

You don't include your program, but something like this might occur if you are letting the program choose initial guesses randomly or if there is a random component to the solver. The random number stream is presumably reused between BY groups, which means that the set of random numbers used to solve the problem depends on the number of BY groups, the number of iterations within each BY-group, etc.

This is just a guess. Post your program if you want a more informed opinion.

RobPratt · Posted 09-20-2019 08:42 AM

By default, each NLP solver call uses the current variable values as a starting solution. In your case, these values come from the previous solve. To avoid this behavior, use the NOINITVAR option in the PROC OPTMODEL statement or explicitly set the values before each solve.

Alain38 · Posted 09-20-2019 09:21 AM

Thank you very much for your explanations! Here is a small working example:

data MVC;
      input  grp _1 _2 _3 _4 _5 _6 name $; 
   datalines;
2	0.001426101	0.001373928	0.001499921	0.000853808	0.000732604	0.001131657	_1
2	0.001373928	0.001491575	0.001604228	0.00094015	0.000836575	0.001257868	_2
2	0.001499921	0.001604228	0.001800986	0.000989169	0.000893931	0.001403105	_3
2	0.000853808	0.00094015	0.000989169	0.00082856	0.000629721	0.000877513	_4
2	0.000732604	0.000836575	0.000893931	0.000629721	0.000649809	0.00078795	_5
2	0.001131657	0.001257868	0.001403105	0.000877513	0.00078795	0.001276228	_6
3	0.001408027	0.001347115	0.001451531	0.000833388	0.000725207	0.001087888	_1
3	0.001347115	0.001457885	0.001550696	0.000912191	0.000823088	0.001209183	_2
3	0.001451531	0.001550696	0.001726184	0.000941497	0.000866587	0.00133465	_3
3	0.000833388	0.000912191	0.000941497	0.000806385	0.000620399	0.000834292	_4
3	0.000725207	0.000823088	0.000866587	0.000620399	0.00064782	0.000763326	_5
3	0.001087888	0.001209183	0.00133465	0.000834292	0.000763326	0.001213602	_6
4	0.001429224	0.001353777	0.001464237	0.000843381	0.000726821	0.001086287	_1
4	0.001353777	0.001456634	0.00155055	0.000915175	0.000820362	0.001200883	_2
4	0.001464237	0.00155055	0.001728677	0.000947293	0.000863554	0.00132404	_3
4	0.000843381	0.000915175	0.000947293	0.000811089	0.000621009	0.000833171	_4
4	0.000726821	0.000820362	0.000863554	0.000621009	0.000644818	0.000755667	_5
4	0.001086287	0.001200883	0.00132404	0.000833171	0.000755667	0.001195544	_6
;

data Predicted;
      input  grp _1 _2 _3 _4 _5 _6; 
   datalines;
2	0.04341	0.03845	0.05235	0.0088	0.02185	0.03547
3	0.04658	0.04025	0.04254	0.00115	0.01937	0.03196
4	0.02918	0.03295	0.03577	0.00836	0.0213	0.03342
;

data Targetvar;
      input grp Target_var; 
   datalines;
2	0.000848383
3	0.000867194
4	0.000864978
;

%let byvar = grp;
proc optmodel printlevel=0 NOINITVAR;
	set OBS;
	set GROUPS;
	set <str> ASSETS;			
	set <num,str> GROUPS_ASSETS;

	num grp{OBS};
	num MVC{GROUPS_ASSETS, ASSETS};					
	num E{GROUPS_ASSETS};
	num Target_var{GROUPS};

	read data MVC into ASSETS=[Name];
	read data MVC into OBS=[_N_] grp;
	read data MVC nomiss into GROUPS_ASSETS=[k=grp i=Name] {j in ASSETS} <MVC[k,i,j]=col(j)>;
	read data Predicted nomiss into [k=grp] {j in ASSETS} <E[k,j]=col(j)>;
	read data Targetvar into GROUPS=[grp] Target_var;

	set BYSET = setof {i in OBS} &byvar.[i];
	num by;

	set OBS_BY = {i in OBS: &byvar.[i] = by};
	set Assets_BY = setof {o in OBS_BY, <(grp[o]),a> in GROUPS_ASSETS} a;
	var W{Assets} >= 0;

	maximize Expected = sum{i in Assets_BY}W[i]*E[by,i];
	con sum{i in Assets_BY, j in Assets_BY}W[i]*MVC[by,i,j]*W[j] = Target_var[by];
	con BUDGET: sum{i in Assets_BY}W[i] = 1;

	num W_sol {GROUPS_ASSETS};
	str solstatus {BYSET};
	do by = BYSET;
		put by=;
		solve with NLP / solver=ACTIVESET;
		for {i in Assets_BY} W_sol[by,i] = W[i].sol;
		solstatus[by] = _solution_status_;
	end;

	create data Weights from [&byvar i] W=W_sol;
	create data Solstatusdata from [&byvar] solstatus;
quit;

grp3 is OPTIMAL with NOINITVAR and BEST_FEASIBLE without this option, but for grp4 that's the contrary: OPTIMAL when uses previous values and BEST_FEASIBLE when independent from previous optimization.

So I can launch it twice, once with NOINITVAR and once without, and keep the best solution but that's far from being optimal since I have several hundreds of hours of optimization process to run. I also run it with multiple solvers, and different options for FEASTOL, which leads to thousands of hours of optimization... (computer working 24/7 on this)

Do you see a less time-consuming solution?

RobPratt · Posted 09-20-2019 10:48 AM

I have four suggestions.

1. The variable W should be indexed by Assets_BY:

/*	var W{Assets} >= 0;*/
	var W{Assets_BY} >= 0;

Or if Assets_BY will always be the same as Assets, just use Assets everywhere.

2. Use the MULTISTART option in the SOLVE WITH NLP statement. On my machine, this yielded OPTIMAL for all three groups.

3. Explicitly initialize the variables inside the loop, just before the solve. For example, the following statements distribute the budget uniformly for the initial solution:

do by = BYSET;
   put by=;
   for {i in Assets_BY} W[i] = 1/card(Assets_BY);

4. You might see better performance if you relax the nonlinear constraint from equality to inequality:

/*	con sum{i in Assets_BY, j in Assets_BY}W[i]*MVC[by,i,j]*W[j] = Target_var[by];*/
	con sum{i in Assets_BY, j in Assets_BY}W[i]*MVC[by,i,j]*W[j] <= Target_var[by];

Alain38 · Posted 09-20-2019 12:34 PM

Thank you very much for these suggestions! When I use the MULTISTART option, it often fails, and with this example it failed for the 3 groups for me.

For grp = 2:

WARNING: A default random number seed was generated by Multistart. To reproduce the solution of the current
         run, specify SEED=9136841 in a subsequent SOLVE statement.

%let byvar = grp;
proc optmodel printlevel=0;
	set OBS;
	set GROUPS;
	set <str> ASSETS;			
	set <num,str> GROUPS_ASSETS;

	num grp{OBS};
	num MVC{GROUPS_ASSETS, ASSETS};					
	num E{GROUPS_ASSETS};
	num Target_var{GROUPS};

	read data MVC into ASSETS=[Name];
	read data MVC into OBS=[_N_] grp;
	read data MVC nomiss into GROUPS_ASSETS=[k=grp i=Name] {j in ASSETS} <MVC[k,i,j]=col(j)>;
	read data Predicted nomiss into [k=grp] {j in ASSETS} <E[k,j]=col(j)>;
	read data Targetvar into GROUPS=[grp] Target_var;

	set BYSET = setof {i in OBS} &byvar.[i];
	num by;

	set OBS_BY = {i in OBS: &byvar.[i] = by};
	set Assets_BY = setof {o in OBS_BY, <(grp[o]),a> in GROUPS_ASSETS} a;
	var W{Assets_BY} >= 0;

	max Expected = sum{i in Assets_BY}W[i]*E[by,i];
	con sum{i in Assets_BY, j in Assets_BY}W[i]*MVC[by,i,j]*W[j] <= Target_var[by];
	con BUDGET: sum{i in Assets_BY}W[i] = 1;

	num W_sol {GROUPS_ASSETS};
	str solstatus {BYSET};
	do by = BYSET;
		put by=;
		for {i in Assets_BY} W[i] = 1/card(Assets_BY);
		solve with NLP / solver=ACTIVESET MULTISTART;
		for {i in Assets_BY} W_sol[by,i] = W[i].sol;
		solstatus[by] = _solution_status_;
	end;

	create data Weights from [&byvar i] W=W_sol;
	create data Solstatusdata from [&byvar] solstatus;
quit;

For this example, I obtain OPTIMAL for each group when I use FEASTOL=1E-7, but with the full sample I need to use different values for FEASTOL, sometimes MULTISTART=(maxstarts=2) and different solvers (INTERIORPOINT, ACTIVESET and even "NLPC / tech=quanew") to obtain OPTIMAL for each group.

RobPratt · Posted 09-20-2019 01:15 PM

What version of SAS/OR are you running?

Alain38 · Posted 09-20-2019 01:25 PM

9.4TS1M5 with SAS/OR 14.3

josgri · Posted 09-20-2019 02:07 PM

Short answer: you can get the optimal solution by adding soltype=0. By default it is 1. Bestfeasible status only exists if an optimal solution has been found but a point was found earlier that is also feasible for the provided tolerance but its dual-optimality measure was not yet within tolerance.

Long answer why:

Bestfeasible versus optimal is a hard one to describe but was highly requested by users as the default. Basically this option lives in the gap between analytic and numerical optimization.

Suppose we minimize f(x) subject to c(x) = 0.

Let g(x) be the gradient of f(x) and J(x) the Jacobian of c(x).

The algorithm iteratively creates a list of points {(x1,y1), (x2, y2) , ... (xk, yk)}. Loosely speaking we stop adding points to this list when

c(xk) = 0
g(xk) - J(xk)'*yk = 0

(at least to user provided tolerances) and declare optimal. Together these conditions are called the first-order optimality conditions. In exact arithmetic on convex problems we should have that

f(xk) < min_{1<= j < k} {f(xj) : c(xj) = 0}

for all feasible points encountered in this list. I.e. the last point has the best objective. However, in numerical optimization it is not uncommon to find a point f(xj) with j < k where

f(xj) < f(xk)
c(xj) = 0
but g(xj) - J(xj)'*yj is not yet 0

I.e. we could have stopped at xj if a yj was known that satisfied the dual optimality condition (2) above as well as condition (1).

So which would the user want?

(a) A feasible point that has a better objective but non-optimal dual multipliers

(b) A fully optimal point whose objective is worse.

I understand that in analytic convex settings this makes no sense and seems like it shouldn't happen. But in the numerical nonlinear world, it is pretty easy to create small examples where this happens with arbitrarily large gaps between f(xk) and f(xj). And it is one of those cases where any fix has the property that the cure will be worse than the illness. I.e. for a given problem you can fix it at the cost of making other problems that work wonderfully, break. Not sure if that makes sense? It is a bit hard to describe without going into pdf/latex.

Alain38 · Posted 09-20-2019 02:19 PM

Thank you very much for this explanation, I think I got the idea and it is a lot more clearer now.

Alain38 · Posted 09-20-2019 02:29 PM

Could you please tell me how to save in a dataset the optimal objective and the objective of the best feasible solution found?

(like the solution status is saved with "str solstatus {BYSET};", "solstatus[by] = _solution_status_" and "create data Solstatusdata from [&byvar] solstatus;")

It now seems to me more relevant to use these two informations to select the best solution, rather than the solution status.

josgri · Posted 09-20-2019 03:48 PM

Hmm ... that is a great idea. At the time the option was supported we only had the ability to return a single solution. We can now return multiple solutions I believe, but we would need to add support in C to do this. We will for sure consider that option for a future release.

Unfortunately there is no way to return both. One option is conditionally call solve a second time (not wasted work--see below):

num W_sol {GROUPS_ASSETS};
num W_sol2 {GROUPS_ASSETS};
str solstatus {BYSET};
str solstatus2 {BYSET};
do by = BYSET;
    put by=;
    solve with NLP / solver=ACTIVESET;
    for {i in Assets_BY} W_sol[by,i] = W[i].sol;
    solstatus[by] = _solution_status_; 
    if _solution_status_ = 'BEST_FEASIBLE' then do;
         put "Solving a second time";
         solve with NLP / solver=ACTIVESET soltype=0;
         for {i in Assets_BY} W_sol2[by,i] = W[i].sol;
         solstatus2[by] = _solution_status_;
     end;		
end;

It is not necessarily extra unused work if you remove this option: NOINITVAR

In that case you would be refining the best feasible point in an attempt to make it optimal as well. But by setting soltype=0 on the second solve you ensure you do not need a loop. The second solve will only occur if the first is best feasible and will always return the optimal point found.

I will let the team know your idea about returning both solutions at the same time--thanks!

Alain38 · Posted 09-20-2019 04:15 PM

Thank you very much, I like this solution! Looking forward testing it

Ready to join fellow brilliant minds for the SAS Hackathon?