Re: Proc Optmodel use of multiple starting values

srinirangan123 · Posted 12-06-2017 10:34 PM

Dear Friends,

We are estimating a non-linear model that has the following form:

w_ret = (1 - rho*(sigma_T/sigma_F))*w_uepp + ((sigma_T*sqrt(1 - (rho**2)))/sigma_Z)*w_tq + c1*w_mret + c2*w_mcap + c3*w_bm + c4* w_agro + c5*w_op_prof

The model has 8 parameters: sigma_T, sigma_Z, rho, c1, c2, c3, c4, and c5.

A ninth parameter sigma_F is set to equal a constant 1.

There are 7 variables: w_uepp, w_tq, w_mret, w_mcap, w_bm, w_agro, w_op_prof

sigma_T and sigma_Z are variances and hence constrained to be strictly positive. Rho is a correlation and hence constrained

to lie between -1 and +1.

We set up the program as a least squares minimization problem. The code is below and the input file is provided as an attachment. It has 100 observations.

We use the nlp solver under proc optmodel and the sas code is below. Under NLP, we have included the multistart option, bndrange = 50, and an arbitrary and constant seed. Also, we have provided initial values for the 8 parameters.

Our challenge is that each time we run the program, we get different values for the parameter estimates, sometimes wildly different. This is despite not changing anything in the code. Interestingly, the value of the objective function at the minimum remains the same for each run.

We seek your kind help in answering the following questions:

1. How can we ensure that we get the same parameters for each run?

2. Does SAS use our initial values in some manner? How do these values interact with the multistart option?

3. What are some things we can do to ensure a global minimum? Because we get different estimates in each run, it feels like we are hitting a local minimum.

4. Is there a way in SAS to generate values of the objective function for a combination of parameters, so that I can graph a 3D plot and thus provide a better set of starting values?

5. In some cases, one of our parameters, rho, has an estimate of 1, which is a corner solution. How can we calculate standard errors in this case.

Thanks so much for your help! I had posted this yesterday and it got deleted somehow. Hence I am posting it again.

Site and O/S informations are as follows:

Site: 70055201 Release: 9.04.01M4P110916 System: LIN X64 Linux

best,

Srini

options ls = 132 ps = 2000 nodate nocenter;

data fiireg;
infile 'sample.prn';
input ccode $ & 1-10 w_uepp w_tq w_ret w_bm w_mret w_op_prof w_agro w_mcap;

proc sort;
by ccode;

data fiireg;
set fiireg;
by ccode;

proc means;

proc optmodel;

set alldata;

num w_ret{alldata};
num w_uepp{alldata};
num w_tq{alldata};
num w_mret{alldata};
num w_mcap{alldata};
num w_bm{alldata};
num w_agro{alldata};
num w_op_prof{alldata}; 

num mycov{i in 1.._nvar_,  j in 1..i};

number sigma_F = 1;

read data fiireg into alldata = [_n_] w_ret w_uepp w_tq w_mret w_mcap 
w_bm w_agro w_op_prof;

var sigma_T init 0.92, sigma_Z init 0.001, rho init 0.1, c1 init 0.8, c2 init 0.3, c3 init 0.05, c4 init 0.05, c5 init 0.05;

impvar Err{i in alldata} = w_ret[i] - (1 - rho*(sigma_T/sigma_F))*w_uepp[i] - ((sigma_T*sqrt(1 - (rho**2)))/sigma_Z)*w_tq[i] - c1*w_mret[i] - c2*w_mcap[i] - c3*w_bm[i] - c4* w_agro[i] - c5*w_op_prof[i];

min f = sum{i in alldata} Err[i]^2;

con a: rho >= -1;
con b: sigma_T >= 0;
con c: sigma_Z >= 0;
con d: rho <= 1;

solve with nlp / tech = ip multistart = (bndrange = 50) msnumstarts = 100 
covest = (cov = 5 covout = mycov) seed = 9730696;

print mycov;

print sigma_T sigma_Z rho c1 c2 c3 c4 c5;

Murgie · Posted 12-07-2017 05:50 PM

I want to add to my coauthor Srini's question above. From https://support.sas.com/resources/papers/proceedings13/158-2013.pdf we find

"RECURSION The uniform sampling, starting-point selection, local optimization, and sample-point update steps are repeated in sequence until either no sampled points are selected or the maximum number of local optimizations (specified via the MSMAXSTARTS= parameter) has been completed. At termination the algorithm reports the best local minimum that it has found."

1. Does this mean that any set of initial values given is over-ridden by SAS when we invoke the switch "ms."

2. This note above refers to "msMAXstarts" while Srini has used "msNUMstarts." Does that make any difference, or explain anything?

3. If starting values are generated by SAS on its own, with the uniform random number generator, and if subsequent starting points depend on initial (randomly selected) starting points, it leads to the scary instability of the same data, code, and machine, leading to different estimates on subsequent runs. Is there some way of overcoming this?

4. Must we just keep increasing the value of the parameter in msnumstarts or msmaxstarts till successive runs yield the same answer? This is not a previously solved problem, so there is no easy way to tell a priori if msnumstarts or msmaxstarts = 100 or 1000 is large enough.

5. Is there some way to make SAS take a list of starting points we give it, and use those in PROC OPTMODEL? Our reading is that whatever init values are specified are simply ignored. If they are used, and they don't change from one run to the next, getting different estimates each time is a huge drawback. Please help. Thank you.

TaoHuang · Posted 12-15-2017 07:28 PM

Hi Murgie,

Please find my answers to your questions in the following:

1A. The initial values supplied by the user is not overridden. See 2A of my reply to srinirangan123 below for more details.

2A. MSMAXSTARTS and MSNUMSTARTS are the same. MSMAXSTARTS is more descriptive as to what the option is about. MSNUMSTARTS was initially used when the Multistart option was in experiment. For backward compatibility MSNUMSTARTS is kept but undocumented.

3A. I'm not sure if I understood your question correctly. Was your question about the different estimates on subsequent runs, given the same SEED value? If so, this is an issue of a bug in the code, which will be fixed. See 1A of my reply to srinirangan123 below for more details.

4A. Once the bug is fixed, you should get the same solution on successive runs with the same machine setting. The deterministic behavior or repeatability of solutions is independent of the MSMAXSTARTS option.

5A. You can easily accomplish this via the OPTMODEL FOR or COFOR statement with which you can start the local optimization (i.e., with the MULTISTART option turned off) from the starting points you give in a loop.

TaoHuang · Posted 12-15-2017 04:17 PM

Hi Srini,

Thanks for using the Multistart option of the SAS/NLP solver in PROC OPTMODEL. The Multistart option of the SAS/NLP solver implements a heuristic that seeks to find an improved solution in a global optimization setting. The algorithm is not an exhaustive search approach, so there is no guarantee of finding the global solution, especially if local solutions abound.

I'm answering the five questions you posted as follows:

1A. The Multistart option in a non distributed computing environment is deterministic by design; hence you should have gotten the same solutions with different runs on the exactly same machine setting. Using your example I have found different solutions in different runs; hence it is a bug in the code. We are planning to fix it in a coming release.

2A. The Multistart algorithm uses the user supplied initial values/point in addition to other starting points generated by the algorithm to start the local optimizations. The initial values affect the decision variable bounds that are used in sampling for starting points, only if some of the decision variables are unbounded (one side or both sides). Note the sampling bounds are different from the decision variable bounds that defines the optimization problem.

3A. As I mentioned above, the algorithm is not an exhaustive search, so there is no guarantee of a global minimum. Getting different estimates in each run is a bug, which we will fix.

4A. You should be able to do this by evaluating the objective function with different points in an OPTMODEL loop statement. Then output the objective function values to a dataset, which you can use to plot a 3D graph.

5A. My background/specialty is not in statistics. With that said, I know one way to calculate the standard error in the estimate of rho is to take the square root of the corresponding diagonal element in the estimated covariance matrix. A corner solution of rho may or may not entail a different calculation. Please consult a statistics expert.

srinirangan123 · Posted 12-16-2017 12:50 AM

Dear TaoHuang,

Thanks so much for your detailed replies.

We appreciate it as it gives us more clarity.

As a follow-up, I have two questions.

a. Do you know roughly by when the bug will be fixed? This relates to the fact that each run gives us different estimates when we don't change any thing in the program, including keeping the seed the same.

b. One option we worked with is the msbndrange option. What we find is that when msbndrange is set to a relatively small value of say 50 or 100, we get several local optima (more than 100). When I increase msbndrange to a much larger number, say 5000, the number of local optima come down, in some cases to just one. Further larger values of msbndrange yield more stable estimates.

We felt that widening the msbndrange option should lead a wider set of starting value and hence more local optima. But the results work the other way.

Could you clarify how we can use msbndrange better so that we get more reliable estimates?

Thanks so much!

Srinivasan Rangan

TaoHuang · Posted 12-16-2017 03:11 PM

Hi Srinivasan,

Q1: Do you know roughly by when the bug will be fixed? ...

A1: The nondeterminism is due to multi-threading of the algorithm. I realized your problem is pretty small--with only 8 decision variables. I replaced the statement in your original SAS program:

proc optmodel;

with

proc optmodel nthreads=1;

to run the program in a single threaded mode. And I ran it half a dozen times, all getting the exactly same solutions. The solution time on each run is small, < 2.5 seconds on my PC. If this solution time is acceptable, I'd suggest to run your program in a single threaded mode.

Q2: One option we worked with is the msbndrange option. ...

A2: MSBNDRANGE affects the decision variable bound range that is used in sampling for starting points. The wider it is, the more scattered the sampling points will be across the domain space where the sampling is conducted. Another factor to consider is that the total sample points is related to MSAMXSTARTS.

Given a value of MSMAXSTARTS, the total number of sample points are relatively fixed. When MSBNDRANGE is small, e.g., 50 or 100, the Multistart algorithm can "zoom in" and "see" more local mins. When MSBNDRANGE is large, there is "zoom out" effect--the algorithm does not see finer details of local mins. So if you increase MSBNDRANGE, you should also increase MSMAXSTARTS to have adequate sampling coverage. But note that the number of potential local optima can explode with the increase of MSBNDRANGE. 5000 is a quite big value for MSBNDRANGE in most cases.

Thanks.

Tao Huang

srinirangan123 · Posted 12-17-2017 03:30 AM

Dear Tao,

Thanks so much for your solution. This gives us hope! Your suggestion on keeping maxnumstarts and msbndrange proportional is alos helpful.

I tried the nthreads =1 option to the proc optmodel statement as well as the nothreads option based on

http://support.sas.com/documentation/cdl/en/ormpug/63352/HTML/default/viewer.htm#ormpug_optmodel_sec...

Unfortunately, I get the following error message

Error 688-782: The option NOTHREADS is unrecognized

Expecting one of the following: CDIGITS, ERRORLIMIT, FD, FDIGITS, FORCEFD, FORCEPRESOLVE, INITVAR, INTFUZZ, MAXLABLEN, MISSCHECK, MSGLIMIT, NOINITVAR, NOMISSCHECK, PDIGITS, PMATRIX, PRESOLVER, PRESTOL, PRINTLEVEL, PWIDTH, VARFUZZ.

NOTE: The OPTMODEL presolver is disabled when the COVEST= option is enabled.

NOTE: Problem generation will use 4 threads.

NOTE: Previous errors might cause the problem to be resolved incorrectly.

NOTE: The problem has 8 variables (8 free, 0 fixed).

NOTE: The problem uses 14076 implicit variables.

NOTE: The problem has 6 linear constraints (3 LE, 0 EQ, 3 GE, 0 range).

NOTE: The problem has 6 linear constraint coefficients.

NOTE: The problem has 0 nonlinear constraints (0 LE, 0 EQ, 0 GE, 0 range).

NOTE: The SOLTYPE= option is set to 0 (Optimal) because the COVEST= option is specified.

NOTE: The objective is in least squares form.

NOTE: The objective Jacobian has 14076 rows.

NOTE: Using analytic derivatives for objective.

NOTE: The NLP solver is called.

NOTE: The Active Set algorithm is used.

NOTE: The MULTISTART option is enabled.

NOTE: The deterministic parallel mode is enabled.

NOTE: The Multistart algorithm is executing in single-machine mode.

NOTE: The Multistart algorithm is using up to 4 threads.

NOTE: Random number seed 633333 is used.

Can you kindly let us know, how to correctly specify the numthreads option?

I am using SAS on LINUX and running on a cloud server.

Thanks a lot!

TaoHuang · Posted 12-22-2017 11:47 AM

Which version/release of SAS were you running? For an earlier release, use the PERFORMANCE statement below to run your program in single threaded mode:

proc optmodel;

. . .

performance nthreads=1;
solve with nlp / tech = ip multistart = (bndrange = 50) msnumstarts = 100 covest = (cov = 5 covout = mycov) seed = 9730696;

Thanks.

Tao

srinirangan123 · Posted 12-24-2017 11:14 PM

Dear Tao,

Thanks so much!

The performance option worked well.

Now I get stable results for every run 🙂

One hopefully, last question: When I change the seed, I obtain varying results with each run.

This suggests that my estimates are not even local minima.

If you have any advice on how I can make my results robust to changes in seed value, that would be greatly appreciated.

Thanks again for patiently responding to my queries.

best wishes,

Srini

Murgie · Posted 12-16-2017 05:10 AM

Thank you for your careful response.

1. The bug leaves us in a hard place with a time constraint. Because most other parts of the data work in this project are in SAS, a SAS solution would be best for us. We tried PROC NLP but for large sample sizes it doesn't seem to work. Is there another candidate solution within SAS? Or is there something outside of SAS that you would recommend?

2. For this objective function we have the exact 8x1 gradient vector and the exact 8x8 Hessian (computed using a symbolic algebra package called MAXIMA). But there doesn't seem to be any way to use that. We had assumed that in hill-climbing type grid search, having those exact values would help improve search efficiency. But we can't figure out any way to teach PROC OPTMODEL what we have, let alone make it use that information. Is there some procedure in SAS for this type of problem, that will that exact gradient and Hessian information?

3. Srini has noted the strange behavior of the BNDRANGE option. Increasing the bound reduces the number of local optima identified. If that is anomalous, would you consider that a bug as well?

Please help. Thank you. Best,

Murgie

TaoHuang · Posted 12-16-2017 03:19 PM

Hi Murgie,

Q1: The bug leaves us in a hard place with a time constraint. ...

A1: Please see my follow-up reply to Srinivasan, A1.

Q2: For this objective function we have the exact 8x1 gradient vector and the exact 8x8 Hessian ...

A2: PROC OPTMODEL uses automatic differentiation to compute EXACT gradient and Hessian for the users.

Q3: 3. Srini has noted the strange behavior of the BNDRANGE option. ...

A3: Please see my follow-up reply to Srinivasan, A2.

Thanks.

Tao Huang

Murgie · Posted 12-16-2017 03:26 PM

Thanks a lot. This helps very much.

Registration is open