kgi1_a
Calcite | Level 5

Hi everyone,

 

I'm trying to understand how SAS implements the DBFGS methodology. I've read the PROC NLP documentation, but I still have some questions. I'm using the QUANEW technique with the update option DBFGS. The output indicates that "Gradient and Hessian are computed using finite difference approximations," but I'm unclear on how the initial estimate of the Hessian is chosen. Additionally, I have linear constraints and am unsure whether Cholesky factorization is used for the Hessian approximation.

 

One of my main concerns is that the line-search methodology is not well defined in the documentation. Does anyone have any insight into how it works in the source code?

 

Thank you!

 

4 REPLIES
ballardw
Super User

Not that I can help directly with this question, but I would strongly suggest showing the code you are using. That way the more knowledgeable folks won't have to ask "were you using option XXXX?" and the like. The options you choose may affect the behavior or even the availability of other options, so getting them all out in the open before the discussion starts is a good idea.

kgi1_a
Calcite | Level 5

Thank you for the suggestion. I use PROC NLP with TECH=QUANEW, and I specify FD so that the gradient vector is computed by finite differences. I am minimizing an objective function, and that is essentially all my code does. With these options the procedure uses the dual Broyden-Fletcher-Goldfarb-Shanno (DBFGS) update, as the documentation says. The things I don't understand are the approximate Hessian estimates and how they are calculated, and the line-search algorithm of this method (both are described in the documentation excerpts quoted below, but not explicitly). I need to reproduce the same results by hand for a simple example, which is why I am trying to understand the exact logic behind it.
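For reference, the structure of my call is roughly the following. The decision variables, starting values, linear constraint, and objective here are placeholders just to show the options; my real model is different.

proc nlp tech=quanew update=dbfgs fd;
   min f;                                  /* minimize the value of f       */
   decvar x1 = -1.2, x2 = 1;               /* placeholder parameters        */
   lincon x1 + x2 <= 10;                   /* placeholder linear constraint */
   f = (x1 - 1)**2 + 10*(x2 - x1**2)**2;   /* placeholder objective         */
run;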

 

Line Search:

"In each iteration, a line search is done along the search direction to find an approximate optimum. The default line-search method uses quadratic interpolation and cubic extrapolation to obtain a step length ˛ satisfying the Goldstein conditions. One of the Goldstein conditions can be violated if the feasible region defines an upper limit of the step length. Violating the left-side Goldstein condition can affect the positive definiteness of the quasi-Newton update. In those cases, either the update is skipped or the iterations are restarted with an identity matrix resulting in the steepest descent or ascent search direction. Line-search algorithms other than the default one can be specified with the LINESEARCH= option." 

 

Hessian Approximation:

"INHESS[=r] . The = r specification is not used: the initial estimate of the approximate Hessian is set to the true Hessian or crossproduct Jacobian at x(0). By default, if INHESSIAN=r is not specified, the initial estimate of the approximate Hessian is set to the multiple of the identity matrix rI , where the scalar r is computed from the magnitude of the initial gradient." -- I dont think this is the case for my example. Even if I did not specify INHESS, it does not calculate the initial Hessian as r.I.

 

 

TaoHuang
SAS Employee

With the options tech=quanew and update=dbfgs, the algorithm does not use the true Hessian of the original problem. Instead it uses the (dual) BFGS method to approximate and update the Hessian of the quadratic subproblem that is part of a "standard" technique for nonlinear optimization problems. Since the algorithm is line-search based, a line search is performed at each iteration to determine the step length it takes.
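For intuition only, the textbook (primal) BFGS update of a Hessian approximation B looks like the PROC IML sketch below. The dual form used by update=dbfgs maintains a Cholesky factor of B instead of B itself; this is not a description of the exact internal implementation of PROC NLP.

proc iml;
   /* Textbook BFGS update of the Hessian approximation B, given the */
   /* step s = x_new - x_old and gradient change y = g_new - g_old.  */
   /* DBFGS applies the algebraically equivalent update to a         */
   /* Cholesky factor of B; this sketch shows only the B form.       */
   start bfgs_update(B, s, y);
      Bs = B*s;
      Bnew = B - (Bs*t(Bs))/(t(s)*Bs) + (y*t(y))/(t(y)*s);
      return (Bnew);
   finish;

   /* toy example: start from the identity and take one update */
   B = I(2);
   s = {0.5, -0.25};
   y = {1.0,  0.75};
   Bnew = bfgs_update(B, s, y);
   print Bnew;
quit;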

 

It looks like you didn't specify the LINESEARCH= option, so the default line-search method (equivalent to LINESEARCH=2) is used. This is documented in the section for the LINESEARCH= option.

 

In your case, where you didn't specify INHESS or INHESS=r, the default is used, i.e., "the initial estimate of the approximate Hessian is set to the multiple of the identity matrix rI, where the scalar r is computed from the magnitude of the initial gradient."

 

I wonder why you use the FD= option. Unless your objective function involves "non-standard" functions whose analytic derivatives are not available, there is no point in using the option, because a finite-difference gradient is 1) an approximation and hence less accurate, and 2) slower than the analytic gradient, which PROC NLP constructs for you automatically.

 

PROC NLP has become a legacy procedure in the category of nonlinear optimization algorithms. It is superseded by PROC OPTMODEL, whose nonlinear optimization algorithms (accessible via the SOLVE WITH NLP statement) reflect the state of the art in nonlinear optimization. I encourage you to check it out.
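For example, a minimal PROC OPTMODEL version of an unconstrained minimization looks roughly like this (the objective below is just a placeholder to show the structure):

proc optmodel;
   /* placeholder problem, just to illustrate the SOLVE WITH NLP syntax */
   var x1 init -1.2;
   var x2 init 1;
   min f = 100*(x2 - x1^2)^2 + (1 - x1)^2;   /* Rosenbrock test function */
   solve with nlp;
   print x1 x2;
quit;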

kgi1_a
Calcite | Level 5

Thank you for your reply! Indeed, I am trying to analyze and document the methodology behind existing code, so using PROC NLP is not my choice. I need to understand how it was used in the past so that I can explain its theoretical methodology well.

 

Initial Hessian

As you mentioned, this statement applies to my code: "the initial estimate of the approximate Hessian is set to the multiple of the identity matrix rI, where the scalar r is computed from the magnitude of the initial gradient." However, the documentation does not say how the magnitude of the initial gradient is calculated. I made several attempts to compute r and form the initial Hessian, then used that initial Hessian to get the search direction, together with the step length reported in the SAS results for the corresponding iteration. I expected to obtain the same new parameter set as SAS for the next iteration, but I could not, even though I used the same gradient vector and alpha value (step length) as SAS for that iteration. So I am not sure whether there is some ambiguity in this initial Hessian approximation or not.
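For what it's worth, my reproduction attempt looks like the sketch below. The starting point, gradient, and step length are placeholders standing in for numbers from my SAS output, and r = ||g0||_2 is only one of the guesses I tried, not a documented formula.

proc iml;
   /* One quasi-Newton step with B0 = r*I, the way I am trying to    */
   /* reproduce the first PROC NLP iteration. The gradient g0, the   */
   /* step length alpha, and the candidate value of r are taken from */
   /* (or guessed around) the SAS output; none of this is the        */
   /* documented SAS formula.                                        */
   x0    = {1, 1};              /* starting parameters               */
   g0    = {4, -2};             /* gradient at x0 (from the output)  */
   alpha = 0.35;                /* step length reported by SAS       */
   r     = sqrt(ssq(g0));       /* one guess: r = ||g0||_2           */

   B0 = r * I(nrow(x0));        /* initial Hessian approximation     */
   d  = -solve(B0, g0);         /* search direction: B0*d = -g0      */
   x1 = x0 + alpha*d;           /* compare with SAS's next iterate   */
   print r d x1;
quit;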


Hessian Update Formula

I think the BFGS formula with the Cholesky factor is being used here, as explained in Fletcher (1987). Could you please confirm that? Again, I could not find any explicit implementation details for the case of a nonlinear objective function with linear constraints.
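My working assumption, which I cannot confirm from the documentation, is that the update acts on a Cholesky factor of the approximation. In exact arithmetic that is equivalent to applying the ordinary BFGS update to B and refactoring, which is all the PROC IML fragment below does; Fletcher (1987) updates the factor directly without re-decomposing.

proc iml;
   /* My working assumption only: DBFGS maintains a Cholesky factor  */
   /* of the approximation B. In exact arithmetic that is the same   */
   /* as applying the ordinary BFGS update to B and refactoring.     */
   B = I(2);                         /* current approximation        */
   s = {0.5, -0.25};                 /* step x_new - x_old           */
   y = {1.0,  0.75};                 /* gradient change              */

   Bs   = B*s;
   Bnew = B - (Bs*t(Bs))/(t(s)*Bs) + (y*t(y))/(t(y)*s);
   R    = root(Bnew);                /* upper triangular, Bnew = R`*R */
   print R;
quit;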

 

Line search methodology 

I think this is the trickiest and least clear part and needs more explanation. There are statements about the line search, but they do not explain how it uses interpolation and extrapolation, or how it initializes the default step length (there are some statements in the "Restricting the Step Length" section, but the variables used there are not clearly defined). The relevant excerpts are quoted below, and after them I sketch the standard interpolation formula I have in mind.

 

"LIS=2 specifies a line-search method that needs more function than gradient calls for
quadratic and cubic interpolation and cubic extrapolation; this method is implemented
as shown in Fletcher (1987) and can be modified to an exact line search by
using the LSPRECISION= option."

 

"In each iteration, a line search is done along the search direction to find an approximate optimum. The default
line-search method uses quadratic interpolation and cubic extrapolation to obtain a step length alpha satisfying
the Goldstein conditions. One of the Goldstein conditions can be violated if the feasible region defines an
upper limit of the step length. Violating the left-side Goldstein condition can affect the positive definiteness
of the quasi-Newton update. In those cases, either the update is skipped or the iterations are restarted with an
identity matrix resulting in the steepest descent or ascent search direction. Line-search algorithms other than
the default one can be specified with the LINESEARCH= option."
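To be concrete about what I mean by "how it uses interpolation": for phi(alpha) = f(x + alpha*d), with phi(0), phi'(0), and one trial value phi(alpha1) known, the standard quadratic-interpolation step is the formula in the sketch below. Whether PROC NLP initializes the trial step this way, and how and when it switches to cubic interpolation and extrapolation, is exactly what I cannot pin down from the documentation.

proc iml;
   /* Standard quadratic-interpolation trial step for a line search: */
   /* fit a quadratic to phi(0), phi'(0), and phi(a1), then take its */
   /* minimizer. This is the textbook formula, not necessarily the   */
   /* exact rule PROC NLP applies.                                   */
   start quad_interp(phi0, dphi0, a1, phi1);
      c = (phi1 - phi0 - dphi0*a1) / a1##2;      /* curvature of the fit */
      return ( -dphi0 / (2*c) );
   finish;

   /* toy example: phi(a) = (a - 0.6)##2 + 1, true minimizer at 0.6  */
   phi0  = 1.36;                /* phi(0)                            */
   dphi0 = -1.2;                /* phi'(0)                           */
   a1    = 1;                   /* initial trial step                */
   phi1  = 1.16;                /* phi(1)                            */
   anew  = quad_interp(phi0, dphi0, a1, phi1);
   print anew;                  /* recovers 0.6 exactly for a quadratic */
quit;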

 
