Thank you for your reply! Indeed, I am trying to analyze and document the methodology behind existing code, so using PROC NLP was not my choice. I need to understand how it was used in the past so that I can explain its theoretical methodology well.

Initial Hessian

As you mentioned, this statement applies to my code: "the initial estimate of the approximate Hessian is set to the multiple of the identity matrix rI, where the scalar r is computed from the magnitude of the initial gradient." However, it does not say how the magnitude of the initial gradient is turned into r. I made several attempts to calculate r and build the initial Hessian, then used that Hessian to compute the search direction, combined with the step length reported in the SAS results for the corresponding iteration. I expected to obtain the same new parameter set as SAS for the next iteration, but I could not, even though I used the same gradient vector and the same alpha value (step length) as SAS for that iteration. For this reason I am not sure whether there is some ambiguity in this initial Hessian approximation.

Hessian Update Formula

I think the BFGS formula with the Cholesky factor is being used here, as explained in Fletcher (1987). Could you please confirm that? Again, I could not find any explicit usage details for the linearly constrained nonlinear objective function case.

Line search methodology

I think this is the trickiest and least clearly documented part. There are statements about the line search, but they do not explain how the interpolation and extrapolation are applied or how the default step length is initialized (there are some statements in the "Restricting the Step Length" section, but the variables used there are not clearly defined):

"LIS=2 specifies a line-search method that needs more function than gradient calls for quadratic and cubic interpolation and cubic extrapolation; this method is implemented as shown in Fletcher (1987) and can be modified to an exact line search by using the LSPRECISION= option."

"In each iteration, a line search is done along the search direction to find an approximate optimum. The default line-search method uses quadratic interpolation and cubic extrapolation to obtain a step length alpha satisfying the Goldstein conditions. One of the Goldstein conditions can be violated if the feasible region defines an upper limit of the step length. Violating the left-side Goldstein condition can affect the positive definiteness of the quasi-Newton update. In those cases, either the update is skipped or the iterations are restarted with an identity matrix resulting in the steepest descent or ascent search direction. Line-search algorithms other than the default one can be specified with the LINESEARCH= option."

To make my questions concrete, I have sketched below, in Python rather than the actual SAS internals, what I understand each step to be doing.
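For the initial Hessian, this is roughly what I tried. The way r is derived from the gradient norm (the `scale` factor, and using the 2-norm at all) is my own guess, since the documentation only says that r is computed from the magnitude of the initial gradient; the step reproduction uses the alpha reported by PROC NLP for that iteration.

```python
import numpy as np

def initial_hessian_guess(g0, scale=1.0):
    """One plausible choice I tried: r proportional to the 2-norm of the
    initial gradient g0.  This is an assumption -- the documentation does
    not spell out how r is computed from the gradient magnitude."""
    r = scale * np.linalg.norm(g0)
    return r * np.eye(len(g0))

def quasi_newton_step(x, g, B, alpha):
    """Search direction d = -B^{-1} g, new point x + alpha * d, where
    alpha is the step length reported by PROC NLP for the iteration."""
    d = -np.linalg.solve(B, g)
    return x + alpha * d
```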
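For the update formula, this is the plain-matrix BFGS update I have in mind. Fletcher (1987) propagates the Cholesky factor of the approximate Hessian instead of the matrix itself, which is mathematically equivalent but numerically more stable; the curvature check below is only meant to mirror the documented behavior of skipping the update when positive definiteness would be lost.

```python
import numpy as np

def bfgs_update(B, s, y, tol=1e-12):
    """Standard BFGS update of the approximate Hessian B, with
    s = x_new - x_old and y = g_new - g_old.  If the curvature condition
    y's > 0 fails (e.g. after the left-side Goldstein condition was
    violated), the update is skipped, as the documentation describes."""
    sy = float(y @ s)
    if sy <= tol:
        return B  # skip the update to preserve positive definiteness
    Bs = B @ s
    return B - np.outer(Bs, Bs) / float(s @ Bs) + np.outer(y, y) / sy
```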
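For the line search, the following is only my reading of the quoted documentation condensed into a sketch: check the two Goldstein conditions, shrink the step with quadratic interpolation when the sufficient-decrease condition fails, grow it when the other (what I take to be the "left-side") condition fails, and respect an upper limit on the step imposed by the feasible region. The real routine in Fletcher (1987) uses cubic interpolation/extrapolation and the LSPRECISION= tolerance, which I have not tried to reproduce here.

```python
import numpy as np

def goldstein_line_search(f, x, d, g, alpha0=1.0, c=0.1,
                          alpha_max=None, max_iter=30):
    """Simplified Goldstein-condition line search: quadratic interpolation
    to shrink the step, simple doubling (standing in for cubic
    extrapolation) to grow it.  Only illustrates the acceptance logic."""
    f0 = f(x)
    slope = float(g @ d)              # directional derivative, assumed < 0
    alpha = alpha0
    for _ in range(max_iter):
        if alpha_max is not None:
            alpha = min(alpha, alpha_max)      # feasible-region cap on alpha
        fa = f(x + alpha * d)
        upper = f0 + c * alpha * slope           # sufficient-decrease bound
        lower = f0 + (1.0 - c) * alpha * slope   # "left-side" Goldstein bound
        if fa > upper:
            # Step too long: minimize the quadratic fit through f0, slope, fa.
            alpha = -0.5 * slope * alpha**2 / (fa - f0 - slope * alpha)
        elif fa < lower and (alpha_max is None or alpha < alpha_max):
            # Step too short and not yet at the feasibility limit: extrapolate.
            alpha *= 2.0
        else:
            # Both conditions hold, or the step is capped by feasibility
            # (in which case one Goldstein condition may remain violated).
            return alpha
    return alpha
```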