From: Spencer Graves <spencer.graves_at_pdf.com>

Date: Mon 05 Dec 2005 - 17:11:52 EST

GENERAL REFERENCE ON NONLINEAR OPTIMIZATION?

What are your favorite references on nonlinear optimization? I like Bates and Watts (1988) Nonlinear Regression Analysis and Its Applications (Wiley), especially for its key insights regarding parameter-effects vs. intrinsic curvature. Before I spent time and money on several of the references cited on the help pages for "optim", "nlm", etc., I thought I'd ask you all for your thoughts.

ROSENBROCK'S BANANA VALLEY FUNCTION?

Beyond this, I wonder if someone could help me understand the lessons one should take from Rosenbrock's banana valley function:

banana <- function(x){
  100*(x[2]-x[1]^2)^2 + (1-x[1])^2
}

This is a quartic in x[1] and a parabola in x[2] with a unique minimum at x[2] = x[1] = 1. Over the range (-1, 2) x (-1, 1), it looks like a long, curved, deep, narrow banana-shaped valley. It is a known hard problem in nonlinear optimization, but these difficulties don't affect "nlm" or "nlminb" unless the hessian is provided analytically (with R 2.2.0 under Windows XP):

nlm(banana, c(-1.2, 1))    # found the minimum in 23 iterations
nlminb(c(-1.2, 1), banana) # found the min in 35 iterations

Dbanana <- function(x){
  c(-400*x[1]*(x[2] - x[1]^2) - 2*(1-x[1]),
    200*(x[2] - x[1]^2))
}

banana1 <- function(x){
  b <- 100*(x[2]-x[1]^2)^2 + (1-x[1])^2
  attr(b, "gradient") <- Dbanana(x)
  b
}

nlm(banana1, c(-1.2, 1))            # solved the problem in 24 iterations
nlminb(c(-1.2, 1), banana, Dbanana) # solution in 35 iterations

D2banana <- function(x){
  a11 <- (2 - 400*(x[2] - x[1]^2) + 800*x[2]*x[1]^2)
  a21 <- (-400*x[1])
  matrix(c(a11, a21, a21, 200), 2, 2)
}

banana2 <- function(x){
  b <- 100*(x[2]-x[1]^2)^2 + (1-x[1])^2
  attr(b, "gradient") <- Dbanana(x)
  attr(b, "hessian") <- D2banana(x)
  b
}

nlm(banana2, c(-1.2, 1))
# Found the valley but not the minimum
# in the default 100 iterations.
nlm(banana2, c(-1.2, 1), iterlim=10000)
# found the minimum to 3 significant digits in 5017 iterations.

nlminb(c(-1.2, 1), banana, Dbanana, D2banana)
# took 95 iterations to find the answer to double precision.
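[Editor's sketch, not part of the original post:] one cheap diagnostic when analytic-hessian runs do worse than gradient-only runs is to compare the coded hessian against finite differences of the coded gradient. Differentiating Dbanana's first component with respect to x[1] gives 2 - 400*(x[2]-x[1]^2) + 800*x[1]^2, whereas the a11 term coded in D2banana carries an extra factor of x[2]; the two agree at the starting value x[2] = 1 but diverge elsewhere. A quick Python translation of the R functions above makes the check concrete:

```python
# Python translations of banana's derivatives from the post
# (R's x[1], x[2] become x[0], x[1]).

def Dbanana(x):
    return [-400*x[0]*(x[1] - x[0]**2) - 2*(1 - x[0]),
            200*(x[1] - x[0]**2)]

def D2banana(x):
    # a11 exactly as coded in the post, including the x[1] factor (R's x[2])
    a11 = 2 - 400*(x[1] - x[0]**2) + 800*x[1]*x[0]**2
    a21 = -400*x[0]
    return [[a11, a21], [a21, 200]]

def fd_hessian(x, h=1e-5):
    """Hessian estimated by central differences of the analytic gradient."""
    H = []
    for i in range(2):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        gp, gm = Dbanana(xp), Dbanana(xm)
        H.append([(gp[j] - gm[j]) / (2*h) for j in range(2)])
    return H

# The two agree where x[1] == 1, e.g. at the start value c(-1.2, 1)...
print(D2banana([-1.2, 1.0])[0][0], fd_hessian([-1.2, 1.0])[0][0])
# ...but not elsewhere:
print(D2banana([0.5, 0.0])[0][0])    # 102.0
print(fd_hessian([0.5, 0.0])[0][0])  # ~302
```

Since the optimizers start at c(-1.2, 1), the discrepancy only appears once the iterates move, which would explain why the trouble surfaces mid-run rather than immediately.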

To understand this better, I wrote my own version of "nlm" (see below), and learned that the hessian is often indefinite, with one eigenvalue positive and the other negative. If I understand correctly, a negative eigenvalue of the hessian tends to push the next step towards increasing rather than decreasing the function. I tried a few things that accelerated the convergence slightly, but my "nlm." still had not converged after 100 iterations.
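[Editor's sketch:] since a hessian with a negative eigenvalue can turn the Newton step into an ascent direction, one standard remedy is to shift the hessian toward positive definiteness before solving for the step, the same idea as clamping its eigenvalues in "nlm." below. A minimal pure-Python illustration of such a Levenberg-style shift on a 2x2 example (the gradient and hessian here are made-up numbers, not the banana function):

```python
def solve2(A, b):
    """Solve a 2x2 linear system by Cramer's rule."""
    det = A[0][0]*A[1][1] - A[0][1]*A[1][0]
    return [(b[0]*A[1][1] - A[0][1]*b[1]) / det,
            (A[0][0]*b[1] - b[0]*A[1][0]) / det]

def is_pos_def2(A, eps=1e-8):
    """Sylvester's criterion for a symmetric 2x2 matrix."""
    return A[0][0] > eps and A[0][0]*A[1][1] - A[0][1]*A[1][0] > eps

def modified_newton_step(g, H):
    """Solve (H + tau*I) dp = -g, doubling tau until H + tau*I
    is positive definite, so dp is guaranteed a descent direction."""
    tau = 0.0
    while True:
        Ht = [[H[0][0] + tau, H[0][1]],
              [H[1][0], H[1][1] + tau]]
        if is_pos_def2(Ht):
            return solve2(Ht, [-g[0], -g[1]])
        tau = 2*tau if tau > 0 else 1e-3

g = [1.0, 0.0]
H = [[1.0, 2.0], [2.0, 1.0]]     # eigenvalues 3 and -1: indefinite
print(solve2(H, [-g[0], -g[1]]))  # raw Newton step: ascent direction here
print(modified_newton_step(g, H)) # shifted step: descent direction
```

On this example the raw Newton step has positive inner product with the gradient (it points uphill), while the shifted step does not, which matches the pathology described above.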

What might be done to improve the performance of something like "nlm" without substantially increasing the overhead for other problems?

Thanks.
spencer graves

#############################

nlm. <- function(f=fgh, p=c(-1.2, 1),
                 gradtol=1e-6, steptol=1e-6, iterlim=100){
# R code version of "nlm"
# requiring analytic gradient and hessian
#
# Initial evaluation
  f.i <- f(p)
  f0 <- f.i+1
# Iterate
  for(i in 1:iterlim){
    df <- attr(f.i, "gradient")
    d2f <- attr(f.i, "hessian")
# Gradient sufficiently small?
    if(sum(df^2)<(gradtol^2)){
      return(list(minimum=f.i, estimate=p,
                  gradient=df, hessian=d2f, code=1,
                  iterations=i))
    }
# Newton step
    dp <- (-solve(d2f, df))
# Step sufficiently small?
    if(sum(dp^2)<(steptol^2)){
      return(list(minimum=f.i, estimate=p+dp,
                  gradient=df, hessian=d2f, code=2,
                  iterations=i))
    }
# Next iter
    f0 <- f.i
    f.i <- f(p+dp)
# Step size control
    if(f.i>=f0){
      for(j in 1:iterlim){
        if(j==1){
# First replace the Newton step by a modified-Newton step:
# clamp the hessian eigenvalues away from zero and use
# their absolute values.
          d2f.eig <- eigen(d2f, symmetric=TRUE)
          cat("\nstep size control; i=", i,
              "; p=", round(p, 3), "; dp=", signif(dp, 2),
              "; eig(hessian)=", signif(d2f.eig$values, 4))
          v.max <- (1+max(abs(d2f.eig$values)))
          v.adj <- pmax(.001*v.max, abs(d2f.eig$values))
          evec.df <- (t(d2f.eig$vectors) %*% df)
          dp <- (-(d2f.eig$vectors %*% (evec.df/(1+v.adj))))
        }
        else{
# Then halve the step until the function decreases
          cat(".")
          dp <- dp/2
        }
        f.i <- f(p+dp)
        f2 <- f(p+dp/2)
        if(f2<f.i){
          dp <- dp/2
          f.i <- f2
        }
        if(f.i<f0)break # j
      }
      if(f.i>=f0){
        cat("\n")
        return(list(minimum=f0, estimate=p,
                    gradient=attr(f0, "gradient"),
                    hessian=attr(f0, "hessian"), code=3,
                    iterations=i))
      }
    }
    p <- p+dp
    cat(i, p, f.i, "\n")
  }
  return(list(minimum=f.i, estimate=p,
              gradient=df, hessian=d2f, code=4,
              iterations=i))
}
*

--
Spencer Graves, PhD
Senior Development Engineer
PDF Solutions, Inc.
333 West San Carlos Street Suite 700
San Jose, CA 95110, USA

spencer.graves@pdf.com
www.pdf.com <http://www.pdf.com>
Tel: 408-938-4420
Fax: 408-280-7915

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Hi, Gabor:

Thanks for the reply and the reference to Wolkowicz.

Your comments seem to be contradicted by 'demo(nlm)', which shows 'nlm' getting lost when the hessian is provided; without the hessian, nlm did fine. As noted below, "nlminb" exhibits essentially the same pathology.

One aspect of the problem in this case seems to be that the hessian is often indefinite, with one eigenvalue positive and the other negative. The negative eigenvalue pushes the increment towards increasing rather than decreasing the function. When I used the absolute values of the eigenvalues, it helped but not enough. I did limited tests of damping, but perhaps not enough. I wonder if nlm and optim might benefit from some modification of their damping algorithms when the hessian is provided -- and if yes, what that modification should be?

Best Wishes, spencer graves

Gabor Grothendieck wrote:

> Henry Wolkowicz (google his page for lots of optimization references)
> mentioned to me that that function is a standard example to show
> that first order methods (e.g. steepest descent) can fail by repeatedly
> crossing back and forth over the valley whereas second order methods
> (damped Newton, damped quasi-Newton, trust region) use curvature
> info to get convergence in a few iterations.
>
> On 12/4/05, Spencer Graves <spencer.graves@pdf.com> wrote:
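[Editor's sketch:] Gabor's point can be checked numerically. With the true gradient and hessian, a damped Newton iteration reaches the minimum, while plain steepest descent keeps zigzagging across the valley and exhausts its iteration budget. A pure-Python illustration (my own translation of the banana problem; the crude step-halving line search is for illustration only, not what nlm or nlminb actually do):

```python
def banana(x):
    return 100*(x[1] - x[0]**2)**2 + (1 - x[0])**2

def grad(x):
    return [-400*x[0]*(x[1] - x[0]**2) - 2*(1 - x[0]),
            200*(x[1] - x[0]**2)]

def hess(x):
    """True hessian of the banana function."""
    a11 = 2 - 400*(x[1] - x[0]**2) + 800*x[0]**2
    return [[a11, -400*x[0]], [-400*x[0], 200.0]]

def backtrack(x, d):
    """Halve the step along d until the function value decreases."""
    t, fx = 1.0, banana(x)
    while banana([x[0] + t*d[0], x[1] + t*d[1]]) >= fx and t > 1e-12:
        t /= 2
    return [x[0] + t*d[0], x[1] + t*d[1]]

def steepest_descent(x, tol=1e-6, maxit=1000):
    for i in range(maxit):
        g = grad(x)
        if g[0]**2 + g[1]**2 < tol**2:
            return x, i
        x = backtrack(x, [-g[0], -g[1]])
    return x, maxit

def newton_damped(x, tol=1e-6, maxit=500):
    for i in range(maxit):
        g = grad(x)
        if g[0]**2 + g[1]**2 < tol**2:
            return x, i
        H = hess(x)
        det = H[0][0]*H[1][1] - H[0][1]*H[1][0]
        d = [(-g[0]*H[1][1] + H[0][1]*g[1]) / det,   # Cramer's rule for
             (g[0]*H[1][0] - H[0][0]*g[1]) / det]    # H d = -g
        if g[0]*d[0] + g[1]*d[1] > 0:
            d = [-g[0], -g[1]]  # fall back if the Newton step points uphill
        x = backtrack(x, d)
    return x, maxit

xs, ns = steepest_descent([-1.2, 1.0])
xn, nn = newton_damped([-1.2, 1.0])
print("steepest descent:", ns, "iterations ->", xs)
print("damped Newton:   ", nn, "iterations ->", xn)
```

The gradient method stops only because it runs out of iterations, while the curvature-based method actually lands on (1, 1), which is Wolkowicz's point in miniature.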


Received on Mon Dec 05 17:19:22 2005

This archive was generated by hypermail 2.1.8 : Mon 05 Dec 2005 - 20:24:38 EST