[R] scaling and optim

From: Ross Boylan <ross_at_biostat.ucsf.edu>
Date: Thu, 07 Feb 2008 20:49:18 -0500


?optim says, in describing the control parameter,

'fnscale' An overall scaling to be applied to the value of 'fn'
          and 'gr' during optimization. If negative, turns the problem
          into a maximization problem. Optimization is performed on
          'fn(par)/fnscale'.
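
As a small check of the 'fnscale' description, the sketch below maximizes a made-up one-parameter function by passing a negative fnscale. The function fn and its peak of 7 at p = 5 are invented for illustration; this is only an observation on a toy problem, not a statement about how optim works internally.

```r
# Hypothetical objective with a maximum of 7 at p = 5 (invented for this check).
fn <- function(p) 7 - (p - 5)^2

# A negative fnscale asks optim() to maximize fn instead of minimizing it.
fit <- optim(par = 3, fn = fn, method = "BFGS",
             control = list(fnscale = -1))

fit$par    # compare with the known maximizer, 5
fit$value  # compare with fn(fit$par), i.e. the value of fn in its own units
```

Comparing fit$value with fn(fit$par) shows whether the reported value is on the scale of fn or of fn(par)/fnscale.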


'parscale' A vector of scaling values for the parameters.
          Optimization is performed on 'par/parscale' and these should
          be comparable in the sense that a unit change in any element
          produces about a unit change in the scaled value.

  1. Does the final phrase 'produces about a unit change in the scaled
     value' refer to the value of the objective function? Substantively I
     think it must, though grammatically it's less clear.

  2. "Optimization is performed on 'par/parscale'" means
     a) if par is 3 and parscale is 10 then the objective function will be
        evaluated at .3. This strikes me as the literal reading of what the
        clause means; it also strikes me as extremely unlikely this is what
        really happens. or
     b) if par is 3 and parscale is 10 then the objective function is
        evaluated at 3. The optimizer records this as if par were 30, and
        subsequently, e.g. when computing deltas or making steps, does so
        in this space. So a step of d becomes a step of d/parscale for the
        real objective function.
     c) About the same as b, only steps of d become d*parscale.

  3. Does scaling affect any of the final results (including
     log-likelihood, std errors, ...), assuming the scaled and unscaled
     methods find the same untransformed point?
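
One way to tell reading (a) apart from (b) and (c) in question 2 is to log every argument that optim actually passes to the objective. The sketch below does that for the par = 3, parscale = 10 case; the quadratic objective and the choice of method = "BFGS" are assumptions made only for this check. If (a) were right the first logged value would be 0.3; under (b) or (c) it would be 3.

```r
evaluated <- numeric(0)          # every value of p that fn() is called with
fn <- function(p) {
  evaluated <<- c(evaluated, p)  # c() copies the value, so the log is stable
  (p - 5)^2                      # hypothetical objective with its minimum at 5
}

fit <- optim(par = 3, fn = fn, method = "BFGS",
             control = list(parscale = 10))

first_point <- evaluated[1]      # the starting value as fn() actually saw it
first_point
fit$par                          # reported optimum, to compare with 5
```

The same log also bears on question 3: if fit$par comes back near 5 rather than 0.5 or 50, the reported parameters are in the original units.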

I assume that scaling is transparent in the sense of 3, i.e. it does not affect any of the reported results (unless it changes how well the optimizer works or fnscale converts minimizing to maximizing). Even given that, suppose I think that f(x)-f(x1) approximately equals f(x)-f(x2), where x1[1] = x[1] + 10 and x2[2] = x[2] + 1, and x, x1, and x2 are otherwise equal. Does this mean I should have parscale = c(10, 1) or parscale = c(1/10, 1)?
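
The two candidate settings can be compared with plain arithmetic, under the assumption that the optimizer works on q = par/parscale and maps back with par = q * parscale before calling the objective. The function f below is a made-up example that is 10 times flatter in x[1] than in x[2], and scaled_f and unit_step_effect are hypothetical helpers, not part of optim.

```r
# Hypothetical objective: a change of 10 in x[1] matters as much as 1 in x[2].
f <- function(x) (x[1] / 10)^2 + x[2]^2

# The objective as seen in the scaled space q = par / parscale
# (assumption: fn is called at par = q * parscale).
scaled_f <- function(q, parscale) f(q * parscale)

# Change in the objective for a unit step in each scaled coordinate.
unit_step_effect <- function(parscale) {
  q0 <- c(0, 0)
  c(scaled_f(q0 + c(1, 0), parscale) - scaled_f(q0, parscale),
    scaled_f(q0 + c(0, 1), parscale) - scaled_f(q0, parscale))
}

effect_10    <- unit_step_effect(c(10, 1))    # both entries equal 1
effect_tenth <- unit_step_effect(c(1/10, 1))  # first entry is about 1e-4
```

Under that assumption, parscale = c(10, 1) is the setting that makes a unit change in each scaled parameter produce a comparable change in the objective.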

Since I'm not sure about parscale, I'm really not sure about

'ndeps' A vector of step sizes for the finite-difference
          approximation to the gradient, on 'par/parscale' scale.
          Defaults to '1e-3'.

So, if I don't do any other rescaling, I might say ndeps = c(1e-2, 1e-3) in the previous example (the response to x[1] is 10 times flatter than to x[2]).

I guess that if I do have parscale set, I leave the default ndeps (1e-3 for both) and get the same effect. Right?
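
This guess can be checked by logging where the objective is evaluated when no gradient function is supplied. The sketch below assumes method = "BFGS", a made-up objective, and that the first call is at the start value followed by two finite-difference probes per parameter; that ordering is an assumption about the implementation, so the row indices may need adjusting.

```r
pts <- NULL                      # one row per call to fn()
fn <- function(x) {
  pts <<- rbind(pts, x)          # rbind() copies the values of x
  (x[1] / 10)^2 + x[2]^2         # hypothetical: 10 times flatter in x[1]
}

start <- c(20, 2)
fit <- optim(par = start, fn = fn, method = "BFGS",
             control = list(parscale = c(10, 1)))  # ndeps left at its default

# Assumption: call 1 is at the start value and calls 2 to 5 are the
# finite-difference probes for the first gradient (two per parameter).
probes  <- pts[2:5, , drop = FALSE]
fd_step <- apply(abs(sweep(probes, 2, start)), 2, max)
fd_step                          # largest probe offset per parameter, in x units
```

If fd_step comes out as ndeps * parscale, then setting parscale = c(10, 1) with the default ndeps gives the same probe sizes as ndeps = c(1e-2, 1e-3) without parscale.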



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Fri 08 Feb 2008 - 01:54:45 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 08 Feb 2008 - 06:30:13 GMT.
