From: Ben Bolker <bolker_at_ufl.edu>

Date: Thu, 04 Mar 2010 23:06:16 +0000 (UTC)

R-devel_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Fri 05 Mar 2010 - 01:57:52 GMT

Ben Bolker <bolker <at> ufl.edu> writes:

[re: behavior of scale() when center=FALSE and scale=TRUE]

> Again, I agree with you that the behavior is not optimal, but it is
> very hard to make changes in R when the behavior is sub-optimal rather
> than actually wrong (by some definition). R-core is very conservative
> about changes that break backward compatibility; I would like it if they
> chose to change the function to use standard deviation rather than
> root-mean-square, but I doubt it will happen (and it would break things
> for any users who are relying on the current definition).

[snip]

> I have attached a patch
> file (and append the information below as well) that changes "standard
> deviation" back to "root mean square" and is much more explicit about
> this issue ... I hope R-core will jump in, critique it, and possibly use
> it in some form to improve (?) the documentation ...
>
> [PS: I have written that the scaling is equivalent to sd() "if and
> only if" centering was done. Technically it would also be equivalent if
> the column already had zero mean ...]
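
A minimal sketch of the point made in the PS, not part of the original patch. The matrix `z` is made-up data in which column `a` already has zero mean and column `b` does not. With center=FALSE, the scale factor should match sd() for `a` only; for `b` the two differ according to the identity rms^2 = sd^2 + n/(n-1) * mean^2, which follows from sum(x^2) = sum((x - mean)^2) + n*mean^2.

```r
# Hypothetical data: column "a" has mean 0, column "b" has mean 3.
z <- cbind(a = c(-2, -1, 0, 1, 2), b = c(1, 2, 3, 4, 5))

# Scale factors that scale() records when no centering is done.
rms <- unname(attr(scale(z, center = FALSE, scale = TRUE), "scaled:scale"))
sds <- unname(apply(z, 2, sd))

# Zero-mean column: root-mean-square and sd() coincide.
zero_mean_matches <- isTRUE(all.equal(rms[1], sds[1]))

# Non-zero-mean column: they do not.
nonzero_mean_matches <- isTRUE(all.equal(rms[2], sds[2]))

# Identity linking the two quantities, checked column by column.
n <- nrow(z)
col_means <- unname(colMeans(z))
identity_holds <- isTRUE(all.equal(rms^2, sds^2 + n / (n - 1) * col_means^2))
```

The "scaled:scale" attribute is where scale() stores the divisors it used, so comparing it against sd() avoids recomputing the scaled matrix by hand.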

--- scale.Rd (revision 51180)
+++ scale.Rd (working copy)
@@ -41,13 +41,18 @@
   equal to the number of columns of \code{x}, then each column of
   \code{x} is divided by the corresponding value from \code{scale}.
   If \code{scale} is \code{TRUE} then scaling is done by dividing the
-  (centered) columns of \code{x} by their standard deviations, and if
+  (centered) columns of \code{x} by their root-mean-squares, and if
   \code{scale} is \code{FALSE}, no scaling is done.
-
-  The standard deviation for a column is obtained by computing the
-  square-root of the sum-of-squares of the non-missing values in the
-  column divided by the number of non-missing values minus one (whether
-  or not centering was done).
+
+  The root-mean-square for a (possibly centered)
+  column is defined as
+  \eqn{\sqrt{\sum(x^2)/(n-1)}}{sqrt(sum(x^2)/(n-1))},
+  where \eqn{x} is a vector of the non-missing values
+  and \eqn{n} is the number of non-missing values.
+  If (and only if) centering was done,
+  this is equivalent to \code{sd(x,na.rm=TRUE)}.
+  (To scale by the standard deviations without centering,
+  use \code{scale(x,center=FALSE,scale=apply(x,2,sd,na.rm=TRUE))}.)
 }
 \references{
   Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988)
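
To make the proposed wording concrete, here is a small sketch that checks its three statements against scale() itself. The matrix `x` is made-up data (with one NA, to exercise the "non-missing values" clause), and the helper name `rms` is my own for illustration; it is not a function exported by R.

```r
# Hypothetical data; column "a" contains a missing value.
x <- cbind(a = c(1, 2, 3, 4, NA), b = c(10, 20, 40, 80, 160))

# Root-mean-square as defined in the proposed text: sqrt(sum(x^2)/(n-1)),
# computed over the non-missing values only.
rms <- function(v) {
  v <- v[!is.na(v)]
  sqrt(sum(v^2) / (length(v) - 1))
}

# 1. Without centering, scale() divides by the root-mean-square.
uncentered <- scale(x, center = FALSE, scale = TRUE)
uses_rms <- isTRUE(all.equal(unname(attr(uncentered, "scaled:scale")),
                             unname(apply(x, 2, rms))))

# 2. With centering, the same divisor equals sd(x, na.rm = TRUE).
centered <- scale(x, center = TRUE, scale = TRUE)
centered_uses_sd <- isTRUE(all.equal(unname(attr(centered, "scaled:scale")),
                                     unname(apply(x, 2, sd, na.rm = TRUE))))

# 3. The suggested workaround: divide by sd without centering.
by_sd <- scale(x, center = FALSE, scale = apply(x, 2, sd, na.rm = TRUE))
workaround_ok <- isTRUE(all.equal(as.vector(by_sd[, "b"]),
                                  x[, "b"] / sd(x[, "b"])))
```

All three flags should come out TRUE if the revised documentation describes the behavior accurately.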

(Bump re: suggested update to scale.Rd. Is this under consideration? I'll stop pestering if it's considered unacceptable; I just don't want it to vanish without a trace ...)

