# Re: [Rd] scale(x, center=FALSE) (PR#14219)

From: Simon Urbanek <simon.urbanek_at_r-project.org>
Date: Fri, 12 Mar 2010 17:44:33 -0500

> I'm resending this after a week ... I really don't want to nag, but
> I also would not like to see this sink below the waves.
> [re: behavior of scale() when center=FALSE and scale=TRUE]
>>  Again, I agree with you that the behavior is not optimal, but it is
>> very hard to make changes in R when the behavior is sub-optimal rather
>> than actually wrong (by some definition).  R-core is very conservative
>> about changes that break backward compatibility; I would like it if they
>> chose to change the function to use standard deviation rather than
>> root-mean-square, but I doubt it will happen (and it would break things
>> for any users who are relying on the current definition).


> [snip]
>> I have attached a patch
>> file (and append the information below as well) that changes "standard
>> deviation" back to "root mean square" and is much more explicit about
>> this issue ... I hope R-core will jump in, critique it, and possibly use
>> it in some form to improve (?) the documentation ...
>>  [PS: I have written that the scaling is equivalent to sd() "if and
>> only if" centering was done.  Technically it would also be equivalent if
> ===================================================================
> --- scale.Rd	(revision 51180)
> +++ scale.Rd	(working copy)
> @@ -41,13 +41,18 @@
> equal to the number of columns of \code{x}, then each column of
> \code{x} is divided by the corresponding value from \code{scale}. If
> \code{scale} is \code{TRUE} then scaling is done by dividing the
> - (centered) columns of \code{x} by their standard deviations, and if
> + (centered) columns of \code{x} by their root-mean-squares, and if
> \code{scale} is \code{FALSE}, no scaling is done.
> -
> - The standard deviation for a column is obtained by computing the
> - square-root of the sum-of-squares of the non-missing values in the
> - column divided by the number of non-missing values minus one (whether
> - or not centering was done).
> +
> + The root-mean-square for a (possibly centered)
> + column is defined as
> + \eqn{\sqrt{\sum(x^2)/(n-1)}}{sqrt(sum(x^2)/(n-1))},
> + where \eqn{x} is a vector of the non-missing values
> + and \eqn{n} is the number of non-missing values.
> + If (and only if) centering was done,
> + this is equivalent to \code{sd(x,na.rm=TRUE)}.
> + (To scale by the standard deviations without centering,
> + use \code{scale(x,center=FALSE,scale=apply(x,2,sd,na.rm=TRUE))}.)
> }
> \references{
> Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988)
>
> (Bump re: suggested update to scale.Rd. Is this under
> consideration? I'll stop pestering if it's considered
> unacceptable, just don't want it to vanish without a trace ...)
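
The distinction the quoted patch documents can be checked directly in R. The following is a minimal sketch and not part of the original messages: the matrix `x` and the helper `rms()` are made up for illustration, and it assumes `scale()` behaves as the patched text describes (dividing by sqrt(sum(x^2)/(n-1)) over the non-missing values).

```r
# Illustrative data only: a small matrix with one missing value.
x <- cbind(a = c(1, 2, 3, 4, NA), b = c(10, 20, 30, 40, 50))

# Hypothetical helper: the root-mean-square as written in the patch,
# sqrt(sum(x^2)/(n-1)), computed over the non-missing values.
rms <- function(v) {
  v <- v[!is.na(v)]
  sqrt(sum(v^2) / (length(v) - 1))
}

# Without centering, the divisor recorded by scale() is the RMS.
s_nocenter <- scale(x, center = FALSE, scale = TRUE)
rms_used <- attr(s_nocenter, "scaled:scale")

# With centering, the divisor coincides with sd(, na.rm = TRUE).
s_center <- scale(x, center = TRUE, scale = TRUE)
sd_used <- attr(s_center, "scaled:scale")

# Scaling by the standard deviations without centering, using the
# call suggested in the patch.
s_sd <- scale(x, center = FALSE, scale = apply(x, 2, sd, na.rm = TRUE))

print(rbind(rms = rms_used, sd = sd_used))
```

Comparing the two rows of the printed table shows how far the uncentered divisor can be from the standard deviation when the column means are not close to zero.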
