Re: [Rd] scale(x, center=FALSE) (PR#14219)

From: Simon Urbanek <simon.urbanek_at_r-project.org>
Date: Fri, 12 Mar 2010 17:44:33 -0500

On Mar 12, 2010, at 1:29 PM, Ben Bolker wrote:

>
> I'm resending this after a week ... I really don't want to nag, but
> I also would not like to see this sink below the waves.
>

It has been closed as feature/FAQ with the note: "As documented on the help page!"

> Is there a preferred protocol for requesting comments without nagging too much? I would add a comment to 14219 (and was curious to see whether it was rejected) ... I went to bugzilla, and bug 14219 doesn't seem to exist any more -- either as open or as closed -- don't know if it got lost, or thrown away, when the bug system migrated?
>

Hmm... there was apparently an error when importing the feature&FAQ box. Unfortunately, Jitterbug left some duplicate bugs in different categories, so the import was not as easy as it should have been. I'll double-check the IDs to see if any others are missing -- I have now run the import for 14219 manually.

Thanks,
Simon

>
> [re: behavior of scale() when center=FALSE and scale=TRUE]
>

>>  Again, I agree with you that the behavior is not optimal, but it is
>> very hard to make changes in R when the behavior is sub-optimal rather
>> than actually wrong (by some definition).  R-core is very conservative
>> about changes that break backward compatibility; I would like it if they
>> chose to change the function to use standard deviation rather than
>> root-mean-square, but I doubt it will happen (and it would break things
>> for any users who are relying on the current definition).

>
> [snip]
>
>> I have attached a patch
>> file (and append the information below as well) that changes "standard
>> deviation" back to "root mean square" and is much more explicit about
>> this issue ... I hope R-core will jump in, critique it, and possibly use
>> it in some form to improve (?) the documentation ...
>> 
>>  [PS: I have written that the scaling is equivalent to sd() "if and
>> only if" centering was done.  Technically it would also be equivalent if
>> the column already had zero mean ...]
>> 

> ===================================================================
> --- scale.Rd (revision 51180)
> +++ scale.Rd (working copy)
> @@ -41,13 +41,18 @@
> equal to the number of columns of \code{x}, then each column of
> \code{x} is divided by the corresponding value from \code{scale}. If
> \code{scale} is \code{TRUE} then scaling is done by dividing the
> - (centered) columns of \code{x} by their standard deviations, and if
> + (centered) columns of \code{x} by their root-mean-squares, and if
> \code{scale} is \code{FALSE}, no scaling is done.
> -
> - The standard deviation for a column is obtained by computing the
> - square-root of the sum-of-squares of the non-missing values in the
> - column divided by the number of non-missing values minus one (whether
> - or not centering was done).
> +
> + The root-mean-square for a (possibly centered)
> + column is defined as
> + \eqn{\sqrt{\sum(x^2)/(n-1)}}{sqrt(sum(x^2)/(n-1))},
> + where \eqn{x} is a vector of the non-missing values
> + and \eqn{n} is the number of non-missing values.
> + If (and only if) centering was done,
> + this is equivalent to \code{sd(x,na.rm=TRUE)}.
> + (To scale by the standard deviations without centering,
> + use \code{scale(x,center=FALSE,scale=apply(x,2,sd,na.rm=TRUE))}.)
> }
> \references{
> Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988)
>
> (Bump re: suggested update to scale.Rd . Is this under
> consideration? I'll stop pestering if it's considered
> unacceptable, just don't want it to vanish without a trace ...)
>
>
> --
> Ben Bolker
> Associate professor, Biology Dep't, Univ. of Florida
> bolker_at_ufl.edu / people.biology.ufl.edu/bolker
> GPG key: people.biology.ufl.edu/bolker/benbolker-publickey.asc
>
> ______________________________________________
> R-devel_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
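
The behavior described in the quoted patch can be checked directly in R. The following is a minimal sketch, not part of the original message: the matrix `x` and the helper `rms()` are hypothetical names introduced here for illustration, and `rms()` simply restates the formula given in the patch text, sqrt(sum(x^2)/(n-1)) over the non-missing values.

```r
# Hypothetical example data (not from the thread): two columns with non-zero means.
x <- cbind(a = c(1, 2, 3, 4), b = c(10, 20, 30, 50))

# Root-mean-square with an n - 1 denominator, as worded in the proposed scale.Rd text.
# 'rms' is an illustrative helper, not a function provided by base R.
rms <- function(v) {
  v <- v[!is.na(v)]
  sqrt(sum(v^2) / (length(v) - 1))
}

rms_cols <- apply(x, 2, rms)
sd_cols  <- apply(x, 2, sd, na.rm = TRUE)

# Without centering, scale() divides each column by its RMS, not by sd().
s_nocenter <- scale(x, center = FALSE, scale = TRUE)
stopifnot(isTRUE(all.equal(attr(s_nocenter, "scaled:scale"), rms_cols)))
stopifnot(!isTRUE(all.equal(rms_cols, sd_cols)))

# With centering (the default), the same computation coincides with sd().
s_center <- scale(x)
stopifnot(isTRUE(all.equal(attr(s_center, "scaled:scale"), sd_cols)))

# The two quantities differ by n * mean^2 / (n - 1), which vanishes only
# for a zero-mean column (the case mentioned in the PS above).
n <- nrow(x)
stopifnot(isTRUE(all.equal(rms_cols^2 - sd_cols^2, n * colMeans(x)^2 / (n - 1))))

# Workaround suggested in the patch: scale by sd() without centering.
s_sd <- scale(x, center = FALSE, scale = apply(x, 2, sd, na.rm = TRUE))
stopifnot(isTRUE(all.equal(apply(s_sd, 2, sd), c(a = 1, b = 1))))
```

If every `stopifnot()` call passes silently, the uncentered default is using the root-mean-square and the explicit `scale=` argument gives columns with unit standard deviation.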


Received on Fri 12 Mar 2010 - 22:46:38 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Sat 13 Mar 2010 - 09:00:58 GMT.

