Re: [Rd] [R] computing the variance

From: Duncan Murdoch <murdoch_at_stats.uwo.ca>
Date: Mon 05 Dec 2005 - 19:37:50 GMT

On 12/5/2005 2:25 PM, (Ted Harding) wrote:
> On 05-Dec-05 Martin Maechler wrote:

>>     UweL> x <- c(1,2,3,4,5)
>>     UweL> n <- length(x)
>>     UweL> var(x)*(n-1)/n
>> 
>>     UweL> if you really want it.
>> 
>> It seems Insightful at some point in time have given in to
>> this user request, and S-plus nowadays has
>> an argument  "unbiased = TRUE"
>> where the user can choose {to shoot (him/her)self in the leg and}
>> require 'unbiased = FALSE'.
>> {and there's also 'SumSquares = FALSE' which allows to not
>> require any division (by N or N-1)}
>> 
>> Since in some ``schools of statistics'' people are really still
>> taught to use a 1/N variance, we could envisage to provide such an
>> argument to var() {and cov()} as well.  Otherwise, people define
>> their own variance function such as  
>>       VAR <- function(x, ...) { N <- length(x); (N-1)/N * var(x, ...) }
>> Should we?

>
> If people need to do this, such an option would be a convenience,
> but I don't see that it has much further merit than that.
>
> My view of how to calculate a "variance" is based, not directly
> on the "unbiased" issue, but on the following.
>
> Suppose you define a RV X as a single value sampled from a finite
> population of values X1,...,XN.
>
> The variance of X is (or damn well should be) defined as
>
> Var(X) = E(X^2) - (E(X))^2
>
> and this comes to (Sum(X^2) - (Sum(X))^2/N)/(N-1).

I don't follow this. I agree with the first line (though I prefer to write it differently), but I don't see how it leads to the second. For example, consider a distribution which is equally likely to be +/- 1, and a sample from it consisting of a single 1 and a single -1. The first formula gives 1 (which is the variance), the second gives 2.
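
As a quick check of the arithmetic in that example, here is a minimal R sketch. The names `x`, `first` and `second` are only illustrative, and `second` reads the quoted expression as the usual (N-1)-denominator sample variance, which is an assumption about what was intended.

```r
# Sample consisting of a single 1 and a single -1
x <- c(1, -1)
N <- length(x)

# First formula, with expectations taken over the two equally likely values:
# E(X^2) - (E(X))^2
first <- mean(x^2) - mean(x)^2

# Second formula, read as the usual (N-1)-denominator sample variance
# (an assumption about the intended expression)
second <- (sum(x^2) - sum(x)^2 / N) / (N - 1)

# var() is shown alongside for comparison with the second formula
print(c(first = first, second = second, var = var(x)))
```

For this sample, var() agrees with the second value rather than the first.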

The second formula is unbiased because in a random sample of size 2 I am just as likely to get a 0 from it as a 2, so on average it gives 1, but I'm curious about what you mean by "this comes to".
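
To make the unbiasedness point concrete, the sketch below enumerates the four equally likely ordered samples of size 2 from the +/- 1 distribution and averages var() over them. The sample size of 2 and the object names `samples` and `v` are assumptions chosen for illustration.

```r
# All ordered samples of size 2 from a distribution on {-1, +1}
samples <- expand.grid(x1 = c(-1, 1), x2 = c(-1, 1))

# (N-1)-denominator variance of each sample, computed row by row with var()
v <- apply(samples, 1, var)

# Each sample has probability 1/4, so the expectation is the plain mean
print(table(v))
print(mean(v))
```

Half of the samples give 0 and half give 2, so the mean equals the true variance of 1.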

Duncan



R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Tue Dec 06 06:42:31 2005
