From: Martyn Plummer <plummer_at_iarc.fr>

Date: Thu 27 Jan 2005 - 02:40:43 EST


R-devel@stat.math.ethz.ch mailing list

https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Thu Jan 27 01:50:35 2005


We were caught out recently attempting to use tapply to get a table of
weighted means. This gives the wrong answer (or, more correctly, not
the answer we were expecting), as the following example shows:

R> x <- 1:10                       #some data
R> w <- c(1:5,5:1)                 #weights
R> id <- rep(1:2,rep(5,2))         #id values
R> weighted.mean(x[id==1],w[id==1]) #Weighted mean of x in group 1

[1] 3.666667

R> weighted.mean(x[id==2],w[id==2]) #Weighted mean of x in group 2

[1] 7.333333

R> tapply(x,INDEX=id,FUN=weighted.mean,w=w) #Wrong!
1 2
3 8

The reason for this is that tapply splits its first argument by the INDEX variable, but does not split any of the arguments supplied via '...'. So the result is equivalent to

c(weighted.mean(x[id==1],w), weighted.mean(x[id==2],w))

Inside weighted.mean, R silently recycles the shorter vector (x[id==1], of length 5) to match the length of the longer one (w, of length 10).
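The recycling can be reproduced by hand. The sketch below assumes weighted.mean reduces to sum(x * w)/sum(w), as it did at the time, and computes that expression directly rather than calling weighted.mean with the full w (current R versions of weighted.mean stop on mismatched lengths). It then shows one possible workaround that is not part of the original message: split x and w by the same id and pair the pieces with mapply. The names recycled_g1 and group_wmeans are illustrative only.

```r
# Data from the example in the message
x  <- 1:10
w  <- c(1:5, 5:1)
id <- rep(1:2, rep(5, 2))

# What the tapply call effectively computed for group 1, assuming the
# sum(x * w) / sum(w) formula: x[id == 1] (length 5) is recycled
# against the full, unsplit w (length 10).
recycled_g1 <- sum(x[id == 1] * w) / sum(w)   # 90 / 30 = 3

# Workaround sketch: split x and w by the same id, then pass matching
# pieces together so every call sees an x and a w of equal length.
group_wmeans <- mapply(weighted.mean, x = split(x, id), w = split(w, id))

print(recycled_g1)
print(group_wmeans)   # 3.666667 for group 1, 7.333333 for group 2
```

Because both vectors go through split() with the same id, the pieces line up element by element; mapply keeps the group labels "1" and "2" as names, much as tapply would.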

I draw two conclusions from this:

- weighted.mean(x,w) should include a length check for w. The documentation says it should be the same length as x, so this should be enforced.
- More importantly, the help page for tapply should explicitly warn the user that optional arguments supplied to 'FUN' are not split by 'INDEX'. I really only understood the behaviour of tapply after inspecting the code. Then it became obvious why this could never work.
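A minimal sketch of the first suggestion, written as a hypothetical stand-alone function (checked_weighted_mean) rather than as the actual change to weighted.mean: it stops when the lengths of x and w differ, so the tapply call from the example above fails with an error instead of silently returning 3 and 8. The na.rm handling is a simplified assumption, and the error text is illustrative.

```r
# Hypothetical helper, not the patch to R itself: a weighted mean that
# refuses weights whose length differs from x instead of recycling.
checked_weighted_mean <- function(x, w, na.rm = FALSE) {
  if (length(w) != length(x))
    stop("'x' and 'w' must have the same length")
  if (na.rm) {
    keep <- !is.na(x)   # drop missing x values and their weights
    x <- x[keep]
    w <- w[keep]
  }
  sum(x * w) / sum(w)
}

x  <- 1:10
w  <- c(1:5, 5:1)
id <- rep(1:2, rep(5, 2))

# With the check in place, passing the unsplit w through tapply's '...'
# raises an error, which tryCatch turns into its message string.
res <- tryCatch(tapply(x, INDEX = id, FUN = checked_weighted_mean, w = w),
                error = function(e) conditionMessage(e))
print(res)
```

The check only makes the mistake visible; it does not make tapply split w, so the documentation warning in the second point is still needed.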

I hope I am not being too obtuse. Any objections before I make these changes?

Martyn


This archive was generated by hypermail 2.1.8: Thu 27 Jan 2005 - 02:27:24 EST