[Rd] tapply with weighted.mean

From: Martyn Plummer <plummer_at_iarc.fr>
Date: Thu 27 Jan 2005 - 02:40:43 EST

We were caught out recently attempting to use tapply to get a table of weighted means. This gives the wrong answer (or, more correctly, not the answer we were expecting), as the following example shows:

R> x <- 1:10 #some data
R> w <- c(1:5,5:1) #weights
R> id <- rep(1:2,rep(5,2)) #id values
R> weighted.mean(x[id==1],w[id==1]) #Weighted mean of x in group 1
[1] 3.666667

R> weighted.mean(x[id==2],w[id==2]) #Weighted mean of x in group 2
[1] 7.333333

R> tapply(x,INDEX=id,FUN=weighted.mean,w=w) #Wrong!
1 2
3 8

The reason for this is that tapply splits its first argument by the INDEX variable, but does not split any of the arguments supplied via '...'. So the result is

c(weighted.mean(x[id==1],w), weighted.mean(x[id==2],w))

In each call the subset of x has length 5 but w still has length 10, and R silently replicates the shorter variable to match the length of the longer one.
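To make the recycling concrete, here is a minimal sketch using the same x, w and id as above. It assumes the weighted mean is computed as sum(x*w)/sum(w), which reproduces the 3 and 8 shown earlier. It then shows two possible ways of getting the per-group answer by splitting the weights along with the data. The names recycled, by_split and by_index are only illustrative.

```r
x  <- 1:10
w  <- c(1:5, 5:1)
id <- rep(1:2, each = 5)

# What the tapply call effectively computes: a 5-element slice of x
# against all 10 weights. The slice is recycled twice without a warning,
# because 10 is a multiple of 5.
recycled <- sapply(1:2, function(g) sum(x[id == g] * w) / sum(w))

# Split the weights by the same index and pair the pieces group by group.
by_split <- mapply(weighted.mean, split(x, id), split(w, id))

# Or split the positions instead, so FUN can subset both x and w itself.
by_index <- tapply(seq_along(x), id, function(i) weighted.mean(x[i], w[i]))
```

Here recycled comes out as 3 and 8, while by_split and by_index both give 3.666667 and 7.333333, matching the group-by-group calls above.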

I draw two conclusions from this:

  1. weighted.mean(x,w) should include a length check for w. The documentation says it should be the same length as x, so this should be enforced.
  2. More importantly, the help page for tapply should explicitly warn the user that optional arguments supplied to 'FUN' are not split by 'INDEX'. I really only understood the behaviour of tapply after inspecting the code. Then it became obvious why this could never work.
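As a rough illustration of the first point, a length check could look something like the sketch below. The function checked_weighted_mean is hypothetical and is not the actual weighted.mean code. It again assumes the sum(x*w)/sum(w) form and ignores missing values.

```r
# Hypothetical wrapper, for illustration only: refuse to recycle weights.
checked_weighted_mean <- function(x, w) {
  if (length(w) != length(x))
    stop("'x' and 'w' must have the same length")
  sum(x * w) / sum(w)
}

x  <- 1:10
w  <- c(1:5, 5:1)
id <- rep(1:2, each = 5)

# Matching lengths: the usual weighted mean for group 1.
ok <- checked_weighted_mean(x[id == 1], w[id == 1])

# Mismatched lengths, as in the tapply call: now an error, not a wrong answer.
bad <- try(checked_weighted_mean(x[id == 1], w), silent = TRUE)
```

With such a check in place, the tapply call would stop with an error rather than quietly return 3 and 8.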

I hope I am not being too obtuse. Any objections before I make these changes?


R-devel@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Thu Jan 27 01:50:35 2005