[R] Too many NAs from mean(..., na.rm=T) when a column is all NAs

Dear R-help folks,

I am seeing unexpected behaviour from the function mean with option na.rm = TRUE when this removes a whole column of a data frame or matrix, i.e. when the column is entirely NA.


testcase <- data.frame( x = 1:3, y = rep(NA,3))

mean(testcase[,1], na.rm=TRUE)
[1] 2

mean(testcase[,2], na.rm = TRUE)
[1] NaN

  OK, so far that seems sensible. Now I'd like to compute both means at once:

lapply(testcase, mean, na.rm=T) ## this works
$x
[1] 2

$y
[1] NaN

  But I thought that this would also work:

apply(testcase, 2, mean, na.rm=T)
 x  y 
NA NA 
Warning messages:
1: argument is not numeric or logical: returning NA in: mean.default(newX[, i], ...)
2: argument is not numeric or logical: returning NA in: mean.default(newX[, i], ...)

  If I have a data frame or a matrix where one entire column is NA, mean(x, na.rm=T) works on that column and returns NaN. Through apply, however, it fails: apply returns NA for ALL columns, with one warning per column. lapply works fine on the data frame.
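
  A minimal sketch for narrowing this down, assuming (as the documentation for apply describes) that apply converts a data frame with as.matrix before calling the function on each column. It checks the class of each column, looks at the mode of the converted matrix, and compares two alternatives. The name testcase_num and the use of NA_real_ are illustrative additions, not part of the test case above.

```r
# Same test case as above
testcase <- data.frame(x = 1:3, y = rep(NA, 3))

# rep(NA, 3) is a logical vector, so y is a logical column, not a numeric one
sapply(testcase, class)

# apply() operates on a matrix; a data frame is converted with as.matrix() first.
# The mode of that matrix is the type mean() actually receives inside apply().
m <- as.matrix(testcase)
mode(m)

# sapply() goes column by column without building a matrix,
# so each column keeps its own type
col_means <- sapply(testcase, mean, na.rm = TRUE)

# Illustrative variant: declare the all-missing column as numeric (NA_real_),
# so that the converted matrix is numeric as well
testcase_num <- data.frame(x = 1:3, y = rep(NA_real_, 3))
apply_means <- apply(testcase_num, 2, mean, na.rm = TRUE)
```

  If mode(m) reports "character", that would account for the "argument is not numeric or logical" warning on every column. The result of that step may differ between R versions, so no output is shown for it here.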

  If you wonder why I'm building data frames with columns that could be all missing -- they arise as output of a simulation. The fact that the entire column is missing is informative in itself.

  I do wonder if this is a bug.


