[R] Too many NA's from mean(..., na.rm=T) when a column is all NA's

From: Jim Robison-Cox <jimrc_at_math.montana.edu>
Date: Tue 14 Jun 2005 - 03:05:46 EST

Dear R-help folks,

I am seeing unexpected behaviour from the function mean with option na.rm = TRUE when it removes a whole column of a data frame or matrix (i.e. when every value in that column is NA).


testcase <- data.frame( x = 1:3, y = rep(NA,3))

mean(testcase[,1], na.rm=TRUE)
[1] 2

mean(testcase[,2], na.rm = TRUE)
[1] NaN
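The NaN is what one would expect here. With na.rm = TRUE every value in y is dropped, which leaves a zero-length vector, and the mean of an empty vector works out to 0/0, i.e. NaN. A minimal sketch of that reasoning, reusing the test data from above (the name y_left is only for illustration):

```r
# Same test data as above; rep(NA, 3) is a logical vector
testcase <- data.frame(x = 1:3, y = rep(NA, 3))

# Dropping the NAs from y leaves nothing behind
y_left <- testcase$y[!is.na(testcase$y)]
length(y_left)

# sum() of an empty vector is 0 and its length is 0, so the mean is 0/0 = NaN
sum(y_left) / length(y_left)
mean(testcase$y, na.rm = TRUE)
```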

  OK, so far that seems sensible. Now I'd like to compute both means at once:

  lapply(testcase, mean, na.rm=T) ## this works
$x
[1] 2

$y
[1] NaN
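lapply works because a data frame is a list of columns, so mean sees each column with its own type. The same idea with sapply, which simplifies the list of results to a named vector, is sketched below. This is an alternative call offered for comparison, not something tried in the original message.

```r
testcase <- data.frame(x = 1:3, y = rep(NA, 3))

# sapply() iterates over the columns like lapply(), then simplifies the
# list of length-one results into a named numeric vector
col_means <- sapply(testcase, mean, na.rm = TRUE)
col_means
```

Because no matrix is ever built, each column keeps its own type and the all-NA column simply yields NaN.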

  But I thought that this would also work:

apply(testcase, 2, mean, na.rm=T)
 x  y
NA NA
Warning messages:
1: argument is not numeric or logical: returning NA in: mean.default(newX[, i], ...)
2: argument is not numeric or logical: returning NA in: mean.default(newX[, i], ...)

  If I have a data frame or a matrix where one entire column is NA, mean(x, na.rm=T) works on that column alone, returning NaN, but apply fails: it returns NA for ALL columns, not just the all-NA one. lapply works fine on the data frame.
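A plausible explanation, offered as an assumption rather than a confirmed diagnosis: rep(NA, 3) creates a logical column, and apply works on a matrix, so it first converts the data frame with as.matrix. A matrix holds a single type, and the "argument is not numeric or logical" warnings suggest that in the R version used here the conversion produced a character matrix. The sketch below imitates that with a hand-built character matrix (m_chr, bad, fixed and good are hypothetical names), then shows a possible workaround: give the all-NA column a numeric type before calling apply.

```r
testcase <- data.frame(x = 1:3, y = rep(NA, 3))

# rep(NA, 3) is logical, not numeric
class(testcase$y)

# Assumption for illustration: a character matrix like the one apply() is
# suspected to have built. mean() warns and returns NA for every column.
m_chr <- matrix(c("1", "2", "3", NA, NA, NA), ncol = 2,
                dimnames = list(NULL, c("x", "y")))
bad <- suppressWarnings(apply(m_chr, 2, mean, na.rm = TRUE))
bad

# Workaround sketch: make the all-NA column numeric first, so the matrix
# that apply() builds is numeric and mean() behaves column by column
fixed <- testcase
fixed$y <- as.numeric(fixed$y)
good <- apply(fixed, 2, mean, na.rm = TRUE)
good
```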

  If you wonder why I'm building data frames with columns that could be all missing -- they arise as output of a simulation. The fact that the entire column is missing is informative in itself.
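Since an all-missing column carries information of its own, it may help to record which columns are entirely NA before summarising them. A small sketch (the name all_missing is hypothetical):

```r
testcase <- data.frame(x = 1:3, y = rep(NA, 3))

# TRUE for every column in which all values are missing
all_missing <- vapply(testcase, function(col) all(is.na(col)), logical(1))
all_missing

# Names of the all-missing columns
names(all_missing)[all_missing]
```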

  I do wonder if this is a bug.


Jim Robison-Cox               ____________
Department of Math Sciences  |            |       phone: (406)994-5340
2-214 Wilson Hall             \   BZN, MT |       FAX:   (406)994-1789
Montana State University       |  *_______|
Bozeman, MT 59717-2400          \_|      e-mail: jimrc@math.montana.edu

R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Tue Jun 14 03:09:40 2005
