Re: [R] Showing NAs when using table()

From: Terry Therneau <therneau_at_mayo.edu>
Date: Thu, 24 May 2007 08:48:07 -0500 (CDT)


Rephrasing David Kane's example

> b <- c(1,1,1,1,1, NA, 2,2,2,2)
> d <- factor(c(rep(c("A","B","C"), 3), NA))
> table(b, d, exclude=NULL)

      d
b      A B C
  1    2 2 1
  2    1 1 1
  <NA> 0 0 1

Why does the table list only 9 observations instead of 10?

This is a long-standing bug in Splus and R. Peter Dalgaard suggests recoding the factor variable so that NA is a level rather than a missing value. This works, but it does not address the bug: for most of my factor variables I want missing to stay missing, so that omission works as expected in modeling. The exclude argument of table() should do what it says it does, which is to list ALL data in the table when exclude=NULL.
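
A minimal sketch of the recoding workaround mentioned above, assuming base R only. The names b_tab and d_tab are hypothetical; they are copies made for tabulation, so the original b and d keep their missing values for modeling.

```r
b <- c(1, 1, 1, 1, 1, NA, 2, 2, 2, 2)
d <- factor(c(rep(c("A", "B", "C"), 3), NA))

# Copies for tabulation only: exclude = NULL keeps NA as a level.
# b and d themselves are left unchanged.
b_tab <- factor(b, exclude = NULL)
d_tab <- factor(d, exclude = NULL)

tab <- table(b = b_tab, d = d_tab)
print(tab)

# Every observation now falls into some cell of the table.
n_counted <- sum(tab)   # 10
```

The cost is the one noted above: the recoding has to be repeated for every variable and every call.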

At Mayo, we have replaced the table command to work around this (the replacement has been in place for 5+ years now). It has two additions: a method for factors that correctly propagates the exclude argument, and a change of the default to exclude=NULL. table() is used, 99% of the time, to look at data on screen, and the number of missing values is often the first question I'm asking, so we found the existing default to be, shall we say, non-intuitive.
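
The two changes described above can be illustrated with a small wrapper. This is a sketch only and is not the Mayo code: table_na is a hypothetical name, and it assumes a current version of R in which factor() accepts an exclude argument for both vectors and factors.

```r
# Hypothetical wrapper, not the actual Mayo replacement for table().
# 1. exclude defaults to NULL, so missing values are counted by default.
# 2. exclude is applied to every argument, factors included.
table_na <- function(..., exclude = NULL) {
  args <- lapply(list(...), function(x) factor(x, exclude = exclude))
  do.call(table, args)
}

b <- c(1, 1, 1, 1, 1, NA, 2, 2, 2, 2)
d <- factor(c(rep(c("A", "B", "C"), 3), NA))

all_obs  <- table_na(b, d)                # NA kept in both margins
complete <- table_na(b, d, exclude = NA)  # rows with any NA dropped

n_all      <- sum(all_obs)    # 10
n_complete <- sum(complete)   # 8
```

Because the wrapper recodes only inside the call, the variables themselves are untouched and missing values are still omitted as usual in model fitting.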

We argued these points with Insightful many years ago and got nowhere, the replies being a mix of a) it's not really broken and b) if we change it, it might break something. We have not carried the argument forward to the R community, and just fixed it ourselves. The revised version just works better day to day.

In R, the manual page has been revised to state that the exclude argument means something different for factors, so I expect to remain in the minority. (I can't think of a time I would ever have wanted the behavior of the new version of exclude, which for factors is only a means to exclude more things, rather than the usual use of keeping more in the table.)

        Terry Therneau



R-help_at_stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Thu 24 May 2007 - 14:06:26 GMT

