Re: [Rd] A suggestion for an amendment to tapply

From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>
Date: Tue, 6 Nov 2007 07:23:56 +0000 (GMT)

On Tue, 6 Nov 2007, Bill.Venables_at_csiro.au wrote:

> Unfortunately I think it would break too much existing code. tapply()
> is an old function and many people have gotten used to the way it works
> now.

It is also not necessarily desirable: FUN(numeric(0)) might be an error. For example:

> Z <- data.frame(x=rnorm(10), f=factor(rep(c("a", "b"), each=5)))[1:5, ]
> tapply(Z$x, Z$f, sd)

Here level "b" is empty, so tapply() currently returns NA for it without ever calling sd, but sd(numeric(0)) is an error. (Similar things involving var are 'in the wild' and so would be broken.)
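
The point above can be reproduced with a small sketch. It builds the factor explicitly so that the empty level does not depend on data.frame() defaults, and it uses a hypothetical helper, strict_sd(), that stops on zero-length input. That helper is an assumption made for illustration, since the behaviour of sd() on empty input may differ between R versions.

```r
# Factor with an empty level "b", built explicitly.
f <- factor(rep("a", 5), levels = c("a", "b"))
x <- c(1.2, 0.7, -0.3, 2.1, 0.4)

# Current behaviour: FUN is not called for the empty level, so "b" is NA.
res <- tapply(x, f, sd)
print(res)

# Hypothetical FUN that refuses empty input (illustration only).
strict_sd <- function(v) {
  if (length(v) == 0L) stop("'v' is empty")
  sd(v)
}

# This works, because strict_sd() only ever sees the non-empty group ...
ok <- tapply(x, f, strict_sd)
print(ok)

# ... but calling it on the empty group directly signals an error,
# which is what a tapply() that evaluated FUN on empty levels would hit.
empty_call <- tryCatch(strict_sd(numeric(0)),
                       error = function(e) conditionMessage(e))
print(empty_call)
```

Existing code of this shape would start failing if tapply() evaluated FUN on empty groups unconditionally.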

> This is not to suggest there could not be another argument added at the
> end to indicate that you want the new behaviour, though. e.g.
>
> tapply <- function (X, INDEX, FUN=NULL, ..., simplify=TRUE,
> handle.empty.levels = FALSE)
>
> but this raises the question of what sort of time penalty the
> modification might entail. Probably not much for most situations, I
> suppose. (I know this argument name looks long, but you do need a
> fairly specific argument name, or it will start to impinge on the ...
> argument.)
>
> Just some thoughts.
>
> Bill Venables.
>
> Bill Venables
> CSIRO Laboratories
> PO Box 120, Cleveland, 4163
> AUSTRALIA
> Office Phone (email preferred): +61 7 3826 7251
> Fax (if absolutely necessary): +61 7 3826 7304
> Mobile: +61 4 8819 4402
> Home Phone: +61 7 3286 7700
> mailto:Bill.Venables_at_csiro.au
> http://www.cmis.csiro.au/bill.venables/
>
> -----Original Message-----
> From: r-devel-bounces_at_r-project.org
> [mailto:r-devel-bounces_at_r-project.org] On Behalf Of Andrew Robinson
> Sent: Tuesday, 6 November 2007 3:10 PM
> To: R-Devel
> Subject: [Rd] A suggestion for an amendment to tapply
>
> Dear R-developers,
>
> when tapply() is invoked on factors that have empty levels, it returns
> NA. This behaviour is in accord with the tapply documentation, and is
> reasonable in many cases. However, when FUN is sum, it would also
> seem reasonable to return 0 instead of NA, because "the sum of an
> empty set is zero, by definition."
>
> I'd like to raise a discussion of the possibility of an amendment to
> tapply.
>
> The attached patch changes the function so that it checks if there are
> any empty levels, and if there are, replaces the corresponding NA
> values with the result of applying FUN to the empty set. E.g., in the
> case of sum, it replaces the NA with 0, whereas with mean, it replaces
> the NA with NA, and issues a warning.
>
> This change has the following advantage: tapply and sum work better
> together. Arguably, tapply and any other function that has a non-NA
> response to the empty set will also work better together.
> Furthermore, tapply shows a warning if FUN would normally show a
> warning upon being evaluated on an empty set. That deviates from
> current behaviour, which might be bad, but also provides information
> that might be useful to the user, so that would be good.
>
> The attached script provides the new function in full, and
> demonstrates its application in some simple test cases.
>
> Best wishes,
>
> Andrew
>
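
The opt-in variant discussed in the quoted messages might look roughly like the sketch below. It is not the attached patch, which is not reproduced in this message, and tapply_empty() is a hypothetical wrapper name. It assumes a single factor as INDEX and a FUN that returns a scalar. The argument name handle.empty.levels follows the suggestion quoted above; because it comes after ..., it has to be spelled out in full.

```r
# Hypothetical wrapper around tapply(), for illustration only.
# Assumes INDEX is a single factor and FUN returns a length-one value.
tapply_empty <- function(X, INDEX, FUN, ..., handle.empty.levels = FALSE) {
  ans <- tapply(X, INDEX, FUN, ...)
  if (!handle.empty.levels) return(ans)   # default: unchanged behaviour

  # Group sizes, in the same level order as the tapply() result.
  empty <- as.vector(table(INDEX) == 0)
  if (any(empty)) {
    # Evaluate FUN once on a zero-length vector of the same type as X.
    ans[empty] <- FUN(X[0L], ...)
  }
  ans
}

f <- factor(c("a", "a", "b"), levels = c("a", "b", "c"))
x <- c(1, 2, 3)

old <- tapply_empty(x, f, sum)                              # "c" stays NA
new <- tapply_empty(x, f, sum, handle.empty.levels = TRUE)  # "c" becomes 0
print(old)
print(new)
```

With the flag left at FALSE the result is whatever tapply() already returns, so existing code is unaffected. The extra cost when the flag is TRUE is one table() call and at most one additional call to FUN.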

-- 
Brian D. Ripley,                  ripley_at_stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Tue 06 Nov 2007 - 07:42:45 GMT
