Re: [Rd] A suggestion for an amendment to tapply

From: Andrew Robinson <A.Robinson_at_ms.unimelb.edu.au>
Date: Wed, 7 Nov 2007 12:43:46 +1100

These are important concerns. It seems to me that adding an opt-in argument, as Bill suggests, would let users side-step the problem that Brian identified.

Bill, under what kinds of circumstances would you anticipate a significant time penalty? I would be happy to check those out with some simulations.

If the timing seems acceptable, I can write patches for tapply.R and tapply.Rd, if anyone in R Core is willing to consider them. Please contact me on or off list if so.
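The sketch below shows roughly how Bill's opt-in argument could behave. It is written as a wrapper around the existing tapply() rather than as a patch to tapply.R, and it is not the attached patch.

Several details are assumptions:

- The name tapply_fill is hypothetical.
- "The empty set" is taken to be X[0L], a zero-length vector of the same type as X.
- If FUN fails on that empty vector (Brian's sd() example), the cell is left as NA, which is what tapply does now.

```r
## Hypothetical wrapper (not part of base R) sketching an opt-in
## 'handle.empty.levels' argument layered on top of the existing tapply().
tapply_fill <- function(X, INDEX, FUN = NULL, ..., simplify = TRUE,
                        handle.empty.levels = FALSE) {
  ans <- tapply(X, INDEX, FUN, ..., simplify = simplify)
  ## Default behaviour is unchanged; FUN = NULL and list results are left alone.
  if (!handle.empty.levels || is.null(FUN) || !is.atomic(ans))
    return(ans)
  FUN <- match.fun(FUN)
  ## Cells whose level combination has no observations come back as NA here,
  ## which separates truly empty cells from cells where FUN itself gave NA.
  empty <- is.na(tapply(X, INDEX, length))
  if (!any(empty))
    return(ans)
  ## Assumption: "the empty set" is X[0L], a zero-length vector of X's type.
  ## If FUN fails on it (Brian's sd() example), keep NA as tapply does now.
  fill <- tryCatch(FUN(X[0L], ...), error = function(e) NA)
  if (length(fill) == 1L)
    ans[empty] <- fill
  ans
}

## Usage: level "b" has no observations.
x <- c(1, 2, 3)
f <- factor(c("a", "a", "a"), levels = c("a", "b"))
tapply(x, f, sum)                                   # b is NA
tapply_fill(x, f, sum, handle.empty.levels = TRUE)  # b is 0
tapply_fill(x, f, sd, handle.empty.levels = TRUE)   # b stays NA
```

With this choice of empty set, mean fills empty cells with NaN, because mean(numeric(0)) is NaN. The attached patch is described as giving NA plus a warning instead. The extra tapply(X, INDEX, length) pass is where most of any time penalty would come from, so it is the natural thing to measure in the simulations.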

Best wishes to all,

Andrew

On Tue, Nov 06, 2007 at 07:23:56AM +0000, Prof Brian Ripley wrote:
> On Tue, 6 Nov 2007, Bill.Venables@csiro.au wrote:
>
> >Unfortunately I think it would break too much existing code. tapply()
> >is an old function and many people have gotten used to the way it works
> >now.
>
> It is also not necessarily desirable: FUN(numeric(0)) might be an error.
> For example:
>
> >Z <- data.frame(x=rnorm(10), f=rep(c("a", "b"), each=5))[1:5, ]
> >tapply(Z$x, Z$f, sd)
>
> but sd(numeric(0)) is an error. (Similar things involving var are 'in the
> wild' and so would be broken.)
>
> >This is not to suggest there could not be another argument added at the
> >end to indicate that you want the new behaviour, though. For example:
> >
> >tapply <- function (X, INDEX, FUN=NULL, ..., simplify=TRUE,
> >handle.empty.levels = FALSE)
> >
> >but this raises the question of what sort of time penalty the
> >modification might entail. Probably not much for most situations, I
> >suppose. (I know this argument name looks long, but you do need a
> >fairly specific argument name, or it will start to impinge on the ...
> >argument.)
> >
> >Just some thoughts.
> >
> >Bill Venables.
> >
> >Bill Venables
> >CSIRO Laboratories
> >PO Box 120, Cleveland, 4163
> >AUSTRALIA
> >Office Phone (email preferred): +61 7 3826 7251
> >Fax (if absolutely necessary): +61 7 3826 7304
> >Mobile: +61 4 8819 4402
> >Home Phone: +61 7 3286 7700
> >mailto:Bill.Venables_at_csiro.au
> >http://www.cmis.csiro.au/bill.venables/
> >
> >-----Original Message-----
> >From: r-devel-bounces_at_r-project.org
> >[mailto:r-devel-bounces_at_r-project.org] On Behalf Of Andrew Robinson
> >Sent: Tuesday, 6 November 2007 3:10 PM
> >To: R-Devel
> >Subject: [Rd] A suggestion for an amendment to tapply
> >
> >Dear R-developers,
> >
> >when tapply() is invoked on factors that have empty levels, it returns
> >NA for those levels. This behaviour is in accord with the tapply
> >documentation, and is reasonable in many cases. However, when FUN is
> >sum, it would also seem reasonable to return 0 instead of NA, because
> >"the sum of an empty set is zero, by definition."
> >
> >I'd like to raise a discussion of the possibility of an amendment to
> >tapply.
> >
> >The attached patch changes the function so that it checks whether there
> >are any empty levels and, if there are, replaces the corresponding NA
> >values with the result of applying FUN to the empty set. E.g., in the
> >case of sum it replaces the NA with 0, whereas with mean it replaces
> >the NA with NA and issues a warning.
> >
> >This change has the following advantage: tapply and sum work better
> >together. Arguably, tapply and any other function that has a non-NA
> >response to the empty set will also work better together.
> >Furthermore, tapply shows a warning if FUN would normally show a
> >warning when evaluated on an empty set. That deviates from current
> >behaviour, which might be bad, but it also gives the user information
> >that might be useful, which would be good.
> >
> >The attached script provides the new function in full, and
> >demonstrates its application in some simple test cases.
> >
> >Best wishes,
> >
> >Andrew
> >
>
> --
> Brian D. Ripley, ripley_at_stats.ox.ac.uk
> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
> University of Oxford, Tel: +44 1865 272861 (self)
> 1 South Parks Road, +44 1865 272866 (PA)
> Oxford OX1 3TG, UK Fax: +44 1865 272595

-- 
Andrew Robinson  
Department of Mathematics and Statistics            Tel: +61-3-8344-9763
University of Melbourne, VIC 3010 Australia         Fax: +61-3-8344-4599
http://www.ms.unimelb.edu.au/~andrewpr
http://blogs.mbs.edu/fishing-in-the-bay/

______________________________________________
R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Wed 07 Nov 2007 - 01:53:31 GMT
