[Rd] A suggestion for an amendment to tapply

From: Andrew Robinson <A.Robinson_at_ms.unimelb.edu.au>
Date: Tue, 6 Nov 2007 16:10:28 +1100


Dear R-developers,

When tapply() is invoked on factors that have empty levels, it returns NA for the cells that correspond to those levels. This behaviour is in accord with the tapply documentation, and is reasonable in many cases. However, when FUN is sum, it would also seem reasonable to return 0 instead of NA, because "the sum of an empty set is zero, by definition."
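
To make the behaviour concrete, here is a minimal illustration. The data (x and g) are made up for this example and are not taken from the attached script.

```r
# Hypothetical toy data: level "c" is declared but never observed.
g <- factor(c("a", "a", "b"), levels = c("a", "b", "c"))
x <- c(1, 2, 3)

# Default behaviour: the cell for the empty level "c" comes back as NA.
res <- tapply(x, g, sum)
print(res)

# By contrast, sum() applied directly to a zero-length vector gives 0.
empty_sum <- sum(numeric(0))
print(empty_sum)
```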

I'd like to raise a discussion of the possibility of an amendment to tapply.

The attached patch changes the function so that it checks if there are any empty levels, and if there are, replaces the corresponding NA values with the result of applying FUN to the empty set. For example, in the case of sum, it replaces the NA with 0, whereas with mean, it replaces the NA with NA, and issues a warning.
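
The idea can be sketched as a wrapper around the existing tapply. This is only an approximation under stated assumptions, not the attached patch: the name tapply_fill_empty is hypothetical, "the empty set" is taken to mean a zero-length vector of the same type as X (X[0]), and FUN is assumed to return a single value. The patch itself may represent the empty set differently.

```r
# Hypothetical helper, NOT the attached patch: fill cells for empty levels
# with the value FUN gives on a zero-length input.
tapply_fill_empty <- function(X, INDEX, FUN, ...) {
  res <- tapply(X, INDEX, FUN, ...)
  # Count observations per cell; cells with no observations come back as NA.
  counts <- tapply(X, INDEX, length)
  empty <- is.na(counts)
  if (any(empty)) {
    # Assumption: FUN returns a single value. Any warning FUN raises on the
    # empty input is passed on to the caller.
    res[empty] <- FUN(X[0], ...)
  }
  res
}

# Made-up data with an unobserved level "c".
g <- factor(c("a", "a", "b"), levels = c("a", "b", "c"))
x <- c(1, 2, 3)

filled_sum <- tapply_fill_empty(x, g, sum)
print(filled_sum)   # the "c" cell now holds sum(numeric(0))
```

With this sketch, a function such as max would fill the empty cell with -Inf and pass on the warning that max gives for a zero-length argument.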

This change has the following advantage: tapply and sum work better together. Arguably, tapply and any other function that has a non-NA response to the empty set will also work better together. Furthermore, tapply shows a warning if FUN would normally show a warning upon being evaluated on an empty set. That deviates from current behaviour, which might be bad, but also provides information that might be useful to the user, so that would be good.
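
Whether a warning appears depends on how FUN treats an empty argument, and on what is passed as "the empty set". The following sketch probes this for a few functions; empty_result is a hypothetical helper written for this illustration and is not part of the patch.

```r
# Hypothetical helper: evaluate FUN on an empty input and record whether
# it raised a warning, without printing the warning.
empty_result <- function(FUN, empty = numeric(0)) {
  warned <- FALSE
  value <- withCallingHandlers(
    FUN(empty),
    warning = function(w) {
      warned <<- TRUE
      invokeRestart("muffleWarning")
    }
  )
  list(value = value, warned = warned)
}

sum_empty   <- empty_result(sum)          # 0, no warning
max_empty   <- empty_result(max)          # -Inf, with a warning
mean_empty  <- empty_result(mean)         # NaN, no warning
mean_null   <- empty_result(mean, NULL)   # NA, with a warning
str(list(sum = sum_empty, max = max_empty, mean = mean_empty, mean_null = mean_null))
```

Note that mean(numeric(0)) is NaN without a warning, whereas mean(NULL) is NA with a warning, so the choice of empty input affects what the user would see.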

The attached script provides the new function in full, and demonstrates its application in some simple test cases.

Best wishes,

Andrew

-- 
Andrew Robinson  
Department of Mathematics and Statistics            Tel: +61-3-8344-9763
University of Melbourne, VIC 3010 Australia         Fax: +61-3-8344-4599
http://www.ms.unimelb.edu.au/~andrewpr
http://blogs.mbs.edu/fishing-in-the-bay/ 

______________________________________________
R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Received on Tue 06 Nov 2007 - 05:19:38 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.