Re: [Rd] Native implementation of rowMedians()

From: Martin Maechler <maechler_at_stat.math.ethz.ch>
Date: Mon, 14 May 2007 14:31:14 +0200

>>>>> "BDR" == Prof Brian Ripley <ripley_at_stats.ox.ac.uk>
>>>>>     on Mon, 14 May 2007 11:39:18 +0100 (BST) writes:

    BDR> On Mon, 14 May 2007, Henrik Bengtsson wrote:
>> On 5/14/07, Prof Brian Ripley <ripley_at_stats.ox.ac.uk> wrote:

    >>> 
    >>> > Hi Henrik,
    >>> >>>>>> "HenrikB" == Henrik Bengtsson <hb_at_stat.berkeley.edu>
    >>> >>>>>>     on Sun, 13 May 2007 21:14:24 -0700 writes:
    >>> >
    >>> >    HenrikB> Hi,
    >>> >    HenrikB> I've got a version of rowMedians(x, na.rm=FALSE) for matrices that
    >>> >    HenrikB> handles missing values implemented in C.  It has been

    BDR> [...]

    >>> Also, the 'a version of rowMedians' made me wonder what other version
    >>> there was, and it seems there is one in Biobase which looks a more
    >>> natural home.

>>
>> The rowMedians() in Biobase utilizes rowQ() in ditto. I actually
>> started off by adding support for missing values to rowQ() resulting in
>> the method rowQuantiles(), for which there are also internal functions
>> for both integer and double matrices. rowQuantiles() is in R.native
>> too, but since it has much less CPU mileage I wanted to wait with that.
>> The rowMedians() is developed from my rowQuantiles() optimized for
>> the 50% quantile.
>>
>> Why do you think it is more natural to host rowMedians() in Biobase
>> than in one of the core R packages? Biobase comes with a lot of
>> overhead for people not in the Bio-world.
    BDR> Because that is where there seems to be a need for it, and having multiple 
    BDR> functions of the same name in different packages is not ideal (and even 
    BDR> with namespaces can cause confusion).

That's correct, of course.
However, I still think that quantiles (and statistics derived from them) in general, and medians in particular, are under-used by many user groups. For some useRs, speed can be an important reason for this, and for that I made a big effort to provide runmed() in R; I think it would be worthwhile to provide fast row-wise medians and quantiles here as well.

Also, BTW, I think it will be worthwhile to provide (R<->C) API versions of median() and quantile() {with fewer options than the R functions, most probably!!},
such that we'd hopefully see less re-invention of the wheel in every package that needs such quantiles in its C code.
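
As a rough illustration of what such a C-level entry point could look like, here is a minimal standalone sketch. It is not an existing R API: the names median_dbl() and row_medians_dbl() are hypothetical, NaN stands in for R's NA/NaN, the matrix is assumed to be stored column-major as in R, and a full qsort() is used where a tuned version would more likely use partial selection.

```c
#include <math.h>
#include <stdlib.h>

/* Comparison callback for qsort(); the buffer never contains NaN. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Hypothetical helper: median of x[0..n-1].
 * NaN plays the role of a missing value.
 *   na_rm == 0: any NaN in the input makes the result NaN.
 *   na_rm != 0: NaNs are dropped before the median is taken.
 * Returns NaN for empty input (and also if allocation fails). */
double median_dbl(const double *x, size_t n, int na_rm)
{
    double *buf = malloc((n ? n : 1) * sizeof *buf);
    size_t m = 0;
    double med;

    if (buf == NULL)
        return NAN;
    for (size_t i = 0; i < n; i++) {
        if (isnan(x[i])) {
            if (!na_rm) {
                free(buf);
                return NAN;
            }
        } else {
            buf[m++] = x[i];
        }
    }
    if (m == 0) {
        free(buf);
        return NAN;
    }
    qsort(buf, m, sizeof *buf, cmp_double);
    /* Odd count: middle element; even count: mean of the two middle ones. */
    med = (m % 2) ? buf[m / 2] : (buf[m / 2 - 1] + buf[m / 2]) / 2.0;
    free(buf);
    return med;
}

/* Hypothetical row-wise wrapper: x is an nrow-by-ncol matrix in
 * column-major order; ans must have room for nrow results. */
void row_medians_dbl(const double *x, size_t nrow, size_t ncol,
                     int na_rm, double *ans)
{
    double *row = malloc((ncol ? ncol : 1) * sizeof *row);

    for (size_t i = 0; i < nrow; i++) {
        if (row == NULL) {          /* allocation failed: report NaN */
            ans[i] = NAN;
            continue;
        }
        for (size_t j = 0; j < ncol; j++)
            row[j] = x[i + j * nrow];   /* gather row i */
        ans[i] = median_dbl(row, ncol, na_rm);
    }
    free(row);
}
```

A package's C code would call row_medians_dbl(x, nrow, ncol, na_rm, ans) on its own buffers; keeping the option list down to a single na_rm flag mirrors the "fewer options than the R functions" idea above.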

Biobase is quite actively maintained, and I'd assume its maintainers will remove rowMedians() from there (or first replace it with a wrapper in order to deal with the namespace issue you mentioned) as soon as R has its own function with the same (or better) functionality. In order to facilitate the transition, we'd have to make sure that such a 'stats' function does behave " >= " to the Biobase one.

Martin



R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Mon 14 May 2007 - 12:33:37 GMT
