Re: [Rd] median and data frames

From: Martin Maechler <maechler_at_stat.math.ethz.ch>
Date: Sat, 08 Oct 2011 20:15:38 +0200

>>>>> Martin Maechler <maechler_at_stat.math.ethz.ch>
>>>>>     on Fri, 29 Apr 2011 16:25:09 +0200 writes:

>>>>> Paul Johnson <pauljohn32_at_gmail.com>
>>>>>     on Thu, 28 Apr 2011 00:20:27 -0500 writes:

    >> On Wed, Apr 27, 2011 at 12:44 PM, Patrick Burns
    >> <pburns_at_pburns.seanet.com> wrote:

>>> Here are some data frames:
>>>
>>> df3.2 <- data.frame(1:3, 7:9)
>>> df4.2 <- data.frame(1:4, 7:10)
>>> df3.3 <- data.frame(1:3, 7:9, 10:12)
>>> df4.3 <- data.frame(1:4, 7:10, 10:13)
>>> df3.4 <- data.frame(1:3, 7:9, 10:12, 15:17)
>>> df4.4 <- data.frame(1:4, 7:10, 10:13, 15:18)
>>>
>>> Now here are some commands and their answers:
>>> > median(df4.4)
>>> [1]  8.5 11.5
>>> > median(df3.2[c(1,2,3),])
>>> [1] 2 8
>>> > median(df3.2[c(1,3,2),])
>>> [1]  2 NA
>>> Warning message:
>>> In mean.default(X[[2L]], ...) :
>>>   argument is not numeric or logical: returning NA
>>>
>>>
>>>
>>> The sessionInfo is below, but it looks to me like the
>>> present behavior started in 2.10.0.
>>>
>>> Sometimes it gets the right answer.  I'd be grateful to
>>> hear how it does that -- I can't figure it out.
>>>

    > Hello, Pat.

    >> Nice poetry there! I think I have an actual answer, as
    >> opposed to the usual crap I spew.

    >> I would agree if you said median.data.frame ought to be
    >> written to work columnwise, similar to mean.data.frame.

    >> apply and sapply always give the correct answer

>>> apply(df3.3, 2, median)

    >>   X1.3   X7.9 X10.12
    >>      2      8     11

    > [...........]

    > exactly

    >> mean.data.frame is now implemented as

    >> mean.data.frame <- function(x, ...) sapply(x, mean, ...)

    > exactly.

    > My personal opinion is that mean.data.frame() should
    > never have been written.  People should know, or learn, to
    > use apply functions for such a task.

    > The unfortunate fact that mean.data.frame() exists makes
    > people think that median.data.frame() should too, and then

    > var.data.frame() sd.data.frame() mad.data.frame()
    > min.data.frame() max.data.frame() ... ...

    > all just in order *not* to have to know sapply() ????

    > No, rather not.

    > My vote is for deprecating mean.data.frame().
    > Martin

This has now happened -- for R 2.14.0 and later. As raised in this thread in April, there's a similar "extra helpful" behavior within the sd() function, and we've also deprecated that.
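For code that relied on the deprecated behavior, a minimal sketch of the sapply() route mentioned earlier in the thread might look like the following. The data frame is the one from Pat Burns' post; the variable names (col_means, col_sds, col_means2) are made up for illustration and are not part of R.

```r
# Data frame taken from the original post in this thread
df3.3 <- data.frame(1:3, 7:9, 10:12)

# Columnwise summaries that do not rely on mean.data.frame()
# or on sd() being "extra helpful" with data frames
col_means <- sapply(df3.3, mean)
col_sds   <- sapply(df3.3, sd)

# For an all-numeric data frame, colMeans() is another option
col_means2 <- colMeans(df3.3)

print(col_means)
print(col_sds)
```

Both sapply() calls return a named numeric vector with one entry per column, so existing code that used the old results should need little more than the changed call.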

In addition -- getting back to Pat Burns' original post -- I'm proposing to change median(<data.frame>) so that it produces an error instead of the current "sometimes correct" (but mostly not!) results.
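As a rough illustration of what users would write instead, here is a sketch that applies median() column by column to the df3.2 example from the original post. The names med_a, med_b and med_c are hypothetical; the point is only that the columnwise result does not depend on the row order.

```r
# Data frame taken from the original post in this thread
df3.2 <- data.frame(1:3, 7:9)

# Columnwise medians, with the rows in the two orders that
# gave different answers for median(<data.frame>)
med_a <- sapply(df3.2[c(1, 2, 3), ], median)
med_b <- sapply(df3.2[c(1, 3, 2), ], median)

# vapply() additionally checks that each column yields a single number
med_c <- vapply(df3.2, median, numeric(1))

print(med_a)
print(med_b)
```

Here med_a and med_b agree, since each column is passed to median() as an ordinary vector.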

Martin



R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Sat 08 Oct 2011 - 18:17:23 GMT


Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Mon 10 Oct 2011 - 07:40:43 GMT.
