Re: [Rd] median and data frames

From: Tim Hesterberg <timhesterberg_at_gmail.com>
Date: Sat, 30 Apr 2011 08:19:31 -0700

I also favor deprecating mean.data.frame.

One possible exception would be for a single-column data frame. But even here I'd say no, lest people expect the same behavior for median, var, ...

Pat's suggestion of using stop() would work nicely for mean (but omit paste(): stop() already concatenates its arguments with no separator).
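
A minimal sketch of what I have in mind, not a proposed patch. The function name mean_df_stub is hypothetical (chosen so that it does not override the real method), and the message wording is simply Pat's with "mean" substituted for "median":

```r
# Hypothetical stand-in for a deprecated mean.data.frame; the name and the
# message text are illustrative assumptions, not code from R itself.
mean_df_stub <- function(x, ...) {
  # stop() pastes its arguments together with no separator,
  # so no explicit paste(..., sep = "") is needed.
  stop("you probably mean to use the command: sapply(",
       deparse(substitute(x)), ", mean)")
}

df3.2 <- data.frame(1:3, 7:9)

# Capture the error message instead of aborting the script.
msg <- tryCatch(mean_df_stub(df3.2), error = function(e) conditionMessage(e))
print(msg)

# The command the message points to gives the column means.
col_means <- sapply(df3.2, mean)
print(col_means)
```

The same pattern would carry over to median, var, and so on if anyone wanted the hint there too.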

Tim Hesterberg

>If Martin's proposal is accepted, does
>that mean that the median method for
>data frames would be something like:
>
>function (x, ...)
>{
> stop(paste("you probably mean to use the command: sapply(",
> deparse(substitute(x)), ", median)", sep=""))
>}
>
>Pat
>
>
>On 29/04/2011 15:25, Martin Maechler wrote:
>>>>>>> Paul Johnson<pauljohn32_at_gmail.com>
>>>>>>> on Thu, 28 Apr 2011 00:20:27 -0500 writes:
>>
>> > On Wed, Apr 27, 2011 at 12:44 PM, Patrick Burns
>> > <pburns_at_pburns.seanet.com> wrote:
>> >> Here are some data frames:
>> >>
>> >> df3.2<- data.frame(1:3, 7:9)
>> >> df4.2<- data.frame(1:4, 7:10)
>> >> df3.3<- data.frame(1:3, 7:9, 10:12)
>> >> df4.3<- data.frame(1:4, 7:10, 10:13)
>> >> df3.4<- data.frame(1:3, 7:9, 10:12, 15:17)
>> >> df4.4<- data.frame(1:4, 7:10, 10:13, 15:18)
>> >>
>> >> Now here are some commands and their answers:
>>
>> >>> median(df4.4)
>> >> [1] 8.5 11.5
>> >>> median(df3.2[c(1,2,3),])
>> >> [1] 2 8
>> >>> median(df3.2[c(1,3,2),])
>> >> [1] 2 NA
>> >> Warning message:
>> >> In mean.default(X[[2L]], ...) :
>> >> argument is not numeric or logical: returning NA
>> >>
>> >>
>> >>
>> >> The sessionInfo is below, but it looks
>> >> to me like the present behavior started
>> >> in 2.10.0.
>> >>
>> >> Sometimes it gets the right answer. I'd
>> >> be grateful to hear how it does that -- I
>> >> can't figure it out.
>> >>
>>
>> > Hello, Pat.
>>
>> > Nice poetry there! I think I have an actual answer, as opposed to the
>> > usual crap I spew.
>>
>> > I would agree if you said median.data.frame ought to be written to
>> > work columnwise, similar to mean.data.frame.
>>
>> > apply and sapply always give the correct answer
>>
>> >> apply(df3.3, 2, median)
>> > X1.3 X7.9 X10.12
>> > 2 8 11
>>
>> [...........]
>>
>> exactly
>>
>> > mean.data.frame is now implemented as
>>
>> > mean.data.frame<- function(x, ...) sapply(x, mean, ...)
>>
>> exactly.
>>
>> My personal opinion is that mean.data.frame() should never have
>> been written.
>> People should know, or learn, to use apply functions for such a
>> task.
>>
>> The unfortunate fact that mean.data.frame() exists makes people
>> think that median.data.frame() should too,
>> and then
>>
>> var.data.frame()
>> sd.data.frame()
>> mad.data.frame()
>> min.data.frame()
>> max.data.frame()
>> ...
>> ...
>>
>> all just in order *not* to have to know sapply()
>> ????
>>
>> No, rather not.
>>
>> My vote is for deprecating mean.data.frame().
>>
>> Martin
>>
>
>--
>Patrick Burns
>pburns_at_pburns.seanet.com
>twitter: @portfolioprobe
>http://www.portfolioprobe.com/blog
>http://www.burns-stat.com
>(home of 'Some hints for the R beginner'
>and 'The R Inferno')



R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Sat 30 Apr 2011 - 15:25:23 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Sat 30 Apr 2011 - 17:00:53 GMT.
