Re: [Rd] (PR#8192) [ subscripting sometimes loses names

From: Duncan Murdoch <murdoch_at_stats.uwo.ca>
Date: Sat, 31 Jan 2009 18:33:11 -0500

On 31/01/2009 3:26 PM, Christian Brechbühler wrote:
> On Sat, Jan 31, 2009 at 10:13 AM, Peter Dalgaard
> <p.dalgaard_at_biostat.ku.dk> wrote:
>
>> Duncan Murdoch wrote:
>>
>>> On 31/01/2009 7:31 AM, Andrew Piskorski wrote:
>>>
>>>> On Fri, Jan 30, 2009 at 11:51:00AM -0500, Simon Urbanek wrote:
>>>>
>>>>> Subject: Re: [Rd] (PR#13487) Segfault when mistakenly calling
>>>>> [.data.frame
>>>>>
>>>>> ever tried drop=FALSE ?
>>>>
>>>> Simon, no, the drop=FALSE argument has nothing to do with what
>>>> Christian was talking about.  The kind of thing he meant is PR# 8192,
>>>> "Subject: [ subscripting sometimes loses names":
>>>>
>>>>  http://bugs.r-project.org/cgi-bin/R/wishlist?id=8192
>>>>
>>> In that bug report you were asked to provide simple examples, and you
>>> didn't.
>>> ...
>>> I just tracked this one down, and can put together this simple example:
>>>
>>>  > (1:3)["no"]
>>> [1] NA
>>>
>>> where I think you would want the name "no" attached to the output.
>
> No, it has nothing to do with indexing by name.  It's about preserving
> existing names when subsetting.

I think you misread my message.

>
>>> And the other two cases where you list "BAD" behaviour? I didn't track them
>>> down.
>>>
>> I did, and they boil down to variations of
>>
>> > data.frame(val=1:3,row.names=letters[1:3])[,1]
>> [1] 1 2 3
>>
>> but it's not obvious that the result should be named using the row.names
>> and (in particular) whether or why it should differ from .....[[1]] and
>> ....$val. Given that for most purposes, extracting the relevant names would
>> just be unnecessary red tape, I'd say that we can do without it.
>
> Compare
>
> > data.frame(val=1:3,row.names=letters[1:3])[,1]
> [1] 1 2 3
> > as.matrix(data.frame(val=1:3,row.names=letters[1:3]))[,1]
> a b c
> 1 2 3
>
> X[,1] preserves row names if X is a matrix, and loses them if X is a data
> frame. To me, this is ugly and inconsistent.
>
> One might argue that having names and dimnames at all is "red tape", and
> wastes memory and computational efficiency -- after all, Fortran arrays had
> no names. But R chose to drag along the names (sometimes), and it can be
> very helpful to us humans. Now R should do it consistently.

In one case you're working with a matrix, and in the other, a data frame. Perfect consistency is impossible: matrices and data frames are not the same. So it's a matter of deciding how much consistency is worth pursuing. Right now, it seems nobody thinks this is worth pursuing, so it won't get changed.
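
A minimal sketch of the difference under discussion, plus one possible workaround for code that needs the names. The helper name `named_column` is hypothetical and not part of base R; it is an illustration under that assumption, not a change proposed in this thread:

```r
# Hypothetical helper (an assumption for illustration, not base R):
# extract column j of a data frame as a vector named by the row names,
# mirroring what matrix subsetting does.
named_column <- function(df, j) {
  stats::setNames(df[[j]], rownames(df))
}

df <- data.frame(val = 1:3, row.names = letters[1:3])

plain  <- df[, 1]              # data frame method: row names are dropped
mat    <- as.matrix(df)[, 1]   # matrix method: row names become names
fixed  <- named_column(df, 1)  # workaround: names attached explicitly

names(plain)
names(mat)
names(fixed)
```

Attaching the names explicitly at the call site leaves the existing behaviour of `[.data.frame` alone, so nothing that relies on unnamed results is affected.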

To get it changed, you should make the change, then investigate what would break if the change were adopted, what would become slower, etc. Or convince someone else to do that. But the fact that you think it's ugly is probably not convincing.

Duncan Murdoch



R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Sat 31 Jan 2009 - 23:42:24 GMT

