Re: [Rd] (PR#8192) [ subscripting sometimes loses names

From: Christian Brechbühler <brechbuehler_at_gmail.com>
Date: Sun, 01 Feb 2009 21:12:27 -0500

Andy had written:

> >... The drop=FALSE argument has nothing to do with what
> >Christian was talking about. The kind of thing he meant is PR# 8192,
> >"Subject: [ subscripting sometimes loses names":
> >
> > http://bugs.r-project.org/cgi-bin/R/wishlist?id=8192
>

On Sun, Feb 1, 2009 at 12:25 PM, Tim Hesterberg <TimHesterberg_at_gmail.com> wrote:

> (Later comments on the thread pointed out the difference between
> x[,1] for matrices and data frames.)
>
> I rewrote the S-PLUS data frame code around then, to fix
> various inconsistencies and improve efficiency.
> This was probably my change, and I would do it again.
>
> Note that the components of a data frame do not have names
> attached to them; the row names are a separate object.
> Extracting a component vector or matrix from a data frame should not
> attach names to the result, because of:
> * memory (attaching row names to an object can more than double the
> size of the object),
> * speed
> * some objects cannot take names, and attaching them could change
> the class and other behavior of an object, and
> * the names are usually/often (depending on the user) meaningless,
> artifacts of an early design decision that all data frames have row names.
>
> Data frames differ from matrices in two ways that matter here:
> * columns in matrices are all the same kind, and are simple objects
> (numeric, etc.), whereas components of data frames can be nearly
> arbitrary objects, and
> * row names get added to a data frame whether a user wants them or not,
> whereas row names on a matrix have to be specified.
>
> A historical note - unique row names on data frames were a design
> decision made when people worked with small data frames, and are
> convenient for small data frames. But they are a problem for large
> data frames. I was writing for all users, not just those with small
> data frames and meaningful names.
>
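
For readers of the archive, the difference Tim describes can be seen in a few lines of R. This is only a sketch with made-up data (the data frame `df` and its row names are hypothetical, not taken from the bug report), and it shows current R behavior rather than the S-PLUS code Tim mentions:

```r
# Hypothetical data: a small data frame with meaningful row names.
df <- data.frame(height = c(1.62, 1.80, 1.75),
                 row.names = c("ann", "bob", "cy"))
m <- as.matrix(df)

# Extracting a matrix column keeps the row names as names.
names(m[, 1])

# Extracting a data frame component returns the bare vector, no names.
names(df[, 1])
names(df$height)

# Data frames always report row names; a plain matrix has none
# unless they were specified.
rownames(data.frame(x = 1:3))
rownames(matrix(1:3))

# Rough size comparison: attaching names to a numeric vector.
bare  <- runif(1e4)
named <- stats::setNames(bare, paste0("row", seq_along(bare)))
object.size(bare)
object.size(named)
```

The size comparison is only indicative; exact byte counts depend on the R version and platform.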

Hi Tim,

Thank you for explaining this so carefully. It's very valuable to hear the rationale behind a design decision. I accept that yours is the right solution for general use.

In our case, we deal with a modest number of rows, up to a few thousand, with meaningful names, and we mostly use data frames. Because of our special situation, we wrote our own "[" methods, which normally do what's right for us. That's why, in one debugging session, it was necessary to "get" the overridden, stock R method from package:base. In that case, the object happened to be a matrix, not a data frame, and R got a segmentation fault. And that's why I submitted the bug report that sparked this discussion.
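
To make the above concrete, here is a minimal sketch of the two pieces involved. The helper `named_column` is hypothetical and only illustrates the idea of carrying row names over; it is not our actual "[" method. The sketch also does not try to reproduce the crash, which occurred in the R version of the time:

```r
# Hypothetical helper (an assumption, for illustration only): extract one
# column of a data frame and attach the row names as names.
named_column <- function(df, j) {
  stats::setNames(df[[j]], rownames(df))
}

df <- data.frame(height = c(1.62, 1.80, 1.75),
                 row.names = c("ann", "bob", "cy"))
named_column(df, "height")

# Fetch the stock "[" from package:base, bypassing any user-level override.
stock_bracket <- get("[", pos = "package:base")

# Apply it to a small matrix (made-up data).
m <- matrix(1:6, nrow = 3,
            dimnames = list(c("a", "b", "c"), c("x", "y")))
stock_bracket(m, , 1)
```

A wrapper function like this leaves the stock "[" untouched, so other code still sees the standard behavior.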

/Christian




R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Mon 02 Feb 2009 - 02:14:44 GMT

