[Rd] non-duplicate names in data frames

From: Tim Hesterberg <TimHesterberg_at_gmail.com>
Date: Sun, 01 Feb 2009 09:29:43 -0800

I wrote on another thread
(with subject "[ subscripting sometimes loses names"):
>I like R's 'automatic' row names. This is a big help working with
>huge data frames (and I do this often, at Google). But this doesn't
>go far enough; subscripting and other operations sometimes convert the
>automatic names to real names, and check/enforce uniqueness, which is
>a big waste of time when working with large data frames. I'll comment
>more on this in a new thread.
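
A minimal sketch of the behavior described above, using base R only. It assumes the base helper .row_names_info(), which reports a negative count while row names are still automatic; the variable names are purely illustrative.

```r
# A small data frame created without row names gets automatic ones.
df <- data.frame(x = 1:5)
auto_before <- .row_names_info(df) < 0   # TRUE: row names are automatic

# Selecting a row twice makes R build unique character row names.
sub <- df[c(1, 1, 2), , drop = FALSE]
new_names <- rownames(sub)               # "1" "1.1" "2"
auto_after <- .row_names_info(sub) < 0   # FALSE: now real (character) names
```

On a five-row example the cost is negligible; the concern is the same conversion and uniqueness check on very large data frames.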

I propose (and have begun writing, in my copious spare time):

* an optional argument to data.frame and other data frame creation code
* resulting in an attribute added to the data.frame
* so that subscripting and other operations on the data frame
  either allow duplicate row names or always keep automatic ones
  (see the values below)

My current thoughts, comments welcome:

Argument name and component name 'dup.row.names'

0 or FALSE or NULL - current, require unique names
1 or TRUE          - duplicates allowed (when subscripting etc.)
2                  - always automatic   (when subscripting etc.)

Option "maxRowNames", default, say, 10^4
Any data frame with more rows than this has dup.row.names default to 2.

The name 'dup.row.names' is for consistency with S+; there the options are NULL, F or T.

Tim Hesterberg



R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Sun 01 Feb 2009 - 17:37:09 GMT