Re: [Rd] Quiz: How to get a "named column" from a data frame

From: Andrew Piskorski <atp_at_piskorski.com>
Date: Sun, 19 Aug 2012 18:00:54 -0400

On Sat, Aug 18, 2012 at 02:13:20PM -0400, Christian Brechbühler wrote:
> On 8/18/12, Martin Maechler <maechler@stat.math.ethz.ch> wrote:
> > On Sat, Aug 18, 2012 at 5:14 PM, Christian Brechbühler .... wrote:
> >> On Sat, Aug 18, 2012 at 11:03 AM, Martin Maechler
> >> <maechler_at_stat.math.ethz.ch> wrote:
>
> >>> Consider this toy example, where the dataframe already has only
> >>> one column :
> >>>
> >>> > nv <- c(a=1, d=17, e=101); nv
> >>>   a   d   e
> >>>   1  17 101
> >>>
> >>> > df <- as.data.frame(cbind(VAR = nv)); df
> >>>   VAR
> >>> a   1
> >>> d  17
> >>> e 101
> >>>
> >>> Now, how can I get 'nv' back from 'df'? I.e., how to get
> >>> identical(nv, df[,1])
> >> [1] TRUE
>
> > But it is not a solution in a current version of R,
> > though it's still interesting that df[,1] worked in some incarnation of
> > R.
>
> My mistake! We disliked some quirks of indexing, so we've long had
> our own patch for "[.data.frame" in place, which I used inadvertently.

As I understand it, when doing 'df[,1]' on a data frame, Bell Labs S and all versions of S-Plus prior to 3.4 always retained the data frame's row names as the names on the result vector. 'df[,1]' gave you a named vector identical to your 'nv' above. Then in 1996 with S-Plus 3.4, Insightful broke that behavior, after which 'df[,1]' returned a vector without any names. I believe R copied that late-1990s S-Plus behavior, but I don't know exactly why.
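For anyone following along in current R, here is a minimal sketch, using only base R, of getting 'nv' back from 'df' by reattaching the row names by hand. It is just one workaround, not necessarily the answer the thread settled on, and it assumes the data frame's row names are the names you want.

```r
nv <- c(a = 1, d = 17, e = 101)
df <- as.data.frame(cbind(VAR = nv))   # row names a, d, e come from names(nv)

v <- df[, 1]                # plain numeric vector; row names are not carried over
identical(nv, v)            # FALSE, because 'v' has no names
names(v) <- rownames(df)    # reattach the row names explicitly
identical(nv, v)            # TRUE
```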

When subscripting objects, R sometimes retains the object's dimnames as names in the result, and sometimes not, which I find frustrating. Personally, I think it would make much more sense if subscripting ALWAYS retained any names it could, and worked as similarly as possible across data frames, matrices, arrays, vectors, etc. After all, explicitly dropping names afterwards is trivial, while adding them back on is not.
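A minimal illustration of that inconsistency, reusing the toy objects from the quoted example (the one-column matrix 'm' is added here only for comparison). The same one-column extraction keeps the row names as names on a matrix but drops them on a data frame, at least in the R versions discussed here.

```r
nv <- c(a = 1, d = 17, e = 101)
m  <- cbind(VAR = nv)      # 3 x 1 matrix; row names a, d, e
df <- as.data.frame(m)     # data frame with the same row names

m[, 1]    # matrix: row names become the result's names (a, d, e)
df[, 1]   # data frame: result is unnamed (1, 17, 101)
```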

Back on 2005-10-19 with R 2.2.0, I gave a simple test of 15 cases; 4 of them dropped names during subscripting, the other 11 preserved them. That's towards the end of the discussion here:

  https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=8192

Contrary to the initial tone of my old 2005 "bug" report, current R subscripting behavior is of course NOT a bug, as AFAIK it's working as the R Core Team intended. However, I definitely consider the current behavior a design infelicity.

Just now on stock R 2.15.1 (with --vanilla), I ran an updated version of those same simple tests. Of 22 subscripting test cases, 7 lose names and 15 preserve them. (If anyone's interested in the specific tests, I can send them, or try to append them to that old 8192 feature request.)
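The 22 tests themselves aren't reproduced in this message. As a hypothetical stand-in, a small harness of the same kind might look like the sketch below: each case is wrapped in a closure, and the harness simply reports whether the result carries names. The case list is illustrative only, not the actual test set referred to above.

```r
nv <- c(a = 1, d = 17, e = 101)
m  <- cbind(VAR = nv)
df <- as.data.frame(m)

# Hypothetical case list; each closure performs one subscripting operation.
cases <- list(
  'nv[2:3]'     = function() nv[2:3],
  'nv[[2]]'     = function() nv[[2]],
  'm[, 1]'      = function() m[, 1],
  'df[, 1]'     = function() df[, 1],
  'df$VAR'      = function() df$VAR,
  'df[["VAR"]]' = function() df[["VAR"]]
)

# TRUE where the subscripting result still has names
keeps_names <- vapply(cases, function(f) !is.null(names(f())), logical(1))
print(keeps_names)
```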

For what it's worth, at work, for years we ran various versions of pre-namespace R using some ugly patches of "[" and "[.data.frame" to force name retention during subscripting. Since we were not using namespaces at all, those "keep names" subscripting hacks were affecting ALL R code we ran, not just our own custom code which needed and expected the names to be retained. Yet perhaps surprisingly, I don't think I ever ran into a single case where the forced retention of names broke any code. We of course ran only a tiny sample of the huge amount of code on CRAN, but that experience suggests that most R code which expects un-named objects doesn't mind at all if names are present.

If anyone would genuinely like to add an option for name-preserving subscripting to R, I'm willing to work on it, so please do let me know your thoughts. So far, though, I've never dug into the guts of the .Primitive("[") and "[.data.frame" functions to see how and why they sometimes keep and sometimes discard names during subscripting.
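To make the idea of an opt-in concrete, here is a user-level sketch rather than a change to R itself. The helper 'col_keep_names' and the option name "keepSubsetNames" are both hypothetical, invented for this example; base R recognizes neither. Unlike a global patch of "[.data.frame", a wrapper like this only affects code that calls it.

```r
# Hypothetical helper; "keepSubsetNames" is an invented option name.
col_keep_names <- function(df, j) {
  stopifnot(is.data.frame(df), length(j) == 1L)
  col <- df[[j]]                       # single column, without names
  if (isTRUE(getOption("keepSubsetNames", FALSE)))
    names(col) <- row.names(df)        # reattach row names only when opted in
  col
}

nv <- c(a = 1, d = 17, e = 101)
df <- as.data.frame(cbind(VAR = nv))

old <- options(keepSubsetNames = TRUE)
identical(col_keep_names(df, 1), nv)   # TRUE with the option switched on
options(old)                           # restore the previous setting
is.null(names(col_keep_names(df, 1)))  # TRUE: default behaviour is unchanged
```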

-- 
Andrew Piskorski <atp_at_piskorski.com>

______________________________________________
R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Sun 19 Aug 2012 - 22:03:18 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.