Re: [Rd] Unexpected alteration of data frame column names

From: Marc Schwartz <marc_schwartz_at_comcast.net>
Date: Tue, 15 May 2007 13:25:39 -0500

On Mon, 2007-05-14 at 23:59 -0700, Herve Pages wrote:
> Hi,
>
> I'm using data.frame(..., check.names=FALSE), because I want to create
> a data frame with duplicated column names (in the real life you can get such
> data frame as the result of an SQL query):
>
> > df <- data.frame(aa=1:5, aa=9:5, check.names=FALSE)
> > df
>   aa aa
> 1  1  9
> 2  2  8
> 3  3  7
> 4  4  6
> 5  5  5
>
> Why is [.data.frame changing my column names?
>
> > df[1:3, ]
>   aa aa.1
> 1  1    9
> 2  2    8
> 3  3    7
>
> How can this be avoided? Thanks!
>
> H.

Herve,

I had not seen a reply to your post, but you can review the code for "[.data.frame" by using:

  getAnywhere("[.data.frame")

and see where there are checks for duplicate column names in the function.
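Rather than reading the whole function, one way to jump to the relevant lines is to deparse the method and search it. This is only a rough sketch: the search pattern is my own guess at what the check looks like, and the exact source text differs between R versions.

```r
# Pull the function object out of the getAnywhere() result and turn it
# into a character vector, one element per line of source.
fn  <- getAnywhere("[.data.frame")$objs[[1]]
src <- deparse(fn)

# Assumed pattern: the duplicate-name handling mentions "duplicated"
# (this also matches anyDuplicated) or make.unique in current versions.
hits <- grep("duplicated|make\\.unique", src, value = TRUE, ignore.case = TRUE)
print(hits)
```

The lines printed are where the subsetting method detects repeated column names and rewrites them.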

That is going to be the default behavior for data frame subsetting/extraction and in fact is noted in the 'ONEWS' file for R version 1.8.0:

So it has been around for some time (October of 2003).

In terms of avoiding it, I suspect that you would have to create your own version of the function, perhaps with an additional argument that enables/disables the duplicate column name checks.

I have not, however, considered the broader functional implications of doing so, so be vewwy vewwy careful here.
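A lighter-weight alternative to rewriting "[.data.frame" would be a small wrapper that subsets as usual and then puts the original names back. The sketch below is untested against real-world data; subset_keep_names is a hypothetical helper name, not anything in base R, and it assumes that j, if given, is a numeric or logical index (character indices are ambiguous when names repeat).

```r
# Hypothetical helper (not part of base R): subset a data frame, then
# restore the original, possibly duplicated, column names.
subset_keep_names <- function(x, i, j) {
  orig <- names(x)
  idx  <- seq_along(orig)
  # Assumption: j is numeric or logical, never character.
  if (!missing(j)) idx <- idx[j]
  out <- if (missing(i)) {
    x[, idx, drop = FALSE]
  } else {
    x[i, idx, drop = FALSE]
  }
  # "[.data.frame" has made the names unique by now; undo that.
  names(out) <- orig[idx]
  out
}

df  <- data.frame(aa = 1:5, aa = 9:5, check.names = FALSE)
res <- subset_keep_names(df, 1:3)
print(names(res))
```

This leaves the base method untouched, so other code that relies on unique column names after subsetting is not affected. Anything downstream that receives the result still has to cope with the duplicates.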

HTH, Marc Schwartz



R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Received on Tue 15 May 2007 - 18:32:24 GMT
