Re: [Rd] merge bug fix in R 2.15.0

From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>
Date: Sun, 25 Mar 2012 15:34:03 +0100 (BST)

On Mon, 19 Mar 2012, Stephanie M. Gogarten wrote:

> I would like to add a vote for keeping blank suffixes in merge(), as I
> routinely use this functionality. An example use case:

But you don't have a vote ....

An exception has been made for "" in R 2.15.0. However, further cases of unintended results arising from duplicate names in merge() have come to light, so further restrictions will be imposed in future to protect users who fail to consider duplicate names.
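A minimal sketch for the use case quoted below, not taken from the thread: the hypothetical helper merge_keep_names() (not a base R function) renames any non-key columns that the incoming data frame shares with the original before calling merge(). The original's column names therefore survive without relying on how merge() treats a blank suffix. The ".new" tag and the clash check are assumptions of this sketch.

```r
# Hypothetical helper, not part of base R: merge `new` into `old` by `by`,
# leaving old's column names untouched and tagging overlapping columns
# from `new` with `tag`, so merge() never has to apply a suffix.
merge_keep_names <- function(old, new, by, tag = ".new") {
  overlap <- setdiff(intersect(names(old), names(new)), by)
  renamed <- paste0(overlap, tag)
  # Refuse to continue if a renamed column would collide with an existing one.
  clash <- intersect(renamed, c(names(old), names(new)))
  if (length(clash))
    stop("renamed column(s) already exist: ", paste(clash, collapse = ", "))
  names(new)[match(overlap, names(new))] <- renamed
  merge(old, new, by = by)
}

# The data from the example quoted below.
d1 <- data.frame(a = letters[1:10], b = 1:10)
d2 <- data.frame(a = letters[1:10], b = 1:10, c = 101:110)

d3 <- merge_keep_names(d1, d2, by = "a")
all(d3$b == d3$b.new)  # TRUE here; FALSE would mean the shared column disagrees
d3$b.new <- NULL       # drop the check column, leaving a, b, c
```

Because the renaming happens before merge() sees the data, no non-key names overlap and merge() applies no suffixes at all, whatever its rules for "" are.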

>
> # using R 2.14.1
> # d1 is some data that I've been working on for a while
> d1 <- data.frame(a=letters[1:10], b=1:10)
> # d2 is some new data from a collaborator. I want to add one of these
> # columns to d1, and also check that the existing columns are consistent
> d2 <- data.frame(a=letters[1:10], b=1:10, c=101:110)
>
> # use blank suffix to avoid changing the column names of my
> # original data frame
> d3 <- merge(d1, d2, by="a", suffixes=c("", ".new"))
> all(d3$b == d3$b.new)
> # if this is FALSE, time to email collaborator
> d3$b.new <- NULL
>
> In real usage d1 would have many more columns than d2, so adding suffixes to
> d1 would be tedious to undo after the merge.
>
> Stephanie Gogarten
> Research Scientist, Biostatistics
> University of Washington
>
> On 3/19/12 4:00 AM, r-devel-request_at_r-project.org wrote:
>> Message: 12
>> Date: Sun, 18 Mar 2012 15:48:30 -0400
>> From: Steve Lianoglou <mailinglist.honeypot_at_gmail.com>
>> To: Uwe Ligges <ligges_at_statistik.tu-dortmund.de>
>> Cc: Matthew Dowle <mdowle_at_mdowle.plus.com>, r-devel_at_r-project.org
>> Subject: Re: [Rd] merge bug fix in R 2.15.0
>> Message-ID:
>> <CAHA9McMGy0U9B_8x=RSBfjCCUMsuEhUUxB03wdtfTRBGAFsJtA@mail.gmail.com>
>> Content-Type: text/plain; charset=ISO-8859-1
>>
>> Hi Uwe,
>>
>> 2012/3/17 Uwe Ligges <ligges_at_statistik.tu-dortmund.de>:
>>>
>>> On 15.03.2012 22:48, Matthew Dowle wrote:
>>>>
>>>> Anyone?
>>>>
>>>>> Is it intended that the first suffix can no longer be blank? Seems to be
>>>>> caused by a bug fix to merge in R 2.15.0.
>>>
>>> Right, the user is now protected against confusing himself by using names
>>> that were not unique before the merge.
>> ... now I'm confused:-)
>>
>> If the user explicitly asks for a NULL/0/empty/whatever suffix,
>> they're not really going to be confusing themselves, right?
>>
>> I actually feel like I do this often, where "this" is explicitly
>> asking to not add a suffix to one group of columns ... I do confuse
>> myself every now and again, but not in this context, yet.
>>
>> I can see that *this* confusing case is now handled w/ this change
>> (which wasn't before):
>>
>> ## I'm using R-devel compiled back in November, 2011 (r57571)
>> R> d1 <- data.frame(a=letters[1:10], b=rnorm(10), b.x=tail(letters, 10))
>> R> d2 <- data.frame(a=letters[1:10], b=101:110)
>> R> merge(d1, d2, by='a', suffixes=c('.x', '.y'))
>>   a         b.x b.x b.y
>> 1 a -1.52250626   q 101
>> 2 b -0.99865341   r 102
>> ... ## Let's call this "exhibit A"
>>
>> But if I do this:
>> R> merge(d1, d2, by='a', suffixes=c("", ".y"))
>>
>> I totally expect:
>>
>>   a           b b.x b.y
>> 1 a -1.52250626   q 101
>> 2 b -0.99865341   r 102
>> ## Let's call this "exhibit B"
>> ...
>>
>> and not (using R-2.15.0 beta):
>>
>> Error in merge.data.frame(d1, d2, by = "a", suffixes = c("", ".y")) :
>> there is already a column named 'b'
>>
>> I can take a crack at a patch that keeps the "rescue the user from
>> surprises" behavior shown in "exhibit A" while also letting the user
>> accomplish "exhibit B", if there is consensus on this particular
>> world view.
>>
>> -steve
>>
>> --
>> Steve Lianoglou
>> Graduate Student: Computational Systems Biology
>>  | Memorial Sloan-Kettering Cancer Center
>>  | Weill Medical College of Cornell University
>> Contact Info: http://cbio.mskcc.org/~lianos/contact
>
> ______________________________________________
> R-devel_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
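The hazard in "exhibit A" above is a result containing two columns named b.x. A minimal sketch, assuming such results should be rejected rather than returned: the hypothetical helpers stop_if_dup_names() and checked_merge() (neither is part of base R) run merge() as usual and stop if the result's column names are not unique. Which inputs actually produce duplicates depends on a given R version's merge() rules, so the sketch checks the output instead of predicting it from the inputs.

```r
# Hypothetical helper, not part of base R: stop if a data frame has
# duplicated column names, naming the offenders; otherwise return it.
stop_if_dup_names <- function(df) {
  nms <- names(df)
  dup <- unique(nms[duplicated(nms)])
  if (length(dup))
    stop("duplicated column name(s): ", paste(dup, collapse = ", "))
  invisible(df)
}

# Hypothetical wrapper: merge() as usual, then refuse a result whose
# column names are not unique (the situation shown in "exhibit A").
checked_merge <- function(x, y, ...) {
  stop_if_dup_names(merge(x, y, ...))
}

# Inputs with distinct non-key names merge cleanly.
d1 <- data.frame(a = letters[1:3], b = 1:3)
d2 <- data.frame(a = letters[1:3], c = 4:6)
res <- checked_merge(d1, d2, by = "a")
names(res)  # "a" "b" "c"

# A data frame built with check.names = FALSE can carry duplicate names;
# the check reports them as an error.
bad <- data.frame(a = 1, b.x = 2, b.x = 3, check.names = FALSE)
try(stop_if_dup_names(bad))
```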

-- 
Brian D. Ripley,                  ripley_at_stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Received on Sun 25 Mar 2012 - 14:41:04 GMT


Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Sun 25 Mar 2012 - 19:50:33 GMT.
