Re: [Rd] merge bug fix in R 2.15.0

From: Peter Meilstrup <peter.meilstrup_at_gmail.com>
Date: Sun, 18 Mar 2012 13:40:46 -0700

On Sun, Mar 18, 2012 at 12:48 PM, Steve Lianoglou <mailinglist.honeypot_at_gmail.com> wrote:

> Hi Uwe,
>
> 2012/3/17 Uwe Ligges <ligges_at_statistik.tu-dortmund.de>:
> >
> >
> > On 15.03.2012 22:48, Matthew Dowle wrote:
> >>
> >>
> >> Anyone?
> >>
> >>> Is it intended that the first suffix can no longer be blank? Seems to be
> >>> caused by a bug fix to merge in R 2.15.0.
> >
> >
> >
> > Right, the user is now protected against confusing himself by using names
> > that were not unique before the merge.
>
> ... now I'm confused :-)
>
> If the user explicitly asks for a NULL/0/empty/whatever suffix,
> they're not really going to be confusing themselves, right?
>

If the user asks for a blank suffix and you still give back ".x" or ".y" as a suffix, then yes, that is confusing.

> I actually feel like I do this often, where "this" is explicitly
> asking to not add a suffix to one group of columns ... I do confuse
> myself every now and again, but not in this context, yet.
>
> I can see that *this* confusing case is now handled w/ this change
> (which wasn't before):
>
> ## I'm using R-devel compiled back in November, 2011 (r57571)
> R> d1 <- data.frame(a=letters[1:10], b=rnorm(10), b.x=tail(letters, 10))
> R> d2 <- data.frame(a=letters[1:10], b=101:110)
> R> merge(d1, d2, by='a', suffixes=c('.x', '.y'))
>   a         b.x b.x b.y
> 1 a -1.52250626   q 101
> 2 b -0.99865341   r 102
> ... ## Let's call this "exhibit A"
>
> But if I do this:
> R> merge(d1, d2, by='a', suffixes=c("", ".y"))
>
> I totally expect:
>
>   a           b b.x b.y
> 1 a -1.52250626   q 101
> 2 b -0.99865341   r 102
> ## Let's call this "exhibit B"

...
>
> and not (using R-2.15.0 beta) (exhibit B):
>
> Error in merge.data.frame(d1, d2, by = "a", suffixes = c("", ".y")) :
> there is already a column named 'b'
>

As a user, I would expect the rule for column names produced by "merge" to be simple: the output column name is the concatenation of the input column name and the corresponding suffix. When I use "merge", I don't expect a more complicated behavior that somehow still uses ".x" even though I asked it not to, as in your second example. So I would say that the new behavior is more consistent.
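To make the simple rule concrete, here is a minimal sketch of it in R. The helper name predict_merge_names is hypothetical (it is not part of base R), and the sketch only models the naming rule as I described it above; it does not claim to reproduce what merge.data.frame does internally in any particular R version.

```r
# Hypothetical helper (not part of base R): predict the column names that
# a merge would produce under the simple "input name + suffix" rule.
predict_merge_names <- function(x, y, by, suffixes = c(".x", ".y")) {
  nx <- setdiff(names(x), by)   # non-key columns of x
  ny <- setdiff(names(y), by)   # non-key columns of y
  shared <- intersect(nx, ny)   # names that occur in both inputs
  # Only shared names receive a suffix; all other names pass through unchanged.
  c(by,
    ifelse(nx %in% shared, paste0(nx, suffixes[1]), nx),
    ifelse(ny %in% shared, paste0(ny, suffixes[2]), ny))
}

d1 <- data.frame(a = letters[1:10], b = rnorm(10), b.x = tail(letters, 10))
d2 <- data.frame(a = letters[1:10], b = 101:110)

# Blank first suffix: the names come out unique under the simple rule.
blank_first <- predict_merge_names(d1, d2, by = "a", suffixes = c("", ".y"))
print(blank_first)

# Default suffixes: the rule yields "b.x" twice, which is the confusing case.
default_sfx <- predict_merge_names(d1, d2, by = "a")
print(default_sfx)
```

With a rule this small, code that calls merge on arbitrary data frames can work out the resulting names in advance instead of inspecting the result afterwards.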

When I write functions that use "merge" on general data frames, I can anticipate and rely on the simpler rule, but it is difficult to anticipate the results of the more complicated rule well enough that my subsequent lines of code will work.

If the inputs I give to merge are inconsistent with the simple rule, I would much rather have an exception (highlighting exactly where my code has gone wrong) than a surprising column name change (which makes my code mysteriously fail ten or a hundred lines later).
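As an illustration of failing early, the sketch below wraps merge in a check that stops as soon as the simple rule would produce duplicate column names. The wrapper name strict_merge and its error message are my own assumptions for the example; they are not the check or the message that R itself uses.

```r
# Hypothetical wrapper (not part of base R): refuse to merge when the simple
# "input name + suffix" rule would produce duplicate column names.
strict_merge <- function(x, y, by, suffixes = c(".x", ".y"), ...) {
  nx <- setdiff(names(x), by)
  ny <- setdiff(names(y), by)
  shared <- intersect(nx, ny)
  out <- c(by,
           ifelse(nx %in% shared, paste0(nx, suffixes[1]), nx),
           ifelse(ny %in% shared, paste0(ny, suffixes[2]), ny))
  dups <- unique(out[duplicated(out)])
  if (length(dups) > 0) {
    # Fail at the call site rather than many lines later.
    stop("merge would produce duplicate column name(s): ",
         paste(dups, collapse = ", "))
  }
  merge(x, y, by = by, suffixes = suffixes, ...)
}

d1 <- data.frame(a = letters[1:10], b = rnorm(10), b.x = tail(letters, 10))
d2 <- data.frame(a = letters[1:10], b = 101:110)

# The default suffixes would create two "b.x" columns, so this stops early.
msg <- tryCatch(strict_merge(d1, d2, by = "a"),
                error = function(e) conditionMessage(e))
print(msg)

# Inputs that are consistent with the simple rule merge as usual.
ok <- strict_merge(data.frame(a = 1:2, b = 3:4),
                   data.frame(a = 1:2, b = 5:6), by = "a")
print(names(ok))
```

The point of the design is only that the error is raised where the inconsistent inputs are supplied, so the traceback points at the offending merge call.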

Peter


R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Received on Sun 18 Mar 2012 - 22:13:12 GMT