Re: [Rd] setdiff bizarre

From: Wacek Kusnierczyk
Date: Tue, 02 Jun 2009 20:03:36 +0200

William Dunlap wrote:
> %in% is a thin wrapper on a call to match(). match() is
> not a generic function (and is not documented to be one),
> so it treats data.frames as lists, as their underlying
> representation is a list of columns. match is documented
> to convert lists to character and to then run the character
> version of match on that character data. match does not
> bail out if the types of the x and table arguments don't match
> (that would be undesirable in the integer/numeric mismatch case).

yes, i understand that this is documented behaviour, and that it's not a bug. nevertheless, the example is odd, and hints that there's a design flaw. i also do not understand why the following should be useful and desirable:

    as.character(list('a'))
    # "a"

    as.character(data.frame('a'))
    # "1"

and hence

    'a' %in% list('a')
    # TRUE

while

    'a' %in% data.frame('a')
    # FALSE
    '1' %in% data.frame('a')
    # TRUE

there is a mechanistic explanation for how this works, but is there one for why this works this way?
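The mechanistic part can be traced by hand. The following is only a sketch, and it assumes the column of the data.frame is a factor, which was the default for `data.frame('a')` at the time of this thread. R 4.0.0 and later default to `stringsAsFactors = FALSE`, so the option is set explicitly here. The variable names are illustrative only.

```r
# Sketch only: force the old default so that the column is a factor.
df <- data.frame(x = 'a', stringsAsFactors = TRUE)

# The single column is a factor: integer code 1, label "a".
codes  <- as.integer(df$x)      # 1
labels <- as.character(df$x)    # "a"

# Coercing the enclosing *list* to character uses the stored integer
# code of the factor, not its label. This is the list-to-character
# conversion that the quoted explanation attributes to match().
as_list_chr <- as.character(list(df$x))   # "1"

# A plain list holding a string takes no such detour.
plain_chr <- as.character(list('a'))      # "a"
```

So the comparison made for the data.frame case is effectively against "1", not "a", which accounts for the FALSE and TRUE results above. Whether newer R versions take exactly the same path inside match() is not checked here.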

> Hence
> '1' %in% data.frame(1) # -> TRUE
> is acting consistently with
> match(as.character(pi), c(1, pi, exp(1))) # -> 2
> and
> 1L %in% c(1.0, 2.0, 3.0) # -> TRUE


> The related functions, duplicated() and unique(), do have
> row-wise data.frame methods. E.g.,
> > duplicated(data.frame(x=c(1,2,2,3,3),y=letters[c(1,1,2,2,2)]))
> Perhaps match() ought to have one also. S+'s match is generic
> and has a data.frame method (which is row-oriented) so there we get:
> > match(data.frame(x=c(1,3,5), y=letters[c(1,3,5)]),
> data.frame(x=1:10,y=letters[1:10]))
> [1] 1 3 5
> > is.element(data.frame(x=1:10,y=letters[1:10]),
> data.frame(x=c(1,3,5), y=letters[c(1,3,5)]))

> I think that %in% and is.element() ought to remain calls to match()
> and that if you want them to work row-wise on data.frames then
> match should get a data.frame method.

sounds good to me. how is

    'a' %in% data.frame('a')

in S+?
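For comparison, the row-oriented matching suggested in the quoted text can be approximated in plain R. This is a minimal sketch, not the S+ method and not a proposed implementation for base R: `match_rows` is a hypothetical helper, and building one key string per row with a "\r" separator is an assumption that breaks if the data themselves contain that character.

```r
# Hypothetical helper (not part of base R): match the rows of `x`
# against the rows of `tbl` by pasting each row into one key string.
match_rows <- function(x, tbl) {
  stopifnot(is.data.frame(x), is.data.frame(tbl), ncol(x) == ncol(tbl))
  # One key per row; "\r" is assumed not to occur in the data.
  key <- function(d) do.call(paste, c(unname(as.list(d)), sep = "\r"))
  match(key(x), key(tbl))
}

x   <- data.frame(x = c(1, 3, 5), y = letters[c(1, 3, 5)])
tbl <- data.frame(x = 1:10, y = letters[1:10])

res <- match_rows(x, tbl)   # 1 3 5

# duplicated() already works row-wise on data.frames:
dup <- duplicated(data.frame(x = c(1, 2, 2, 3, 3),
                             y = letters[c(1, 1, 2, 2, 2)]))
# FALSE FALSE FALSE FALSE TRUE
```

A row-wise `%in%` would then just be `!is.na(match_rows(x, tbl))`. Pasting compares the printed form of the values, so numeric columns that differ only beyond the printed precision would be treated as equal.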

thanks for the response.

vQ

Received on Tue 02 Jun 2009 - 18:07:18 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Tue 02 Jun 2009 - 19:34:49 GMT.
