Re: [Rd] setdiff bizarre

From: William Dunlap <wdunlap_at_tibco.com>
Date: Tue, 02 Jun 2009 10:18:23 -0700

%in% is a thin wrapper around a call to match(). match() is not a generic function (and is not documented to be one), so it treats data.frames as lists, since their underlying representation is a list of columns. match() is documented to convert lists to character and then to run the character version of match() on the result. match() does not bail out if the types of the x and table arguments differ (that would be undesirable in the integer/numeric mismatch case). Hence

   '1' %in% data.frame(1) # -> TRUE
is acting consistently with

   match(as.character(pi), c(1, pi, exp(1))) # -> 2
and

   1L %in% c(1.0, 2.0, 3.0) # -> TRUE
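
To make the coercion step concrete, here is a small sketch in R of what match() effectively sees when its table is a data.frame. The variable names (df, cols_as_text, and so on) are only illustrative, and as.character(as.list(df)) is used here as a stand-in for the internal list-to-character conversion rather than as the literal code path.

```r
# A data.frame is a list of columns, so match() sees one element per column.
df <- data.frame(a = 2:4, b = 1:3)

# Coercing that list to character turns each whole column into one string.
cols_as_text <- as.character(as.list(df))
print(cols_as_text)

# The string that describes column b is found among those column strings,
# which is the result quoted from Stavros Macrakis in this thread.
whole_column_found <- "1:3" %in% df
print(whole_column_found)

# A value that merely occurs inside a column is not found.
cell_value_found <- "2" %in% df
print(cell_value_found)

# The one-column, one-row case reduces to comparing "1" with "1".
single_cell_found <- "1" %in% data.frame(1)
print(single_cell_found)
```

So the comparison is between x and a character rendering of each entire column, not between x and the individual cells or rows.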

The related functions, duplicated() and unique(), do have row-wise data.frame methods. E.g.,

> duplicated(data.frame(x=c(1,2,2,3,3),y=letters[c(1,1,2,2,2)]))
   [1] FALSE FALSE FALSE FALSE TRUE

Perhaps match() ought to have one also. S+'s match is generic and has a data.frame method (which is row-oriented) so there we get:

> match(data.frame(x=c(1,3,5), y=letters[c(1,3,5)]),
        data.frame(x=1:10,y=letters[1:10]))
   [1] 1 3 5
> is.element(data.frame(x=1:10,y=letters[1:10]),
             data.frame(x=c(1,3,5), y=letters[c(1,3,5)]))
   [1] TRUE FALSE TRUE FALSE TRUE FALSE FALSE FALSE FALSE FALSE

I think that %in% and is.element() ought to remain calls to match() and that if you want them to work row-wise on data.frames then match should get a data.frame method.
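
For anyone who wants the row-wise behaviour in R today, something along these lines might serve as a stopgap. This is only a rough sketch: match_rows() is a hypothetical helper, not an existing base R function and not S+'s method, and it assumes both data.frames have the same column names in the same order. It builds one character key per row, so doubles are compared through their character form and a cell containing the separator character could in principle cause a false match.

```r
# Hypothetical helper (not part of base R): row-wise match for data.frames.
match_rows <- function(x, table) {
  stopifnot(is.data.frame(x), is.data.frame(table),
            identical(names(x), names(table)))
  # Build one string key per row by pasting the columns together.
  # unname() keeps column names from being taken as paste() arguments.
  row_key <- function(df) {
    do.call(paste, c(unname(lapply(df, as.character)), sep = "\r"))
  }
  match(row_key(x), row_key(table))
}

small <- data.frame(x = c(1, 3, 5), y = letters[c(1, 3, 5)])
big   <- data.frame(x = 1:10, y = letters[1:10])

# Positions in 'big' of the rows of 'small'.
row_idx <- match_rows(small, big)
print(row_idx)

# Row-wise analogue of is.element(big, small).
is_in <- !is.na(match_rows(big, small))
print(is_in)
```

A proper data.frame method would want to compare columns type by type rather than through pasted strings; the key-pasting approach is just the shortest way to show the intended row-oriented semantics.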

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com

> -----Original Message-----
> From: r-devel-bounces_at_r-project.org 
> [mailto:r-devel-bounces_at_r-project.org] On Behalf Of Wacek Kusnierczyk
> Sent: Tuesday, June 02, 2009 9:11 AM
> To: Stavros Macrakis
> Cc: r-devel_at_r-project.org; dwinsemius_at_comcast.net
> Subject: Re: [Rd] setdiff bizarre
> 
> Stavros Macrakis wrote:
> >
> >      '1:3' %in% data.frame(a=2:4,b=1:3)  # TRUE
> >   
> 
> utterly weird.  so what would x have to be so that
> 
>     x %in% data.frame('a')
>     # TRUE
> 
> hint: 
> 
>     '1' %in% data.frame(1)
>     # TRUE
> 
> vQ
> 
> ______________________________________________
> R-devel_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
> 

______________________________________________
R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Received on Tue 02 Jun 2009 - 17:23:46 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Tue 02 Jun 2009 - 22:34:41 GMT.
