Re: [Rd] setdiff bizarre (was: odd behavior out of setdiff)

From: Stavros Macrakis <>
Date: Tue, 02 Jun 2009 11:13:48 -0400

On Sat, May 30, 2009 at 11:59 AM, Stavros Macrakis <> wrote:

> Since R is object-oriented, data frame set operations should be the natural
> operations for their class. There are, I suppose, two natural ways: the
> column-wise (variable-wise) and the row-wise (observation-wise) one. The
> row-wise one seems more natural and more useful to me.
> ...
> The row-wise interpretation makes sense in cases where observations with
> the same values for all variables can be considered redundant. That seems
> to me a much more useful interpretation. The union, intersection, and set
> difference of two sets of observations would seem to all be highly useful.

Another argument for the row-wise interpretation: the `subset` function (also part of base) works that way on data frames.
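A minimal sketch of what row-wise set operations could look like as ordinary R functions. The helper names (row_key, row_in, row_union, row_intersect, row_setdiff) are hypothetical and not part of base R. The sketch assumes both data frames have the same column names, that no value contains "\r", and that comparing values by their character form is acceptable.

```r
# Hypothetical helpers sketching row-wise set operations on data frames.
# Each row is turned into one key string by pasting its values together,
# so values must not contain "\r" and numbers compare by their printed form.
row_key <- function(d) {
  do.call(paste, c(unname(as.list(d)), sep = "\r"))
}

# TRUE for each row of x that also occurs as a row of y (columns matched by name)
row_in <- function(x, y) {
  row_key(x) %in% row_key(y[names(x)])
}

# unique() on a data frame already removes duplicate rows (row-wise)
row_union     <- function(x, y) unique(rbind(x, y[names(x)]))
row_intersect <- function(x, y) unique(x[row_in(x, y), , drop = FALSE])
row_setdiff   <- function(x, y) unique(x[!row_in(x, y), , drop = FALSE])

x <- data.frame(a = c(1, 2, 2, 3), b = c("p", "q", "q", "r"))
y <- data.frame(b = c("q", "s"), a = c(2, 4))
row_in(x, y)        # FALSE TRUE TRUE FALSE
row_setdiff(x, y)   # the rows (1, "p") and (3, "r")
```

Pasting rows into keys is only one possible design; it keeps the sketch short at the cost of the separator and numeric-formatting assumptions noted above.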

Interestingly, %in%/match appears to work neither row-wise nor column-wise:

     1 %in% data.frame(a=1:3)  # FALSE  (would be true if row-wise)
     1:3 %in% data.frame(a=1:3) # FALSE FALSE FALSE (would be true if

but simply treats the data frame as a list and converts each column to a single *character* string:

     1 %in% data.frame(a=2,b=1)  # TRUE
     '1' %in% data.frame(a=2,b=1)  # TRUE
     1 %in% data.frame(a=2:3,b=1:2) # FALSE
     1:3 %in% data.frame(a=2:4,b=1:3)  # FALSE FALSE FALSE
     '1:3' %in% data.frame(a=2:4,b=1:3)  # TRUE
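The results above can be reproduced by assuming the list-to-character rule described in ?match: the data frame, being a list, goes through as.character(), which turns each column into one string (an integer column 2:4 becomes "2:4"), and the left-hand side is compared as character too. A small sketch under that assumption; as_match_table and emulated_in are hypothetical names, not R functions.

```r
# Hypothetical helper mimicking the documented list-to-character rule of match():
# each column of the data frame becomes ONE string, e.g. 2:4 -> "2:4".
as_match_table <- function(df) as.character(df)

# Compare x, coerced to character, against those per-column strings.
emulated_in <- function(x, df) as.character(x) %in% as_match_table(df)

as_match_table(data.frame(a = 2:4, b = 1:3))       # "2:4" "1:3"
emulated_in(1, data.frame(a = 2, b = 1))           # TRUE  (columns are "2", "1")
emulated_in(1, data.frame(a = 2:3, b = 1:2))       # FALSE (columns are "2:3", "1:2")
emulated_in("1:3", data.frame(a = 2:4, b = 1:3))   # TRUE
```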

This behavior is clearly documented in ?match, but I am mystified by it. Perhaps someone from R core can shed light on the rationale?


Received on Tue 02 Jun 2009 - 15:15:58 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Tue 02 Jun 2009 - 18:34:37 GMT.
