Re: [Rd] duplicated() variation that goes both ways to capture all duplicates

From: Duncan Murdoch <murdoch.duncan_at_gmail.com>
Date: Mon, 23 Jul 2012 09:08:22 -0400

On 23/07/2012 8:49 AM, Liviu Andronic wrote:
> Dear all
> The trouble with the current duplicated() function in R is that it can
> report duplicates while searching fromFirst _or_ fromLast, but not
> both ways. Often users will want to identify and extract all the
> copies of the item that has duplicates, not only the duplicates
> themselves.
>
> To take the example from the man page:
> > data(iris)
> > iris[duplicated(iris), ] ##duplicates while searching "fromFirst"
>     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
> 143          5.8         2.7          5.1         1.9 virginica
> > iris[duplicated(iris, fromLast=TRUE), ] ##duplicates while searching "fromLast"
>     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
> 102          5.8         2.7          5.1         1.9 virginica
>
>
> To extract all the copies of the concerned items ("original" and
> duplicates) one would need to do something like this:
> > iris[(duplicated(iris) | duplicated(iris, fromLast=TRUE)), ] ##duplicates while searching "bothWays"
>     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
> 102          5.8         2.7          5.1         1.9 virginica
> 143          5.8         2.7          5.1         1.9 virginica
>
>
> Unfortunately this is unnecessarily long and convoluted. Short of a
> 'bothWays' argument in duplicated(), I came up with a small wrapper
> that simplifies the above:
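> duplicated2 <- function(x, bothWays = TRUE, ...)
> {
>     if (!bothWays) {
>         ## plain duplicated(): later copies only (earlier ones with fromLast = TRUE)
>         return(duplicated(x, ...))
>     }
>     ## bothWays: flag every copy of a repeated value, including its first occurrence
>     duplicated(x, ...) | duplicated(x, fromLast = TRUE, ...)
> }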
>
>
> Now the above can be achieved simply via:
> > iris[duplicated2(iris), ] ##duplicates while searching "bothWays"
>     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
> 102          5.8         2.7          5.1         1.9 virginica
> 143          5.8         2.7          5.1         1.9 virginica
>
>
> So here's my inquiry: would R Core consider adding such
> functionality to base R? Either the duplicated2() function above
> (suitably cleaned up) or a "bothWays" argument in duplicated()
> itself would improve user convenience and reduce confusion. (In my
> case it took some time before I understood the correct approach to
> this problem.)
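The both-ways combination in the quoted message can be checked on a small vector; the vector x below is made up for illustration, and this is a sketch rather than a proposed base-R API. duplicated(x) flags each copy after the first occurrence, duplicated(x, fromLast = TRUE) flags each copy before the last occurrence, and their union flags every element whose value occurs more than once.

```r
# Made-up toy vector: "a" occurs twice, "b" three times, "c" once
x <- c("a", "b", "a", "c", "b", "b")

first <- duplicated(x)                   # copies after the first occurrence
last  <- duplicated(x, fromLast = TRUE)  # copies before the last occurrence
both  <- first | last                    # every copy of a repeated value

print(data.frame(x, first, last, both))
# both: TRUE TRUE TRUE FALSE TRUE TRUE

# For an atomic vector, matching against the repeated values is equivalent
stopifnot(identical(both, x %in% x[duplicated(x)]))
```

The %in% formulation only works for plain vectors; for data frames the | combination is the simpler route, since %in% does not compare whole rows.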

I can't speak for all of R Core, but I don't see the need for this in base R; your solution looks fine to me.
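One caveat with the wrapper as quoted: in the default both-ways case, a caller who also passes fromLast (e.g. duplicated2(x, fromLast = TRUE)) ends up supplying fromLast twice to duplicated(), which R rejects as an argument-matching error. A minimal variant that drops any user-supplied fromLast before the two calls might look like the sketch below; the name duplicated_all() is hypothetical and not part of base R.

```r
# Hypothetical helper, not part of base R: like the quoted duplicated2(),
# but safe to call with fromLast among the extra arguments.
duplicated_all <- function(x, bothWays = TRUE, ...) {
  if (!bothWays) {
    # Ordinary duplicated(), passing all extra arguments through
    return(duplicated(x, ...))
  }
  # Remove any user-supplied fromLast so it is not passed twice below;
  # the direction is irrelevant once both passes are combined.
  args <- list(...)
  args$fromLast <- NULL
  fwd <- do.call(duplicated, c(list(x), args))
  bwd <- do.call(duplicated, c(list(x), args, list(fromLast = TRUE)))
  fwd | bwd
}

data(iris)
iris[duplicated_all(iris), ]                 # rows 102 and 143, as in the quoted example
duplicated_all(c(1, 2, 1), fromLast = TRUE)  # TRUE FALSE TRUE
```

Dropping fromLast silently is a judgment call: in both-ways mode the search direction cannot change the result, so ignoring it seems harmless, but raising a warning() would be an equally reasonable choice.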

Duncan Murdoch



R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Mon 23 Jul 2012 - 13:10:45 GMT
