Re: [Rd] duplicates() function

From: Hadley Wickham <>
Date: Fri, 08 Apr 2011 10:13:26 -0500

On Fri, Apr 8, 2011 at 9:59 AM, Duncan Murdoch <> wrote:
> I need a function which is similar to duplicated(), but instead of returning
> TRUE/FALSE, returns indices of which element was duplicated.  That is,
>> x <- c(9,7,9,3,7)
>> duplicated(x)
> [1] FALSE FALSE  TRUE FALSE  TRUE
>> duplicates(x)
> [1] NA NA  1 NA  2
> (so that I know that element 3 is a duplicate of element 1, and element 5 is
> a duplicate of element 2, whereas the others were not duplicated according
> to our definition.)
> Is there a simple way to write this function?  I have  an ugly
> implementation in R that loops over all the values; it would make more sense
> to redo it in C, if there isn't a simple implementation I missed.

I'd think of making it a lookup table. The basic idea is

split(seq_along(x), x)

but there are probably much faster ways of doing it, depending on what you need. For efficiency, you probably need a hash table somewhere.
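
As a rough sketch of how the lookup idea could give the output asked for: this assumes "duplicate of" means "same value as the first earlier occurrence", and it swaps split() for match(), which as far as I know does its lookup with a hash table internally. The function name duplicates() is taken from the question; it is not an existing base R function.

```r
# Hypothetical helper, named after the function asked for in the question.
# For each element, returns the index of the first occurrence of the same
# value, or NA if the element is itself the first occurrence.
duplicates <- function(x) {
  first <- match(x, x)                    # index of first occurrence of each value
  first[first == seq_along(x)] <- NA      # first occurrences are not duplicates
  first
}

x <- c(9, 7, 9, 3, 7)
duplicates(x)
# [1] NA NA  1 NA  2

# The lookup table from split(), for comparison: one index vector per value.
groups <- split(seq_along(x), x)
groups[["9"]]
# [1] 1 3
```

The split() version keeps every index for each value, which is useful if you need all members of a group; the match() version keeps only the pointer back to the first one.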


Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University

Received on Fri 08 Apr 2011 - 15:28:19 GMT