Re: [Rd] duplicates() function

From: Hadley Wickham <hadley_at_rice.edu>
Date: Fri, 08 Apr 2011 10:13:26 -0500

On Fri, Apr 8, 2011 at 9:59 AM, Duncan Murdoch <murdoch.duncan_at_gmail.com> wrote:
> I need a function which is similar to duplicated(), but instead of returning
> TRUE/FALSE, returns indices of which element was duplicated.  That is,
>
>> x <- c(9,7,9,3,7)
>> duplicated(x)
> [1] FALSE FALSE  TRUE FALSE  TRUE
>
>> duplicates(x)
> [1] NA NA  1 NA  2
>
> (so that I know that element 3 is a duplicate of element 1, and element 5 is
> a duplicate of element 2, whereas the others were not duplicated according
> to our definition.)
>
> Is there a simple way to write this function?  I have an ugly
> implementation in R that loops over all the values; it would make more sense
> to redo it in C, if there isn't a simple implementation I missed.

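I'd think of it as building a lookup table from each value to the positions where it occurs. The basic idea is

split(seq_along(x), x)

which groups the indices by value, so the first index in each group is the one the later ones duplicate. There are probably much faster ways of doing it, depending on what you need, but for efficiency you will probably need a hash table somewhere.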
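
As a rough sketch of how that lookup table could be turned into the requested function (the names duplicates_split and duplicates_match are made up for illustration, and neither version is tuned or tested beyond the example in the question): the first version walks the groups from split(), the second relies on match() returning the position of the first occurrence of each value.

```r
# Sketch 1: build the lookup table with split(), then point every
# later occurrence in a group at the first index of that group.
# Note: split() drops NA values of x, so those positions stay NA.
duplicates_split <- function(x) {
  groups <- split(seq_along(x), x)
  out <- rep(NA_integer_, length(x))
  for (idx in groups) {
    out[idx[-1]] <- idx[1]   # idx[-1] is empty for unique values
  }
  out
}

# Sketch 2: match(x, x) gives, for each element, the index of its
# first occurrence; elements that are their own first occurrence
# are not duplicates, so they are set to NA.
duplicates_match <- function(x) {
  first <- match(x, x)
  first[first == seq_along(x)] <- NA_integer_
  first
}

x <- c(9, 7, 9, 3, 7)
duplicates_split(x)
# [1] NA NA  1 NA  2
duplicates_match(x)
# [1] NA NA  1 NA  2
```

The match() version avoids the explicit loop over groups and is likely the faster of the two on long vectors, though that is a guess and has not been benchmarked here.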

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/

______________________________________________
R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Fri 08 Apr 2011 - 15:28:19 GMT