# Re: [Rd] duplicates() function

From: Duncan Murdoch <murdoch.duncan_at_gmail.com>
Date: Mon, 11 Apr 2011 14:05:11 -0400

On 08/04/2011 11:39 AM, Joshua Ulrich wrote:
> On Fri, Apr 8, 2011 at 10:15 AM, Duncan Murdoch
> <murdoch.duncan_at_gmail.com> wrote:
> > On 08/04/2011 11:08 AM, Joshua Ulrich wrote:
> >>
> >>
> >> y <- rep(NA, length(x))
> >> y[duplicated(x)] <- match(x[duplicated(x)], x)
> >
> > That's a nice solution for vectors. Unfortunately for me, I have a matrix
> > (which duplicated() handles by checking whole rows). So a better example
> > that I should have posted would be
> >
> > x <- cbind(1, c(9, 7, 9, 3, 7))
> >
> > and I'd still like the same output
> >
> For a matrix, could you apply the same strategy used in duplicated()?
>
> y <- rep(NA, NROW(x))
> temp <- apply(x, 1, function(x) paste(x, collapse = "\r"))
> y[duplicated(temp)] <- match(temp[duplicated(temp)], temp)

Since this thread hasn't ended, I will say that I think this solution is the best I've seen for my specific problem. I was actually surprised that duplicated() did the string concatenation trick, but since it does, it makes a lot of sense to do the same in duplicates().
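For concreteness, here is a minimal sketch of that idea as an R function. The name `duplicates()` is hypothetical (it is not in base R), and the sketch assumes that the `"\r"`-pasted row strings from the reply above are an acceptable key; a plain vector is used directly as its own key.

```r
# Hypothetical duplicates(): for each element (or matrix row), return the
# index of its first occurrence if it is a duplicate, otherwise NA.
duplicates <- function(x) {
  if (is.matrix(x)) {
    # Collapse each row to one string, the same trick duplicated() uses for
    # matrices; assumes "\r" never appears inside the formatted values.
    key <- apply(x, 1, function(row) paste(row, collapse = "\r"))
  } else {
    key <- x
  }
  dup <- duplicated(key)
  y <- rep(NA_integer_, length(key))
  # match() returns the position of the first matching key, i.e. the
  # earlier element that this one duplicates.
  y[dup] <- match(key[dup], key)
  y
}

duplicates(c(9, 7, 9, 3, 7))              # returns c(NA, NA, 1, NA, 2)
duplicates(cbind(1, c(9, 7, 9, 3, 7)))    # returns c(NA, NA, 1, NA, 2)
```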

I think a good general-purpose solution, one that works wherever duplicated() works, would likely be harder to write, because we don't really have the right primitives to make it work.
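As a rough illustration of where the difficulty lies, one string-free alternative is to sort the rows and compare neighbours. This is a different technique from what duplicated() does, and the sketch below (hypothetical name `duplicates_sorted()`) is limited by assumption to NA-free numeric vectors or matrices: with NA values, the `!=` comparisons return NA and the run detection breaks, which is exactly the kind of case a version that works wherever duplicated() works would have to handle. Because order() leaves unresolved ties in their original order, the first row of each sorted run is the earliest occurrence.

```r
# Hypothetical sort-based variant, assuming an NA-free numeric vector or
# matrix: sort the rows, mark where each run of identical rows starts, and
# map every row back to the first (smallest) original index in its run.
duplicates_sorted <- function(x) {
  x <- as.matrix(x)
  stopifnot(is.numeric(x), !anyNA(x))
  n <- nrow(x)
  if (n == 0L) return(integer(0))
  # Order by all columns; ties keep their original relative order.
  cols <- lapply(seq_len(ncol(x)), function(j) x[, j])
  o <- do.call(order, cols)
  xs <- x[o, , drop = FALSE]
  # TRUE at the first row of each run of identical sorted rows.
  starts <- c(TRUE,
              rowSums(xs[-1, , drop = FALSE] != xs[-n, , drop = FALSE]) > 0)
  # Original index of each run's first row, repeated across the run.
  first <- o[starts][cumsum(starts)]
  y <- rep(NA_integer_, n)
  # Run starts are not duplicates (NA); later rows point at the run start.
  y[o] <- ifelse(first == o, NA_integer_, first)
  y
}

duplicates_sorted(cbind(1, c(9, 7, 9, 3, 7)))   # returns c(NA, NA, 1, NA, 2)
```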

Duncan Murdoch
> >> duplicated(x)
> >
> > [1] FALSE FALSE TRUE FALSE TRUE
> >
> >> duplicates(x)
> >
> > [1] NA NA 1 NA 2
> >
> >
> > Duncan Murdoch
> >
> >> On Fri, Apr 8, 2011 at 9:59 AM, Duncan Murdoch<murdoch.duncan_at_gmail.com>
> >> wrote:
> >> > I need a function which is similar to duplicated(), but instead of returning
> >> > TRUE/FALSE, returns indices of which element was duplicated. That is,
> >> >
> >> >> x <- c(9,7,9,3,7)
> >> >> duplicated(x)
> >> > [1] FALSE FALSE TRUE FALSE TRUE
> >> >
> >> >> duplicates(x)
> >> > [1] NA NA 1 NA 2
> >> >
> >> > (so that I know that element 3 is a duplicate of element 1, and element 5 is
> >> > a duplicate of element 2, whereas the others were not duplicated according
> >> > to our definition.)
> >> >
> >> > Is there a simple way to write this function? I have an ugly
> >> > implementation in R that loops over all the values; it would make more sense
> >> > to redo it in C, if there isn't a simple implementation I missed.
> >> >
> >> > Duncan Murdoch
> >> >
> >> > ______________________________________________
> >> > R-devel_at_r-project.org mailing list
> >> > https://stat.ethz.ch/mailman/listinfo/r-devel
> >> >
> >
> >

R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Received on Mon 11 Apr 2011 - 18:08:51 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Tue 12 Apr 2011 - 14:00:44 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-devel. Please read the posting guide before posting to the list.