# Re: [R] Tagging identical rows of a matrix

From: Gabor Grothendieck (ggrothendieck@myway.com)
Date: Sat 15 May 2004 - 07:03:45 EST

Message-id: <loom.20040514T222448-769@post.gmane.org>

Waichler, Scott R <Scott.Waichler <at> pnl.gov> writes:

>
> Thanks to all of you who responded to my help request.
> Here is the very efficient upshot of your advice:
>
> > mat2 <- apply(mat, 1, paste, collapse=":")
> > vec <- match(mat2, unique(mat2))
> > vec
> [1] 1 2 1 1 2 3
>
>
> P.S. I found that Andy Liaw's method didn't preserve the
> index order that I wanted; it yields
>
> 2 3 2 2 3 1
>
> To get the order of integers I was looking for required an
> invocation of unique:
>
> as.numeric(factor(apply(mat, 1, paste, collapse=":"),
> levels=unique(apply(mat, 1, paste, collapse=":"))))
>
> But the first method above is obviously cleaner and is twice
> as fast, only 9 seconds for a 100000 row matrix on an ordinary PC.

The interaction solution gives an identical result, is shorter, and is
one to two orders of magnitude faster (about 0.1 s versus about 5 s in the
timings below). Here is a comparison of the three approaches on a
50000-row, two-column matrix:

R> set.seed(1)
R> mat <- matrix(sample(20, 100000, replace=TRUE), 50000)
R>
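R> # f0: paste each row into a key, then number the keys by first appearance
R> f0 <- function(mat) {
+   mat2 <- apply(mat, 1, paste, collapse=":")
+   match(mat2, unique(mat2))
+ }
R>
R> # f1: same keys, numbered via a factor with levels in first-appearance order
R> f1 <- function(mat) {
+   z <- apply(mat, 1, paste, collapse=":")
+   as.numeric(factor(z, levels=unique(z)))
+ }
R>
R> # f2: interaction of the two columns, dropping unused levels
R> f2 <- function(mat) as.numeric(interaction(mat[,1], mat[,2], drop=TRUE))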
R>
R> dummy <- gc(); system.time(z0 <- f0(mat))
[1] 5.24 0.02 5.52 NA NA
R> dummy <- gc(); system.time(z1 <- f1(mat))
[1] 5.18 0.00 5.52 NA NA
R> dummy <- gc(); system.time(z2 <- f2(mat))
[1] 0.1 0.0 0.1 NA NA
R> all.equal(z0,z1)
[1] TRUE
R> all.equal(z0,z2)
[1] TRUE
R> all.equal(z2,z1)
[1] TRUE
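
For matrices with more than two columns, the same first-appearance numbering can be sketched without apply() or interaction(). This is a minimal sketch, not one of the methods timed above: `tag_rows` is a hypothetical helper name, and it swaps in `do.call(paste, ...)` over the columns to build the row keys. It relies only on `match()` and `unique()`, so the numbering does not depend on how factor levels happen to be ordered.

```r
# tag_rows() is a hypothetical helper, not from the thread.
# It returns one integer tag per row; identical rows share a tag, and
# tags are numbered in order of first appearance.
tag_rows <- function(mat) {
  # paste() is vectorised across its arguments, so passing the columns
  # as separate arguments builds all row keys in one call.
  # Assumption: no cell contains ":" (otherwise distinct rows could
  # produce the same key).
  key <- do.call(paste, c(as.data.frame(mat), sep = ":"))
  match(key, unique(key))
}

# Small example matrix (illustrative data, chosen to have repeated rows).
mat_small <- matrix(c(1, 2,
                      3, 4,
                      1, 2,
                      1, 2,
                      3, 4,
                      5, 6), ncol = 2, byrow = TRUE)

tags <- tag_rows(mat_small)
print(tags)
# [1] 1 2 1 1 2 3
```

Whether this is faster than f0 on a given machine would need to be checked with system.time() in the same way as the timings above, and its output can be compared against f0 with all.equal().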

