RE: [R] Tagging identical rows of a matrix


From: Waichler, Scott R (Scott.Waichler@pnl.gov)
Date: Sat 15 May 2004 - 06:12:08 EST


Message-id: <62AE0CF1D4875C4BBDEC29DB9924ACE87F21DB@pnlmse25.pnl.gov>


Thanks to all of you who responded to my help request.
Here is the very efficient upshot of your advice:

> mat2 <- apply(mat, 1, paste, collapse=":")
> vec <- match(mat2, unique(mat2))
> vec
[1] 1 2 1 1 2 3
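Why this works: pasting each row into one string turns row comparison into string comparison. unique() keeps the strings in order of first appearance, so match() gives 1 to the first distinct row, 2 to the next new one, and so on. Below is a minimal sketch that wraps the idea in a reusable function. The name row_groups and the sep argument are hypothetical, introduced here for illustration only.

```r
# Hypothetical helper (not from the thread): label each row of a matrix
# with a group id, numbering groups in order of first appearance.
row_groups <- function(m, sep = ":") {
  # One string key per row, e.g. c(1, 2) becomes "1:2"
  keys <- apply(m, 1, paste, collapse = sep)
  # unique() preserves first-appearance order, so match() numbers
  # the groups 1, 2, 3, ... in the order they first occur
  match(keys, unique(keys))
}

mat <- rbind(a = c(1, 2), b = c(1, 3), c = c(1, 2),
             d = c(1, 2), e = c(1, 3), f = c(2, 1))
grp <- row_groups(mat)
print(grp)
```

match() drops the row names that apply() carries over, so grp is a plain integer vector.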

P.S. I found that Andy Liaw's method didn't preserve the
index order that I wanted; it yields

2 3 2 2 3 1

Getting the order of integers I was looking for required an
extra call to unique():

as.numeric(factor(apply(mat, 1, paste, collapse=":"),
                  levels=unique(apply(mat, 1, paste, collapse=":"))))
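The factor() version builds the row keys twice, once for the data and once for the levels. That duplicated apply() call is a plausible (unmeasured) reason it runs slower. A sketch, assuming the same example matrix, that builds the keys once and checks that both formulations give identical group ids:

```r
mat <- rbind(c(1, 2), c(1, 3), c(1, 2), c(1, 2), c(1, 3), c(2, 1))
# Build the row keys a single time
keys <- apply(mat, 1, paste, collapse = ":")
# factor() with levels in first-appearance order, then integer codes
via_factor <- as.integer(factor(keys, levels = unique(keys)))
# The match()/unique() formulation
via_match <- match(keys, unique(keys))
stopifnot(identical(via_factor, via_match))
```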

But the first method above is clearly cleaner and twice as fast,
taking only 9 seconds for a 100,000-row matrix on an ordinary PC.
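Most of that time likely goes to apply(), which calls paste() once per row. A possible alternative, not suggested in the thread and not timed here, pastes whole columns at once, so paste() is called a single time across all rows. A sketch, assuming a numeric matrix like the example one:

```r
mat <- rbind(c(1, 2), c(1, 3), c(1, 2), c(1, 2), c(1, 3), c(2, 1))
# Treat each column as an argument to one vectorised paste() call
keys_fast <- do.call(paste, c(as.data.frame(mat), sep = ":"))
grp_fast <- match(keys_fast, unique(keys_fast))
# Sanity check: same keys as the row-wise apply() version
keys_apply <- apply(mat, 1, paste, collapse = ":")
stopifnot(identical(keys_fast, keys_apply))
print(grp_fast)
```

Like the apply() version, this relies on as.character() for the keys, so numbers that agree to about 15 significant digits would land in the same group.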

Regards,
Scott Waichler

> > I would like to generate a vector having the same length
> > as the number of rows in a matrix. The vector should contain an
> > integer indicating the "group" of the row, where identical matrix
> > rows are in a group, and a unique row has a unique integer. Thus, for
> >
> > a <- c(1,2)
> > b <- c(1,3)
> > c <- c(1,2)
> > d <- c(1,2)
> > e <- c(1,3)
> > f <- c(2,1)
> > mat <- rbind(a,b,c,d,e,f)
> >
> > I would like to get the vector c(1,2,1,1,2,3). I know dist() gives
> > part of the answer, but I can't figure out how to use it for this
> > purpose without doing a lot of looping. I need to apply this to
> > matrices up to ~100000 rows.

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html



This archive was generated by hypermail 2.1.3 : Mon 31 May 2004 - 23:05:11 EST