Re: [R] Matching a vector with a matrix row

From: Ravi Varadhan <rvaradhan_at_jhmi.edu>
Date: Sun, 24 Apr 2011 13:46:20 -0400

I previously gave a solution for matrices with integer elements. It also works well for real numbers.

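rowMatch <- function(A, B) {
    # For each row of B, return the index of the first matching row of A
    # (NA if there is none). The returned indexes refer to rows of A.

    # Collapse the columns of each row into one string key, e.g. "1:2:3"
    f <- function(...) paste(..., sep = ":")

    # Treat a plain vector B as a one-row matrix
    if (!is.matrix(B)) B <- matrix(B, 1, length(B))
    a <- do.call("f", as.data.frame(A))
    b <- do.call("f", as.data.frame(B))

    match(b, a)
}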
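The idea behind rowMatch() is to turn each row into a single string key and let match() compare the keys. A tiny worked example, with made-up matrices A and B used only for illustration:

```r
# Made-up 3 x 2 matrix A and two query rows in B
A <- rbind(c(1, 2),
           c(3, 4),
           c(1, 2))
B <- rbind(c(1, 2),
           c(5, 6))

# Same key construction as rowMatch(): paste the columns with ":"
f <- function(...) paste(..., sep = ":")
a <- do.call("f", as.data.frame(A))  # "1:2" "3:4" "1:2"
b <- do.call("f", as.data.frame(B))  # "1:2" "5:6"

# match() gives the first matching row of A, or NA if there is none
match(b, a)  # 1 NA
```

Note that row 3 of A duplicates row 1, so only the first occurrence (1) is reported.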

A <- matrix(rnorm(100000), 5000, 20)

# Many rows at once: B holds 100 rows sampled (with replacement) from A
sel <- sample(1:nrow(A), size=100, replace=TRUE)
B <- A[sel,]

system.time(rows <- rowMatch(A, B))
all.equal(sel, rows)

# A single row, given as a plain vector
sel <- sample(1:nrow(A), size=1)
b <- c(A[sel,])
system.time(row <- rowMatch(A, b))
all.equal(sel, row)
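One caveat for real numbers, offered as a hedged note: paste() converts doubles to text with 15 significant digits, so values that differ only beyond that (for example 0.1 + 0.2 and 0.3) get the same key and are reported as a match. The sketch below is a hypothetical variant, rowMatchExact(), that builds the keys with sprintf("%a", ...), which writes each double exactly in hexadecimal notation. It assumes A and B are numeric matrices (or B a numeric vector).

```r
rowMatchExact <- function(A, B) {
    # Hypothetical variant of rowMatch(): keys use sprintf("%a"), an exact
    # hexadecimal representation of each double, instead of paste()'s
    # 15-significant-digit decimal text.
    if (!is.matrix(B)) B <- matrix(B, 1, length(B))
    key <- function(M) {
        # Encode every element exactly, restore the matrix shape, then
        # join the columns of each row into one string
        E <- matrix(sprintf("%a", as.double(M)), nrow(M), ncol(M))
        do.call(paste, c(as.data.frame(E, stringsAsFactors = FALSE), sep = ":"))
    }
    match(key(B), key(A))
}

A <- rbind(c(0.3, 1),
           c(0.1 + 0.2, 1))
rowMatchExact(A, c(0.1 + 0.2, 1))  # 2 (the paste-based keys would give 1)
```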

I am curious to see if there are better/faster ways to do this.

Ravi.



From: r-help-bounces_at_r-project.org [r-help-bounces_at_r-project.org] On Behalf Of Petr Savicky [savicky_at_praha1.ff.cuni.cz]
Sent: Sunday, April 24, 2011 5:13 AM
To: r-help_at_r-project.org
Subject: Re: [R] Matching a vector with a matrix row

On Sat, Apr 23, 2011 at 08:56:33AM +0800, Luis Felipe Parra wrote:
> Hello Niels, I am trying to find the rows in Matrix which contain all of the
> elements in LHS.

This sounds like you want an equivalent of

  all(LHS %in% x)

However, in your original post, you used

  all(x %in% LHS)

What is correct?
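To make the difference between the two tests concrete, here is a small sketch on a made-up character matrix M and vector LHS (hypothetical data, not the original poster's), applying each test to every row with apply():

```r
# Hypothetical example data: each row of M plays the role of x
M <- rbind(c("a", "b", "c"),
           c("a", "b", "b"),
           c("d", "a", "b"))
LHS <- c("a", "b")

# Rows that contain every element of LHS: all(LHS %in% x)
rowsContainLHS <- which(apply(M, 1, function(x) all(LHS %in% x)))

# Rows whose elements all occur in LHS: all(x %in% LHS)
rowsInsideLHS <- which(apply(M, 1, function(x) all(x %in% LHS)))

rowsContainLHS  # 1 2 3
rowsInsideLHS   # 2
```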

If x and LHS should be tested for equality as sets (ignoring repetitions), then try

   setequal(x, LHS)

If the rows may contain repeated elements and the number of repetitions should also match, then try

  identical(sort(x), sort(LHS))

with a precomputed sort(LHS) for efficiency.
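A minimal sketch contrasting the two checks row by row, on hypothetical data M and LHS, with sort(LHS) computed once outside the loop:

```r
# Hypothetical example data
M <- rbind(c("b", "a", "a"),
           c("a", "b", "b"),
           c("a", "a", "b"))
LHS <- c("a", "b", "a")

# Same set of distinct values (repetitions ignored)
sameSet <- which(apply(M, 1, function(x) setequal(x, LHS)))

# Same values with the same repetitions: compare sorted rows
# against sort(LHS), which is computed only once
sortedLHS <- sort(LHS)
sameMultiset <- which(apply(M, 1, function(x) identical(sort(x), sortedLHS)))

sameSet       # 1 2 3
sameMultiset  # 1 3
```

Row 2 has the right set of values but two copies of "b", so only the sorted comparison rejects it.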

If the number of distinct character values in the whole matrix is not too large, the comparison may be made more efficient by converting the matrix to one of integer codes instead of the original character values. See ?factor for the meaning of "integer codes". After this conversion, the comparison works on integers instead of character values, which is faster.
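A minimal sketch of the integer-code conversion, assuming a character matrix M: one factor built over all values of the matrix gives every distinct string a fixed integer code, and LHS is encoded with the same levels. The names M, LHS, lev, codes, lhsCodes and matches are hypothetical, and whether this is faster in practice depends on the data.

```r
# Hypothetical character matrix and target vector
M <- rbind(c("b", "a", "a"),
           c("a", "b", "b"),
           c("a", "a", "b"))
LHS <- c("a", "b", "a")

# One set of levels for the whole matrix; as.integer() drops the
# dimensions, so matrix() restores the original shape
lev <- sort(unique(c(M)))
codes <- matrix(as.integer(factor(M, levels = lev)), nrow(M), ncol(M))

# Encode LHS with the same levels; a value absent from M becomes NA,
# which na.last = TRUE keeps so that no row can match it
lhsCodes <- sort(as.integer(factor(LHS, levels = lev)), na.last = TRUE)

# The multiset comparison now works on integers instead of strings
matches <- which(apply(codes, 1, function(x) identical(sort(x), lhsCodes)))
matches  # 1 3
```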

Hope this helps.

Petr Savicky.



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

Received on Sun 24 Apr 2011 - 17:49:01 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Sun 24 Apr 2011 - 20:20:32 GMT.
