From: Prasenjit Kapat <kapatp_at_gmail.com>

Date: Thu, 17 May 2007 19:56:12 -0400

}

R-help_at_stat.math.ethz.ch mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Fri 18 May 2007 - 00:02:02 GMT

Date: Thu, 17 May 2007 19:56:12 -0400

Hi,

I have two matrices, A (axd) and B (bxd). I want to get another matrix C (axb) such that, C[i,j] is the Euclidean distance between the ith row of A and jth row of B. In general, I can say that C[i,j] = some.function (A[i,], B[j,]). What is the best method for doing so? (assume a < b)

I have been doing some exploration myself: Consider the following function: get.f, in which, 'method=1' is the rudimentary double for loop; 'method=2' avoids one loop by constructing a bigger matrix, but doesn't use apply(); 'method=3' avoids both the loops by using apply() and constructing bigger matrices; 'method=4' avoids constructing bigger matrices by using apply() twice.

get.f <- function (A, B, method=2) {

if (method == 1){ a <- nrow(A); b <- nrow(B); C <- matrix(NA, nrow=a, ncol=b); for (i in 1:a) for (j in 1:b) C[i,j] <- sum((A[i,]-B[j,])^2)

} else if (method == 2 ) {

a <- nrow(A); b <- nrow(B); d <- ncol(A); C <- matrix(NA, nrow=a, ncol=b); for (i in 1:a) C[i,] <- rowSums((matrix(A[i,], nrow=b, ncol=d, byrow=TRUE) - B) ^ 2)

} else if (method == 3) {

C <- t(apply(A, MARGIN=1, FUN="FUN1", BB=B)); # transpose is needed

} else if (method == 4) {

C <- t(apply(A, MARGIN=1, FUN="FUN2", BB=B)) }

}

FUN1 <- function(aa, BB)

return(rowSums(

(matrix(aa, nrow=nrow(BB), ncol=ncol(BB), byrow=TRUE) - BB)^2) )

FUN2 <- function(aa, BB)

return(apply(BB, MARGIN=1, FUN="FUN3", aa=aa))

FUN3 <- function(bb,aa) return(sum((aa-bb)^2))

### With these methods and the following intitializations,

a <- 100; b <- 1000; d <- 100; n.loop <- 20;

A <- matrix(rnorm(a*d), ncol=d)

B <- matrix(rnorm(b*d), ncol=d)

all.times <- matrix(0,nrow=5,ncol=4)

rownames(all.times) <- rownames(as.matrix(system.time(NULL)))

for (i in 1:4)

for (j in 1:n.loop) all.times[,i] <- all.times[,i] + as.matrix(system.time(C <- get.f(A=A, B=B, method=i)))

all.times <- all.times / n.loop

print(all.times)

[,1] [,2] [,3] [,4] user.self 4.0554 1.50010 1.50130 4.51285 sys.self 0.0370 0.02420 0.01800 0.04260 elapsed 4.2705 1.58865 1.59475 6.07535 user.child 0.0000 0.00000 0.00000 0.00000 sys.child 0.0000 0.00000 0.00000 0.00000

'method=2' stands out be the best and 'method=1' (for loops) beats 'method=4' (two apply()s)... Is that expected?

Is it possible to improve over 'method=2'?

Thanks

PK

PS: The mail text seems fine in my composer, I hope, it looks decent in your reader.

R-help_at_stat.math.ethz.ch mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Fri 18 May 2007 - 00:02:02 GMT

Archive maintained by Robert King, hosted by
the discipline of
statistics at the
University of Newcastle,
Australia.

Archive generated by hypermail 2.2.0, at Fri 18 May 2007 - 05:31:26 GMT.

*
Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help.
Please read the posting
guide before posting to the list.
*