[R] use loop or use apply?

From: Prasenjit Kapat <kapatp_at_gmail.com>
Date: Thu, 17 May 2007 19:56:12 -0400


Hi,

I have two matrices, A (a x d) and B (b x d). I want another matrix C (a x b) such that C[i,j] is the Euclidean distance (squared, in the code below) between the ith row of A and the jth row of B. More generally, C[i,j] = some.function(A[i,], B[j,]). What is the best way to do this? (Assume a < b.)
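For a general some.function, a minimal sketch (not from the original post) uses base R's outer() together with Vectorize(); the name pairwise.apply is hypothetical. It is convenient rather than fast: it still makes one R-level call per (i, j) pair, so it should not be expected to beat the loops below.

```r
# Hypothetical helper: apply a two-vector function f to every pair
# (A[i, ], B[j, ]) and collect the results in an a x b matrix.
pairwise.apply <- function(A, B, f) {
  stopifnot(ncol(A) == ncol(B))
  # outer() calls its FUN once with two index vectors of length a*b,
  # so f is wrapped with Vectorize() to evaluate it pair by pair.
  g <- Vectorize(function(i, j) f(A[i, ], B[j, ]))
  outer(seq_len(nrow(A)), seq_len(nrow(B)), g)
}

# Example: squared Euclidean distance, as in the question.
sq.dist <- function(x, y) sum((x - y)^2)
A <- matrix(c(0, 0,
              1, 1), nrow = 2, byrow = TRUE)
B <- matrix(c(0, 0,
              3, 4,
              1, 0), nrow = 3, byrow = TRUE)
C <- pairwise.apply(A, B, sq.dist)
print(C)
```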

I have been doing some exploration myself. Consider the following function, get.f, in which 'method=1' is the rudimentary double for loop; 'method=2' avoids one loop by constructing a bigger matrix, but doesn't use apply(); 'method=3' avoids both loops by using apply() and constructing bigger matrices; and 'method=4' avoids constructing bigger matrices by using apply() twice.

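get.f <- function (A, B, method=2) {
	# C[i,j] = squared Euclidean distance between A[i,] and B[j,]
	if (method == 1) {
		# method 1: plain double for loop
		a <- nrow(A); b <- nrow(B)
		C <- matrix(NA, nrow=a, ncol=b)
		for (i in 1:a)
			for (j in 1:b)
				C[i,j] <- sum((A[i,]-B[j,])^2)
	} else if (method == 2) {
		# method 2: loop over rows of A only; A[i,] is replicated into a b x d matrix
		a <- nrow(A); b <- nrow(B); d <- ncol(A)
		C <- matrix(NA, nrow=a, ncol=b)
		for (i in 1:a)
			C[i,] <- rowSums((matrix(A[i,], nrow=b, ncol=d, byrow=TRUE) - B)^2)
	} else if (method == 3) {
		# method 3: apply() over rows of A; FUN1 builds the b x d matrix
		C <- t(apply(A, MARGIN=1, FUN=FUN1, BB=B)) # transpose is needed
	} else if (method == 4) {
		# method 4: apply() over rows of A; FUN2 apply()s over rows of B
		C <- t(apply(A, MARGIN=1, FUN=FUN2, BB=B))
	} else stop("method must be 1, 2, 3 or 4")
	# return C explicitly: for methods 1 and 2 the last expression is a
	# for loop, which would otherwise make get.f() return NULL
	C
}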

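# one row of A (aa) against all rows of BB, via a replicated b x d matrix
FUN1 <- function(aa, BB)
	rowSums((matrix(aa, nrow=nrow(BB), ncol=ncol(BB), byrow=TRUE) - BB)^2)

# one row of A (aa) against all rows of BB, via a second apply()
FUN2 <- function(aa, BB)
	apply(BB, MARGIN=1, FUN=FUN3, aa=aa)

# squared distance between two vectors
FUN3 <- function(bb, aa) sum((aa - bb)^2)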
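A possible variant of the method 2 loop (and of FUN1), sketched here as an assumption rather than a measured improvement: instead of replicating A[i,] into a b x d matrix on every iteration, transpose B once and let R's recycling rule subtract A[i,] down each column. The name sq.dist.recycle is hypothetical.

```r
# Squared distances between rows of A and rows of B, looping over rows of A
# but avoiding the replicated b x d matrix that method 2 / FUN1 build.
sq.dist.recycle <- function(A, B) {
  stopifnot(ncol(A) == ncol(B))
  tB <- t(B)  # d x b, computed once outside the loop
  C <- matrix(NA_real_, nrow = nrow(A), ncol = nrow(B))
  for (i in seq_len(nrow(A)))
    # A[i, ] has length d, so it is recycled down each column of tB:
    # column j of (tB - A[i, ]) is B[j, ] - A[i, ]
    C[i, ] <- colSums((tB - A[i, ])^2)
  C
}

A <- matrix(c(0, 0,
              1, 1), nrow = 2, byrow = TRUE)
B <- matrix(c(0, 0,
              3, 4,
              1, 0), nrow = 3, byrow = TRUE)
C <- sq.dist.recycle(A, B)
```

The loop still runs a times, as in method 2, but each iteration only does one subtraction and one colSums() on a matrix that already exists.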

### With these methods and the following initializations:

a <- 100; b <- 1000; d <- 100; n.loop <- 20;

A <- matrix(rnorm(a*d), ncol=d)
B <- matrix(rnorm(b*d), ncol=d)

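# one row per system.time() component, one column per method
all.times <- matrix(0, nrow=5, ncol=4)
rownames(all.times) <- rownames(as.matrix(system.time(NULL)))

# accumulate the timings of n.loop runs of each method
for (i in 1:4)
	for (j in 1:n.loop)
		all.times[,i] <- all.times[,i] +
				as.matrix(system.time(C <- get.f(A=A, B=B, method=i)))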

all.times <- all.times / n.loop
print(all.times)

             [,1]    [,2]    [,3]    [,4]
user.self  4.0554 1.50010 1.50130 4.51285
sys.self   0.0370 0.02420 0.01800 0.04260
elapsed    4.2705 1.58865 1.59475 6.07535
user.child 0.0000 0.00000 0.00000 0.00000
sys.child  0.0000 0.00000 0.00000 0.00000

'method=2' stands out as the best, and 'method=1' (for loops) beats 'method=4' (two apply()s). Is that expected?
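One plausible explanation, offered as a hedged sketch rather than a profile: apply() is itself implemented as a loop in R code, so the nested apply() of method 4 makes a*b calls to an R-level function, each carrying call overhead, on top of the double loop's work; method 2 makes only a calls into the vectorized rowSums(). The snippet below (count.calls is a hypothetical name) only counts those inner calls; it does not time anything.

```r
# Count how many times the innermost R function runs in the
# method-4 pattern: apply() over rows of A, then apply() over rows of B.
count.calls <- function(a, b, d) {
  A <- matrix(rnorm(a * d), nrow = a)
  B <- matrix(rnorm(b * d), nrow = b)
  n.calls <- 0
  # FUN3-style scalar distance, instrumented with a counter
  sq.dist <- function(bb, aa) {
    n.calls <<- n.calls + 1
    sum((aa - bb)^2)
  }
  invisible(apply(A, MARGIN = 1, FUN = function(aa)
    apply(B, MARGIN = 1, FUN = sq.dist, aa = aa)))
  n.calls
}

count.calls(a = 3, b = 5, d = 4)  # 15, i.e. a * b
```

With a = 100 and b = 1000 as in the benchmark, that is 100,000 calls to the inner function, versus 100 iterations of the method 2 loop.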

Is it possible to improve over 'method=2'?
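One technique not used in any of the four methods is to expand the square: ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y, so the whole a x b matrix of squared distances follows from the row sums of squares of A and B plus a single matrix product, with no R-level loop at all. This is a minimal sketch assuming only base R (sq.dist.matrix is a hypothetical name); it has not been timed against method 2 here. The expansion can lose precision when two rows are nearly identical, hence the clamp at zero.

```r
# Squared Euclidean distances between the rows of A (a x d) and the rows
# of B (b x d), via ||x - y||^2 = ||x||^2 + ||y||^2 - 2 * <x, y>.
sq.dist.matrix <- function(A, B) {
  stopifnot(ncol(A) == ncol(B))
  AB <- tcrossprod(A, B)  # a x b matrix of inner products, same as A %*% t(B)
  C <- outer(rowSums(A^2), rowSums(B^2), "+") - 2 * AB
  C[C < 0] <- 0           # clamp tiny negatives caused by rounding
  C
}

A <- matrix(c(0, 0,
              1, 1), nrow = 2, byrow = TRUE)
B <- matrix(c(0, 0,
              3, 4,
              1, 0), nrow = 3, byrow = TRUE)
C <- sq.dist.matrix(A, B)  # squared distances
sqrt(C)                    # Euclidean distances, if those are wanted
```

The design choice is to push the O(a*b*d) work into the matrix product, which R hands to compiled BLAS code, leaving only O(a*b) element-wise arithmetic in outer().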

Thanks
PK

PS: The mail text looks fine in my composer; I hope it looks decent in your reader.



R-help_at_stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Fri 18 May 2007 - 00:02:02 GMT
