RE: [R] replacing a for-loop with lapply

From: Huntsinger, Reid <reid_huntsinger_at_merck.com>
Date: Tue 10 May 2005 - 03:22:31 EST


I suggest

  1. Transpose "data" once at the beginning.
  2. Replace "apply" with "colSums" to find the columns whose sum equals d. Since you have logical values, the sum counts the number of TRUEs, and it looks to me like you want them all TRUE.
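Here is a small worked example of point 2 on a made-up 3 x 2 matrix that is not from the thread. For row i, `tdata <= tdata[, i]` compares every observation with row i one coordinate at a time. `colSums` then counts the TRUEs per observation, and `== d` keeps only the observations that are below row i in all d coordinates. That is the same thing `prod` over logicals computed.

```r
# Made-up example data: 3 observations (rows) in d = 2 dimensions
data <- matrix(c(0.2, 0.5, 0.9,
                 0.4, 0.1, 0.8), nrow = 3)
n <- nrow(data)
d <- ncol(data)
tdata <- t(data)  # transpose once: observations become columns

# Original approach (illustrative variable names): product of logicals per column is 1 only if all TRUE
count_prod <- sapply(1:n, function(i) sum(apply(tdata <= data[i, ], 2, prod)))

# Suggested approach: count the TRUEs per column and require all d of them
count_colsums <- sapply(1:n, function(i) sum(colSums(tdata <= tdata[, i]) == d))

count_prod               # [1] 1 1 3
count_colsums            # [1] 1 1 3
count_colsums / (n + 1)  # [1] 0.25 0.25 0.75
```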

With further work you could vectorize this, but loops in R are actually reasonably fast once you streamline the code inside them.
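One way to do that further vectorization is sketched below. This is an assumption, not code from the thread. It loops over the d columns instead of the n rows and builds an n x n logical matrix whose [j, i] entry says whether observation j is <= observation i in every coordinate. `outer(x, x, "<=")` gives the comparison for one column, and `&` accumulates it across columns. The function name `chat_outer` is hypothetical.

```r
# Hypothetical helper: Chat for every observation, looping over columns instead of rows
chat_outer <- function(data) {
  n <- nrow(data)
  # dominated[j, i] is TRUE when data[j, ] <= data[i, ] in every column processed so far
  dominated <- matrix(TRUE, n, n)
  for (k in seq_len(ncol(data))) {
    x <- data[, k]
    dominated <- dominated & outer(x, x, "<=")
  }
  colSums(dominated) / (n + 1)
}

# Check against the original row-by-row loop on random data
set.seed(1)
data <- matrix(runif(300), 100, 3)
n <- nrow(data)
Chat <- rep(0, n)
for (i in 1:n)
  Chat[i] <- sum(apply(t(data) <= data[i, ], 2, prod)) / (n + 1)
all.equal(Chat, chat_outer(data))  # TRUE
```

The trade-off is memory. The intermediate matrix has n^2 entries, so this suits moderate n better than very large n.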

I get

> system.time(for(i in 1:n) Chat[i] <-
+ sum(apply(t(data)<=data[i,],2,prod))/(n+1))
[1] 0.62 0.01 0.73 NA NA

while with

> tdata <- t(data)

I get a much improved time:

> system.time(for(i in 1:n) Chat[i] <- sum(colSums(tdata <= tdata[,i]) ==
+ d)/(n+1))
[1] 0.04 0.00 0.04 NA NA
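Reid's streamlined loop can be wrapped in a function and checked against the original version. The function names `chat_loop` and `chat_apply` are hypothetical, and the timings will depend on the machine, so no output is claimed for them.

```r
# Hypothetical wrapper around the streamlined loop from this reply
chat_loop <- function(data) {
  n <- nrow(data)
  d <- ncol(data)
  tdata <- t(data)  # transpose once, outside the loop
  Chat <- numeric(n)
  for (i in 1:n) {
    # count observations (columns of tdata) that are <= observation i in all d coordinates
    Chat[i] <- sum(colSums(tdata <= tdata[, i]) == d) / (n + 1)
  }
  Chat
}

# Hypothetical wrapper around the original loop from the question
chat_apply <- function(data) {
  n <- nrow(data)
  Chat <- numeric(n)
  for (i in 1:n)
    Chat[i] <- sum(apply(t(data) <= data[i, ], 2, prod)) / (n + 1)
  Chat
}

set.seed(42)
data <- matrix(runif(300), 100, 3)
all.equal(chat_loop(data), chat_apply(data))  # TRUE
system.time(chat_apply(data))
system.time(chat_loop(data))
```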

Reid Huntsinger

-----Original Message-----
From: r-help-bounces@stat.math.ethz.ch
[mailto:r-help-bounces@stat.math.ethz.ch] On Behalf Of Daniel Berg
Sent: Monday, May 09, 2005 12:32 PM
To: r-help@stat.math.ethz.ch
Subject: [R] replacing a for-loop with lapply

Dear All,

I am trying to compute a goodness-of-fit statistic for a copula, based on an empirical estimate of this copula's distribution function. To do this I can use the following code:

> n <- dim(data)[1]
> d <- dim(data)[2]
> Chat <- rep(0,n)
> for(i in 1:n)
+ Chat[i] <- sum(apply(t(data)<=data[i,],2,prod))/(n+1)
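Written out, the loop computes, for each observation i, the number of observations that are componentwise no larger than it, divided by n + 1. In the notation below, X_{jk} stands for data[j, k]. This restates the code and is not a formula given in the thread.

```latex
\hat{C}_i = \frac{1}{n+1} \sum_{j=1}^{n} \prod_{k=1}^{d} \mathbf{1}\{ X_{jk} \le X_{ik} \},
\qquad i = 1, \dots, n
```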

However, I have a feeling this can be done more efficiently than with a for-loop. I have also tried the following:

> tmp1 <- lapply(1:n,function(i) t(data)<=data[i,])
> tmp2 <- lapply(1:n,function(i) apply(tmp1[[i]],2,prod))
> Chat <- as.numeric(lapply(1:n, function(i) sum(tmp2[[i]])))/(n+1)

but there is no improvement. I ran the following timing test:

> data <- matrix(runif(300),100,3)
> n = dim(data)[1]
> d = dim(data)[2]
> Chat = vector("numeric",n)
> M <- 30
> a <- rep(0,M)
> for(m in 1:M){
+ a[m] <- system.time({
+ tmp1 <- lapply(1:n,function(i) t(data)<=data[i,])
+ tmp2 <- lapply(1:n,function(i) apply(tmp1[[i]],2,prod))
+ Chat <- as.numeric(lapply(1:n, function(i) sum(tmp2[[i]])))/(n+1)})[3]}

> b <- rep(0,M)
> for(m in 1:M){
+ b[m] <- system.time(
+ for (i in 1:n)
+ Chat[i] = sum(apply(t(data)<=data[i,],2,prod))/(n+1))[3]}

> summary(a)
> summary(b)

and the output was:

> summary(a)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.8500  0.8700  0.8900  0.9013  0.9300  0.9800
> summary(b)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.8400  0.8600  0.8800  0.8883  0.9075  0.9900

Is there any way I can code this more efficiently in R, or will I have to turn to C? The data sets I am actually going to run this code on will be up to 5000 x 100 in size, and I need hundreds of realizations...
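For the sizes mentioned (n = 5000, d = 100), a full n x n logical matrix would need 5000^2 x 4 bytes, about 100 MB before temporaries, because R stores logicals as 4-byte integers. A middle ground is to process the observations i in blocks. Each block builds an n x b logical matrix by comparing along the columns, so memory stays proportional to n * b. This is sketched below as an assumption, not as an answer from the thread, and the names `chat_blocked` and `block` are hypothetical.

```r
# Hypothetical helper: blocked vectorization of Chat to bound memory use
chat_blocked <- function(data, block = 500) {
  n <- nrow(data)
  Chat <- numeric(n)
  for (s in seq(1, n, by = block)) {
    idx <- s:min(s + block - 1, n)  # observations i handled in this block
    # dominated[j, m] is TRUE when data[j, ] <= data[idx[m], ] in every column processed so far
    dominated <- matrix(TRUE, n, length(idx))
    for (k in seq_len(ncol(data))) {
      dominated <- dominated & outer(data[, k], data[idx, k], "<=")
    }
    Chat[idx] <- colSums(dominated) / (n + 1)
  }
  Chat
}

# Check against the original loop on a small random example
set.seed(7)
data <- matrix(runif(300), 100, 3)
n <- nrow(data)
Chat <- rep(0, n)
for (i in 1:n)
  Chat[i] <- sum(apply(t(data) <= data[i, ], 2, prod)) / (n + 1)
all.equal(Chat, chat_blocked(data, block = 30))  # TRUE
```

The block size trades memory for the number of R-level iterations. With block = n it reduces to the full n x n version.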

Thank you for your time.

Rgds,
Daniel



R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html

Received on Tue May 10 03:26:50 2005

This archive was generated by hypermail 2.1.8 : Fri 03 Mar 2006 - 03:31:40 EST