[R] replacing a for-loop with lapply

From: Daniel Berg <daniel_at_nr.no>
Date: Tue 10 May 2005 - 02:31:50 EST


Dear All,

I am trying to compute a goodness-of-fit statistic for a copula, based on an empirical density estimate of this copula. To do this I can use the following code:

> n <- dim(data)[1]
> d <- dim(data)[2]
> Chat <- rep(0,n)
> for(i in 1:n)

+ Chat[i] <- sum(apply(t(data)<=data[i,],2,prod))/(n+1)

However, I have a feeling this can be done more effectively than using a for-loop. I have also tried the following:

> tmp1 <- lapply(1:n,function(i) t(data)<=data[i,])
> tmp2 <- lapply(1:n,function(i) apply(tmp1[[i]],2,prod))
> Chat <- as.numeric(lapply(1:n, function(i) sum(tmp2[[i]])))

but there is no improvement. I ran the following timing test:

> data <- matrix(runif(300),100,3)
> n = dim(data)[1]
> d = dim(data)[2]
> Chat = vector("numeric",n)
> M <- 30
> a <- rep(0,M)
> for(m in 1:M){

+ a[m] <- system.time({
+ tmp1 <- lapply(1:n,function(i) t(data)<=data[i,])
+ tmp2 <- lapply(1:n,function(i) apply(tmp1[[i]],2,prod))
+ Chat <- as.numeric(lapply(1:n, function(i) sum(tmp2[[i]])))})[3]}

> b <- rep(0,M)
> for(m in 1:30){
+ b[m] <- system.time(	
+ for (i in 1:n)
+ Chat[i] = sum(apply(t(data)<=data[i,],2,prod))/(n+1))[3]}

> summary(a)
> summary(b)

and the output was:

> summary(a)

   Min. 1st Qu. Median Mean 3rd Qu. Max.  0.8500 0.8700 0.8900 0.9013 0.9300 0.9800
> summary(b)

   Min. 1st Qu. Median Mean 3rd Qu. Max.  0.8400 0.8600 0.8800 0.8883 0.9075 0.9900

Is there any way I can code this more efficiently in R or will I have to turn to C? The data sets, on which I am actually going to run this code, will be of sizes up to (5000x100) and I need hundreds of realizations...

Thank you for your time.

Rgds,
Daniel



R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html Received on Tue May 10 02:43:09 2005

This archive was generated by hypermail 2.1.8 : Fri 03 Mar 2006 - 03:31:40 EST