# RE: [R] replacing a for-loop with lapply

From: Huntsinger, Reid <reid_huntsinger_at_merck.com>
Date: Tue 10 May 2005 - 03:22:31 EST

I suggest

1. Transpose "data" once at the beginning.
2. Replace "apply" with "colSums" to find the columns whose sum equals d. Since you have logical values, the sum counts the number of TRUEs, and it looks to me as though you want them all TRUE.

With further work you could vectorize this, but loops in R are actually pretty good once you can streamline the code inside.
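As an illustration of what further vectorization might look like (a sketch, not code from the original message): the loop over the n rows can be replaced by a loop over the d columns, building an n-by-n logical matrix with `outer`. The simulated data, the matrix `le`, and the names `Chat_vec` and `Chat_loop` are assumptions introduced for this example only.

```r
# Sketch only: simulated data with the same shape as the poster's timing test
set.seed(1)
data <- matrix(runif(300), 100, 3)
n <- nrow(data)
d <- ncol(data)

# le[i, j] ends up TRUE when row j is <= row i in every column
le <- matrix(TRUE, n, n)
for (k in seq_len(d)) {
  le <- le & outer(data[, k], data[, k], ">=")
}
Chat_vec <- rowSums(le) / (n + 1)

# Reference: the original row-by-row loop, for comparison
Chat_loop <- numeric(n)
for (i in 1:n) {
  Chat_loop[i] <- sum(apply(t(data) <= data[i, ], 2, prod)) / (n + 1)
}
```

The trade-off is memory: the n-by-n logical matrix takes on the order of 100 MB for n = 5000, so this form only makes sense when that fits comfortably.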

I get

> system.time(for(i in 1:n) Chat[i] <-
sum(apply(t(data)<=data[i,],2,prod))/(n+1))
 0.62 0.01 0.73 NA NA

while with

> tdata <- t(data)

I get much improved

> system.time(for(i in 1:n) Chat[i] <- sum(colSums(tdata <= tdata[,i]) ==
d)/(n+1))
 0.04 0.00 0.04 NA NA
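
A small self-contained check that the two loops timed above compute the same values can be sketched as follows. The simulated data and the names `Chat_apply` and `Chat_colsums` are assumptions made for this example; `nrow` and `ncol` stand in for the row and column counts.

```r
# Sketch only: simulated data with the same shape as the poster's timing test
set.seed(1)
data <- matrix(runif(300), 100, 3)
n <- nrow(data)
d <- ncol(data)

# Original version: transpose inside the loop, apply(..., 2, prod)
Chat_apply <- numeric(n)
for (i in 1:n) {
  Chat_apply[i] <- sum(apply(t(data) <= data[i, ], 2, prod)) / (n + 1)
}

# Suggested version: transpose once, then count columns that are all TRUE
tdata <- t(data)
Chat_colsums <- numeric(n)
for (i in 1:n) {
  Chat_colsums[i] <- sum(colSums(tdata <= tdata[, i]) == d) / (n + 1)
}

# Compare the two results
same <- isTRUE(all.equal(Chat_apply, Chat_colsums))
```

The product of a logical column is 1 only when every entry is TRUE, which is the same condition as the column sum being equal to d, so the two results are expected to agree.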

Reid Huntsinger

Dear All,

I am trying to compute a goodness-of-fit statistic for a copula, based on an empirical density estimate of this copula. To do this I can use the following code:

> n <- dim(data)[1]
> d <- dim(data)[2]
> Chat <- rep(0,n)
> for(i in 1:n)
+ Chat[i] <- sum(apply(t(data)<=data[i,],2,prod))/(n+1)

However, I have a feeling this can be done more efficiently than with a for-loop. I have also tried the following:

> tmp1 <- lapply(1:n,function(i) t(data)<=data[i,])
> tmp2 <- lapply(1:n,function(i) apply(tmp1[[i]],2,prod))
> Chat <- as.numeric(lapply(1:n, function(i) sum(tmp2[[i]])))

but there is no improvement. I ran the following timing test:

> data <- matrix(runif(300),100,3)
> n = dim(data)[1]
> d = dim(data)[2]
> Chat = vector("numeric",n)
> M <- 30
> a <- rep(0,M)
> for(m in 1:M){
+ a[m] <- system.time({
+ tmp1 <- lapply(1:n,function(i) t(data)<=data[i,])
+ tmp2 <- lapply(1:n,function(i) apply(tmp1[[i]],2,prod))
+ Chat <- as.numeric(lapply(1:n, function(i) sum(tmp2[[i]])))})[1]}

> b <- rep(0,M)
> for(m in 1:30){
+ b[m] <- system.time(
+ for (i in 1:n)
+ Chat[i] = sum(apply(t(data)<=data[i,],2,prod))/(n+1))[1]}

> summary(a)
> summary(b)

and the output was:

> summary(a)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.8500  0.8700  0.8900  0.9013  0.9300  0.9800
> summary(b)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.8400  0.8600  0.8800  0.8883  0.9075  0.9900

Is there any way I can code this more efficiently in R or will I have to turn to C? The data sets, on which I am actually going to run this code, will be of sizes up to (5000x100) and I need hundreds of realizations...

Rgds,
Daniel

R-help@stat.math.ethz.ch mailing list