From: Huntsinger, Reid <reid_huntsinger_at_merck.com>

Date: Tue 10 May 2005 - 03:22:31 EST


R-help@stat.math.ethz.ch mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide!

http://www.R-project.org/posting-guide.html

Received on Tue May 10 03:26:50 2005

I suggest

- Transpose "data" once at the beginning.
- Replace "apply" with "colSums" to find the columns whose sum is d. Since you have logical values, the sum counts the number of TRUEs, and it looks to me like you want them all TRUE.
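The second suggestion relies on a simple equivalence: for a logical matrix, a column's product is 1 exactly when every entry is TRUE, which is the same as its column sum equalling the number of rows (d here). A small check, using a made-up matrix `m` purely for illustration:

```r
# Illustrative logical matrix: 3 rows (the d coordinates), 4 columns (candidate rows j).
# matrix() fills column by column.
m <- matrix(c(TRUE,  TRUE,  TRUE,
              TRUE,  FALSE, TRUE,
              FALSE, FALSE, FALSE,
              TRUE,  TRUE,  TRUE), nrow = 3)
d <- nrow(m)

via_prod    <- apply(m, 2, prod) == 1   # original approach: product of logicals
via_colsums <- colSums(m) == d          # suggested approach: count the TRUEs
via_all     <- apply(m, 2, all)         # the same test stated directly

print(via_colsums)  # TRUE FALSE FALSE TRUE
```

colSums is a single compiled call over the whole matrix, whereas apply calls prod once per column from R.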

With further work you could vectorize this, but loops in R are actually pretty good once you can streamline the code inside.
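One possible way to vectorize, sketched under our own assumptions (the function name `chat_outer` is hypothetical, not from the thread): loop over the d columns instead of the n rows, and build an n x n logical matrix whose [i, j] entry records whether row j is componentwise <= row i. The trade-off is memory: the matrix has n^2 entries, which for n = 5000 is 25 million logicals (about 100 MB each, since R stores a logical in 4 bytes).

```r
# Hypothetical helper: vectorised over rows, looping only over the d columns.
# Intended to return the same quantity as the per-row loop in the thread:
#   Chat[i] = #{ j : data[j, ] <= data[i, ] componentwise } / (n + 1)
chat_outer <- function(data) {
  n <- nrow(data)
  # dominated[i, j] is TRUE when data[j, k] <= data[i, k] for every column seen so far
  dominated <- matrix(TRUE, n, n)
  for (k in seq_len(ncol(data))) {
    # outer(x, x, ">=")[i, j] is x[i] >= x[j]
    dominated <- dominated & outer(data[, k], data[, k], ">=")
  }
  rowSums(dominated) / (n + 1)
}

# Usage on data of the size used in the timing test below
set.seed(1)
data <- matrix(runif(300), 100, 3)
Chat <- chat_outer(data)
```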

I get

> system.time(for(i in 1:n) Chat[i] <-
+   sum(apply(t(data)<=data[i,],2,prod))/(n+1))
[1] 0.62 0.01 0.73 NA NA

while with

> tdata <- t(data)

I get a much improved time:

> system.time(for(i in 1:n) Chat[i] <-
+   sum(colSums(tdata <= tdata[,i]) == d)/(n+1))
[1] 0.04 0.00 0.04 NA NA
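The two changes above can be packaged as a function and checked against the original apply-based loop. This is a sketch; the names `chat_colsums` and `chat_apply` are our own:

```r
# Hypothetical wrapper around the improved loop: transpose once,
# then use colSums on the logical comparison.
chat_colsums <- function(data) {
  n <- nrow(data)
  d <- ncol(data)
  tdata <- t(data)              # transpose once, outside the loop
  Chat <- numeric(n)
  for (i in 1:n) {
    # column j of (tdata <= tdata[, i]) is all TRUE iff data[j, ] <= data[i, ]
    Chat[i] <- sum(colSums(tdata <= tdata[, i]) == d) / (n + 1)
  }
  Chat
}

# Original loop from the question, wrapped for comparison
chat_apply <- function(data) {
  n <- nrow(data)
  Chat <- numeric(n)
  for (i in 1:n) Chat[i] <- sum(apply(t(data) <= data[i, ], 2, prod)) / (n + 1)
  Chat
}

set.seed(42)
data <- matrix(runif(300), 100, 3)
same <- isTRUE(all.equal(chat_colsums(data), chat_apply(data)))
```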

Reid Huntsinger

> n <- dim(data)[1]
> d <- dim(data)[2]
> Chat <- rep(0,n)
> for(i in 1:n)
+   Chat[i] <- sum(apply(t(data)<=data[i,],2,prod))/(n+1)
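Read off the loop above, each entry is the number of observations that are componentwise no larger than observation i, divided by n + 1. Writing x_{jk} for data[j, k] (our notation):

```latex
\hat{C}_i \;=\; \frac{1}{n+1} \sum_{j=1}^{n} \prod_{k=1}^{d} \mathbf{1}\{x_{jk} \le x_{ik}\},
\qquad i = 1, \dots, n.
```

The product over k is what `apply(..., 2, prod)` computes, and the sum over j is the outer `sum(...)`.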

However, I have a feeling this can be done more efficiently than with a for-loop. I have also tried the following:

> tmp1 <- lapply(1:n,function(i) t(data)<=data[i,])
> tmp2 <- lapply(1:n,function(i) apply(tmp1[[i]],2,prod))
> Chat <- as.numeric(lapply(1:n, function(i) sum(tmp2[[i]])))/(n+1)

but there is no improvement. I ran the following timing test:

> data <- matrix(runif(300),100,3)
> n = dim(data)[1]
> d = dim(data)[2]
> Chat = vector("numeric",n)
> M <- 30
> a <- rep(0,M)
> for(m in 1:M){
+   a[m] <- system.time({
+     tmp1 <- lapply(1:n,function(i) t(data)<=data[i,])
+     tmp2 <- lapply(1:n,function(i) apply(tmp1[[i]],2,prod))
+     Chat <- as.numeric(lapply(1:n, function(i) sum(tmp2[[i]])))/(n+1)})[3]}

> b <- rep(0,M)
> for(m in 1:M){
+   b[m] <- system.time(
+     for (i in 1:n)
+       Chat[i] = sum(apply(t(data)<=data[i,],2,prod))/(n+1))[3]}

> summary(a)

> summary(b)

and the output was:

> summary(a)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.8500  0.8700  0.8900  0.9013  0.9300  0.9800

> summary(b)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.8400  0.8600  0.8800  0.8883  0.9075  0.9900
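The two timings are essentially the same, which fits the fact that the lapply version still performs one t() and one apply() per row. As a sketch (the name `chat_vapply` is hypothetical), the three lapply passes collapse into a single vapply call that returns a numeric vector directly; it is tidier, but the per-row work is unchanged, so there is no reason to expect it to be faster.

```r
# Hypothetical one-pass equivalent of the tmp1/tmp2/Chat lapply pipeline.
# vapply checks that each call returns a single number and returns a numeric vector.
chat_vapply <- function(data) {
  n <- nrow(data)
  vapply(seq_len(n), function(i) {
    # same per-row work as the for-loop: transpose, compare, product per column
    sum(apply(t(data) <= data[i, ], 2, prod)) / (n + 1)
  }, numeric(1))
}

set.seed(7)
data <- matrix(runif(300), 100, 3)
Chat <- chat_vapply(data)
```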

Is there any way to code this more efficiently in R, or will I have to turn to C? The data sets on which I will actually run this code will be up to 5000 x 100, and I need hundreds of realizations...
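For inputs near 5000 x 100, one middle ground between the per-row loop and a full n x n comparison is to process the rows in blocks, so each temporary logical matrix is only block x n. This is a sketch under our own assumptions: `chat_blocked` and its `block` argument are hypothetical, and the default of 500 rows is an arbitrary choice, not a tuned value. With n = 5000 and block = 500, each temporary holds 2.5 million logicals (roughly 10 MB).

```r
# Hypothetical blocked variant: compares a block of rows against all n rows at once,
# looping over the d columns, so memory is O(block * n) rather than O(n^2).
chat_blocked <- function(data, block = 500) {
  n <- nrow(data)
  Chat <- numeric(n)
  for (s in seq(1, n, by = block)) {
    rows <- s:min(s + block - 1, n)
    # dom[r, j] is TRUE when data[j, ] <= data[rows[r], ] in every column checked so far
    dom <- matrix(TRUE, length(rows), n)
    for (k in seq_len(ncol(data))) {
      dom <- dom & outer(data[rows, k], data[, k], ">=")
    }
    Chat[rows] <- rowSums(dom) / (n + 1)
  }
  Chat
}

# Usage: a smaller block than the default, to exercise several blocks
set.seed(3)
data <- matrix(runif(2000), 200, 10)
Chat <- chat_blocked(data, block = 64)
```

Choosing `block` trades memory for the number of R-level iterations; whether this is fast enough compared with a C routine would have to be measured on the real data.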

Thank you for your time.

Rgds,

Daniel


This archive was generated by hypermail 2.1.8: Fri 03 Mar 2006 - 03:31:40 EST