Re: [R] Speed up code with for() loop

From: Jeremy Hetzel <jthetzel_at_gmail.com>
Date: Thu, 28 Apr 2011 14:27:28 -0700 (PDT)


Hans,

You could parallelize it with the multicore package. The only other thing I can think of is to use calls to .Internal(). But be careful, as this may not be good advice: ?.Internal warns that only true R wizards should even consider using the function. First, an example with .Internal() calls; multicore comes later. For me, the following reduces elapsed time by about 9% on Windows 7 and by about 20% on today's new Ubuntu Natty.

## Set number of replicates

n <- 10000

## Your example

set.seed(1)
time.one <- Sys.time()
Error<-rnorm(n, mean=0, sd=0.05)
estimate<-(log(1.1)-Error)
DCF_korrigiert<-(1/(exp(1/(exp(0.5*(-estimate)^2/(0.05^2))*sqrt(2*pi/(0.05^2))*(1-pnorm(0,((-estimate)/(0.05^2)),sqrt(1/(0.05^2))))))-1))
D<-n
Delta_ln<-rep(0,D)
for(i in 1:D)
  Delta_ln[i]<-(log(mean(sample(DCF_korrigiert,D,replace=TRUE))/(1/0.10)))
time.one <- Sys.time() - time.one

## A few modifications with .Internal()
set.seed(1)
time.two <- Sys.time()
Error <- rnorm(n, mean = 0, sd = 0.05)
estimate <- (log(1.1) - Error)
DCF_korrigiert <- (1 / (exp(1 / (exp(0.5 * (-estimate)^2 / (0.05^2)) * sqrt(2 * pi / (0.05^2)) * (1 - pnorm(0, ((-estimate) / (0.05^2)), sqrt(1 / (0.05^2)))))) - 1))
D <- n
Delta_ln2 <- numeric(length = D)
Delta_ln2 <- vapply(Delta_ln2, function(x) {
  log(.Internal(mean(DCF_korrigiert[.Internal(sample(D, D, replace = TRUE, prob = NULL))])) / (1 / 0.10))
}, FUN.VALUE = 1)
time.two <- Sys.time() - time.two

## Compare

all.equal(Delta_ln, Delta_ln2)
time.one
time.two
as.numeric(time.two) / as.numeric(time.one)
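Why all.equal() should report TRUE here: for a numeric vector longer than one, sample(x, size, replace = TRUE) is implemented as x[sample.int(length(x), size, replace)]. Indexing DCF_korrigiert with sampled positions therefore consumes the random stream in exactly the same way. The following minimal sketch shows the same trick with the public sample.int() and no .Internal(). It assumes a recent R. The data x and the replicate count B are hypothetical stand-ins for DCF_korrigiert and D.

```r
## Sketch: vapply() + sample.int() reproduces the loop's bootstrap draws
## x and B are hypothetical stand-ins for DCF_korrigiert and D
set.seed(1)
x <- rexp(1000)
B <- 200

## Loop version, as in the original example
set.seed(2)
loop_res <- numeric(B)
for (i in seq_len(B))
  loop_res[i] <- log(mean(sample(x, length(x), replace = TRUE)) / (1 / 0.10))

## vapply() version that samples indices directly, without .Internal()
set.seed(2)
n_x <- length(x)
vap_res <- vapply(seq_len(B), function(i) {
  log(mean(x[sample.int(n_x, n_x, replace = TRUE)]) / (1 / 0.10))
}, FUN.VALUE = numeric(1))

identical(loop_res, vap_res)
```

Because the same seed drives the same index draws, the two vectors should be identical, not merely all.equal().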

Then you could parallelize it with multicore's parallel() function, running half of the replicates in each of two forked processes:

## Try multicore

require(multicore)
set.seed(1)
time.three <- Sys.time()
Error <- rnorm(n, mean = 0, sd = 0.05)
estimate <- (log(1.1) - Error)
DCF_korrigiert <- (1 / (exp(1 / (exp(0.5 * (-estimate)^2 / (0.05^2)) * sqrt(2 * pi / (0.05^2)) * (1 - pnorm(0, ((-estimate) / (0.05^2)), sqrt(1 / (0.05^2)))))) - 1))
# Replicates per process; each replicate still resamples all n values
D <- n/2
Delta_ln3 <- numeric(length = D)
Delta_ln3.1 <- parallel(vapply(Delta_ln3, function(x) {
  log(.Internal(mean(DCF_korrigiert[.Internal(sample(n, n, replace = TRUE, prob = NULL))])) / (1 / 0.10))
}, FUN.VALUE = 1), mc.set.seed = TRUE)
Delta_ln3.2 <- parallel(vapply(Delta_ln3, function(x) {
  log(.Internal(mean(DCF_korrigiert[.Internal(sample(n, n, replace = TRUE, prob = NULL))])) / (1 / 0.10))
}, FUN.VALUE = 1), mc.set.seed = TRUE)
results <- collect(list(Delta_ln3.1, Delta_ln3.2))
names(results) <- NULL
Delta_ln3 <- do.call("append", results)
time.three <- Sys.time() - time.three
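Since R 2.14.0, multicore's functionality is in the base parallel package, where parallel() and collect() became mcparallel() and mccollect(). Below is a sketch of the same two-way split using those names. It assumes a Unix-alike, because forking is not available on Windows. x, B and boot_half() are hypothetical stand-ins. Each replicate draws length(x) indices from the whole vector, and only the number of replicates is halved per job. Switching to the L'Ecuyer-CMRG generator gives each forked job its own reproducible random stream.

```r
library(parallel)          # base package since R 2.14.0
RNGkind("L'Ecuyer-CMRG")   # lets each forked job get an independent stream
set.seed(1)

x   <- rexp(1000)          # hypothetical stand-in for DCF_korrigiert
B   <- 200                 # total bootstrap replicates (hypothetical)
n_x <- length(x)

## One job's share: B/2 replicates, each resampling all n_x values
boot_half <- function() {
  vapply(seq_len(B / 2), function(i) {
    log(mean(x[sample.int(n_x, n_x, replace = TRUE)]) / (1 / 0.10))
  }, FUN.VALUE = numeric(1))
}

## Fork two jobs, wait for both, and concatenate their results
jobs <- list(mcparallel(boot_half()), mcparallel(boot_half()))
res  <- unlist(mccollect(jobs), use.names = FALSE)
```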

## Compare

# Results won't be equal due to the different way
# parallel() handles set.seed() randomization
all.equal(Delta_ln, Delta_ln3)

time.one
time.two
time.three

as.numeric(time.three) / as.numeric(time.one)

Combining parallel() with the .Internal() calls reduces the elapsed time by about 70% on Ubuntu Natty. The multicore package is not available for Windows, or at least not easily.
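On Windows, the base parallel package (R >= 2.14.0) also provides socket clusters, which start separate R processes instead of forking. The following is a minimal sketch assuming two local workers, with x and B again hypothetical. Each worker is a fresh R session, so the data must be copied to it with clusterExport(). clusterSetRNGStream() gives each worker its own L'Ecuyer-CMRG stream.

```r
library(parallel)

set.seed(1)
x <- rexp(1000)      # hypothetical stand-in for DCF_korrigiert
B <- 200             # bootstrap replicates (hypothetical)

cl <- makeCluster(2)                 # PSOCK cluster; also works on Windows
clusterSetRNGStream(cl, iseed = 1)   # reproducible stream per worker
clusterExport(cl, "x")               # copy x into each worker's global env

res <- unlist(parLapply(cl, seq_len(B), function(i) {
  n_x <- length(x)
  log(mean(x[sample.int(n_x, n_x, replace = TRUE)]) / (1 / 0.10))
}))

stopCluster(cl)
```

For a short job, the cost of starting the workers and copying x to them may offset much of the gain.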

But maybe the true R wizards have better ideas.

Jeremy



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Thu 28 Apr 2011 - 21:30:31 GMT
