From: <Seeliger.Curt_at_epamail.epa.gov>

Date: Tue, 19 Apr 2011 12:52:19 -0700


Some might have noticed that REvolution Computing released the doSMP
package to the general public about a month and a half ago, which allows
multiple cores to be accessed for parallel computation in R. Some of our
physical habitat calculations were taking an extraordinary amount of time
to complete and required over-weekend runs, which prompted our interest in
this package. What follows are the results of those tests.

In brief, the toy test sped up the calculations by a plausible amount that depended on the number of workers (cores? threads?) used. Timing of our real-world application gave results that were better than perfect. In fact, they were staggeringly better than perfect. Maybe someone can suggest why. Also in brief, I'd like to thank REvolution for providing us with this really great package.

Our habitat metrics are based on a for-loop construct that is difficult to
vectorize, so a toy test was developed (code given below) which loops
through simple sqrt() calculations in a way one might find in Burns' third
circle of Hell. Short loops were used to cause thrashing during
processor assignment, and longer ones were used to simulate 'harder' or more
time-consuming tasks. The processing time of each set of tasks was
measured for basic unvectorized for() looping, foreach() %do% looping, and
foreach() %dopar% looping, using a 4 core Xeon PC running XP with 3.2 GB
RAM.
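As a rough illustration of the kind of loop the toy test exercises, here is a minimal base R sketch. It is a simplified stand-in rather than the test code itself: it preallocates a plain vector (the names x, y_loop and y_vec are made up for this example) and compares an element-by-element sqrt() loop against the vectorized call.

```r
set.seed(1)
x <- runif(1e4)

# Element-by-element loop, similar in spirit to the toy test's inner loop
y_loop <- numeric(length(x))
t_loop <- system.time(
  for (i in seq_along(x)) y_loop[i] <- sqrt(x[i])
)

# Vectorized equivalent, which is what one would normally write
t_vec <- system.time(y_vec <- sqrt(x))

# Both approaches give the same values; only the elapsed time differs
stopifnot(isTRUE(all.equal(y_loop, y_vec)))
print(c(loop = t_loop[["elapsed"]], vectorized = t_vec[["elapsed"]]))
```

The actual timings will vary by machine, so none are quoted here.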
Using 3 'workers', the increase in speed due to iteration with the
foreach() %do% construct showed the expected amount of thrashing for
small/easy calculations, with the internal overhead being overcome after
roughly 10,000 total calculations. Using SMP instead of single-processor
iteration started to be worthwhile with only 10 groups, regardless of the
group size.

Speedup of foreach() %do% construct relative to basic for():

    n  g=        1          10       100      1000
   10    0.6000000   0.6250000  0.900000  1.010499
  100    0.7230769   0.9333333  1.180000  1.231752
 1000    0.7968750   1.8987730  3.564801  2.078614
10000    2.1724356  10.4700474  8.002192        NA

Speedup of foreach() %dopar% construct relative to foreach() %do% construct:

    n  g=         1        10       100      1000
   10    0.09803922  1.142857  1.875000  2.164773
  100    0.94202899  1.363636  2.702703  2.689359
 1000    0.81012658  1.429825  2.951413  2.602386
10000    0.87239919  1.182743  1.548661        NA
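For readers who want to see how ratios like these are formed, here is a small base R sketch. The timings in it are invented purely for illustration (they are not measurements from this test); only the method labels match those used in the toy code. Each speedup is the mean elapsed time of the slower construct divided by the mean elapsed time of the faster one.

```r
# Hypothetical elapsed times in seconds, three repetitions per method.
# These numbers are made up; they are NOT results from the test above.
times <- data.frame(
  g = 10, n = 100,
  method  = rep(c("unvectorized", "1CoreVectorized", "SMPVectorized"), each = 3),
  elapsed = c(1.20, 1.10, 1.30,   # basic for()
              1.00, 0.90, 1.10,   # foreach() %do%
              0.50, 0.40, 0.60)   # foreach() %dopar%
)

# Mean elapsed time per method
means <- tapply(times$elapsed, times$method, mean)

# Speedup of %do% relative to for(), and of %dopar% relative to %do%
speedup_do    <- means[["unvectorized"]]    / means[["1CoreVectorized"]]
speedup_dopar <- means[["1CoreVectorized"]] / means[["SMPVectorized"]]
print(c(do = speedup_do, dopar = speedup_dopar))
```

A value below 1 means the supposedly faster construct was actually slower for that combination of g and n.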

Using 7 'workers', the speedup from iterating with the foreach() %do% construct was not as close to the results with three 'workers' as expected, though thrashing was still evident when the number of calculations was small. The speedup from using multiple cores peaked around 5.5, below the theoretically perfect 7x, and was not consistently high across all conditions. I'm not sure if this is system noise, or if some other constraint is influencing the results.

Speedup of foreach() %do% construct relative to basic for():

    n  g=       1         10        100      1000
   10    0.400000  1.1111111  0.9210526  1.037190
  100    0.650000  0.8831169  1.1215881  1.199677
 1000    0.768116  1.7843360  3.5691298  2.051362
10000    1.981686  8.8194254  8.2673038        NA

Speedup of foreach() %dopar% construct relative to foreach() %do% construct:

    n  g=        1        10       100      1000
   10    0.8333333  1.285714  4.222222  3.751938
  100    0.9523810  1.452830  5.302632  5.516474
 1000    0.9409091  1.284257  3.123677  3.848393
10000    0.8640463  1.073046  1.609020        NA
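One way to put the peak %dopar% speedups from the 3-worker and 7-worker runs on a common scale is to divide each by its worker count. The sketch below does that with the largest value from each %dopar% table; the variable names are made up for this example, and "efficiency" here simply means speedup per worker.

```r
# Largest %dopar% speedups reported in the tables above
peak3 <- 2.951413   # 3 workers
peak7 <- 5.516474   # 7 workers

# Speedup per worker; 1.0 would be a perfect linear speedup
eff3 <- peak3 / 3
eff7 <- peak7 / 7
print(round(c(workers3 = eff3, workers7 = eff7), 2))
```

By this measure the best 3-worker case is close to linear, while the best 7-worker case falls somewhat short of it.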

The real-world test was to time our residual pool calculations for about 1200 channels (80-150 depths recorded in each) on the same machine using 7 'workers'. This had previously taken 32 hours and 2 minutes, judging by the timestamps of the intermediate files created during calculation. With doSMP the calculations took 7 minutes and the results were identical. Nothing in the toy tests indicated we'd see these calculations sped up by a factor of 275. Since 275 is much larger than 7, this is due to more than just making unused cores available. I suspect it's due to internal compilation, but a quick check of the docs does not support this conjecture. Does anyone have a better explanation?
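For reference, the arithmetic behind the factor of 275 can be checked in a few lines of R. The variable names are made up for this example; the inputs are the two run times and the worker count quoted above.

```r
before_min <- 32 * 60 + 2   # 32 hours 2 minutes, in minutes
after_min  <- 7             # run time with doSMP, in minutes
workers    <- 7

observed <- before_min / after_min   # observed speedup, about 275
excess   <- observed / workers       # how far beyond a perfect 7x, about 39

print(round(c(observed = observed, excess = excess), 1))
```

So the observed gain is roughly 39 times larger than perfect scaling across 7 workers could account for on its own.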

Thanks for your input,

cur

ps - Thanks to Revolution for releasing this package. They occasionally get kicked for their closed-source addon to R, but it's clear that their releases of packages like doSMP and foreach are important contributions to the community.

###### Toy test code follows:######

# Toy SMP
memory.limit(3000)
require(doSMP)
require(reshape2)

getDoParWorkers()
w <- startWorkers(workerCount=3)
registerDoSMP(w)

timeSMP <- function(g, n)
# g = number of groups to process
# n = size of each group.
{
  for(rep in 1:3) {
    times <- NULL
    dd <- data.frame(k=rep(1:g, n), x=runif(g*n))
    ddSplit <- split(dd, dd$k)

    # SMP
    tt <- system.time({
      dd2 <- foreach(e=names(ddSplit), .combine=rbind) %dopar% {
        elem <- ddSplit[[e]]
        for (i in 1:nrow(elem)) {
          elem$y[i] <- sqrt(elem$x[i])
        }
        elem
      }
    })
    times <- rbind(times,
                   as.data.frame(cbind(t(tt),g=g,n=n,method='SMPVectorized')))

    # Single core
    tt <- system.time({
      dd3 <- foreach(e=names(ddSplit), .combine=rbind) %do% {
        elem <- ddSplit[[e]]
        for (i in 1:nrow(elem)) {
          elem$y[i] <- sqrt(elem$x[i])
        }
        elem
      }
    })
    times <- rbind(times,
                   as.data.frame(cbind(t(tt),g=g,n=n,method='1CoreVectorized')))

    # loop through list elements
    dd4 <- NULL
    tt <- system.time({
      for (e in names(ddSplit)) {
        elem <- ddSplit[[e]]
        for (i in 1:nrow(elem)) {
          elem$y[i] <- sqrt(elem$x[i])
        }
        dd4 <- rbind(dd4, elem)
      }
    })
    times <- rbind(times,
                   as.data.frame(cbind(t(tt),g=g,n=n,method='unvectorized')))

    write.table(times, file='c:/r/dosmpTest.csv', append=TRUE,
                row.names=FALSE, sep=',')
  } # end of repetition loop
}

summarizeTimes <- function(fname)
# Summarize timing results and display them.
{
  # read in results, format columns and make methods more
  # 'variable-name friendly'.  Repeated header rows (written by each
  # appending write.table() call) are dropped by the subset().
  times <- read.csv(fname, stringsAsFactors=FALSE)
  times <- subset(times, user.self != 'user.self',
                  select=-c(user.child,sys.child))

  times$user.self <- as.numeric(times$user.self)
  times$sys.self <- as.numeric(times$sys.self)
  times$elapsed <- as.numeric(times$elapsed)
  times$g <- as.numeric(times$g)
  times$n <- as.numeric(times$n)

  # Summarize
  stats <- merge(aggregate(list(meanElapsed=times$elapsed),
                           list(g=times$g, n=times$n, method=times$method),
                           mean, na.rm=TRUE),
                 aggregate(list(meanSelf=times$user.self),
                           list(g=times$g, n=times$n, method=times$method),
                           mean, na.rm=TRUE),
                 by=c('g','n','method'))

  # transpose to wide
  mm <- melt(stats, id=c('g','n','method'))
  tstats <- dcast(mm, g + n ~ variable+method)
  tstats$speedup.elapsed1 <- tstats$meanElapsed_unvectorized /
                             tstats$meanElapsed_1CoreVectorized
  tstats$speedup.elapsed3 <- tstats$meanElapsed_1CoreVectorized /
                             tstats$meanElapsed_SMPVectorized

  # Note: early reshape2 releases named this argument value_var;
  # current releases use value.var.
  speedupVectorizing <- dcast(tstats[c('g','n','speedup.elapsed1')], g~n,
                              value.var='speedup.elapsed1')
  speedupSMP <- dcast(tstats[c('g','n','speedup.elapsed3')], g~n,
                      value.var='speedup.elapsed3')

  return(list(vectoring=speedupVectorizing, smp=speedupSMP))
}

timeSMP(10,1)          # make it thrash as much as possible
timeSMP(100,1)
timeSMP(1000,1)
timeSMP(10000,1)
#timeSMP(100000,1)     # too much memory
#timeSMP(1000000,1)    # too much memory
timeSMP(10,10)
timeSMP(100,10)
timeSMP(1000,10)
timeSMP(10000,10)
timeSMP(10,100)
timeSMP(100,100)
timeSMP(1000,100)
timeSMP(10000,100)
timeSMP(10,1000)
timeSMP(100,1000)
timeSMP(1000,1000)
timeSMP(10000,1000)
timeSMP(10,10000)
timeSMP(100,10000)

# The following take up too much memory, even with a 3GB memory limit.
#timeSMP(1000,5000)
#timeSMP(5000,100)
#timeSMP(5000,1000)
#timeSMP(5000,5000)

summarizeTimes('c:/r/dosmpTest.csv')

--
Curt Seeliger, Data Ranger
Raytheon Information Services - Contractor to ORD
seeliger.curt_at_epa.gov
541/754-4638

Received on Tue 19 Apr 2011 - 20:02:17 GMT
