[R] doSMP package works better than perfect, at least sometimes.

From: <Seeliger.Curt_at_epamail.epa.gov>
Date: Tue, 19 Apr 2011 12:52:19 -0700


Some might have noticed that REvolution Computing released the doSMP package to the general public about a month and a half ago. It allows multiple cores to be used for parallel computation in R. Some of our physical habitat calculations were taking an extraordinary amount of time to complete and required over-weekend runs, which prompted our interest in this package. What follows are the results of those tests.

In brief, the toy test sped up the calculations to a plausible degree, depending on the number of workers (cores? threads?) used. Timing of our real-world application gave results that were better than perfect; in fact, staggeringly better. Maybe someone can suggest why. I'd also like to thank REvolution for providing us with this really great package.

These metrics are based on a for-loop construct that is difficult to vectorize, so a toy test was developed (code given below) which loops through simple sqrt() calculations in a way one might find in Burns' third circle of Hell. Short loops were used to cause thrashing during processor assignment, and longer ones to simulate 'harder' or more time-consuming tasks. The processing time of each set of tasks was measured for basic unvectorized for() looping, foreach() %do% looping, and foreach() %dopar% looping, using a 4-core Xeon PC running XP with 3.2 GB RAM. Using 3 'workers', the increase in speed due to iteration with the foreach() %do% construct showed the expected amount of thrashing for small/easy calculations, with the internal overhead being overcome after roughly 10,000 total calculations. The increase due to use of SMP relative to the single-processor iteration showed it to start being worthwhile with only 10 groups, regardless of the group size.
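For anyone unfamiliar with the pattern, here is a minimal sketch of the kind of element-by-element loop the toy test times, next to the vectorized call it deliberately avoids. The vector x and its length are illustrative assumptions, not part of our habitat code.

```r
# Illustrative input; the length 1000 is an arbitrary assumption.
set.seed(1)
x <- runif(1000)

# Element-by-element loop, the style the toy test times on purpose.
y_loop <- numeric(length(x))
for (i in seq_along(x)) {
  y_loop[i] <- sqrt(x[i])
}

# Vectorized equivalent, which sqrt() supports directly.
y_vec <- sqrt(x)

# Both approaches give the same values.
all.equal(y_loop, y_vec)
```

The toy test keeps the loop form because the real metrics cannot easily be rewritten the second way.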

Speedup of foreach() %do% construct relative to basic for():

     n       g=1       g=10    g=100   g=1000
    10 0.6000000  0.6250000 0.900000 1.010499
   100 0.7230769  0.9333333 1.180000 1.231752
  1000 0.7968750  1.8987730 3.564801 2.078614
 10000 2.1724356 10.4700474 8.002192       NA

Speedup of foreach() %dopar% construct relative to foreach() %do% construct:

     n        g=1     g=10    g=100   g=1000
    10 0.09803922 1.142857 1.875000 2.164773
   100 0.94202899 1.363636 2.702703 2.689359
  1000 0.81012658 1.429825 2.951413 2.602386
 10000 0.87239919 1.182743 1.548661       NA
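Since the first table is t(for)/t(%do%) and the second is t(%do%)/t(%dopar%), multiplying them cell by cell gives the speedup of %dopar% relative to the basic for() loop. A minimal sketch of that arithmetic for the 3-worker runs, using the row and column labels as printed in the tables (the object names are my own):

```r
n_labels <- c("10", "100", "1000", "10000")
g_labels <- c("1", "10", "100", "1000")

# Speedup of foreach() %do% relative to basic for(), 3 workers.
do_vs_for <- matrix(
  c(0.6000000,  0.6250000, 0.900000, 1.010499,
    0.7230769,  0.9333333, 1.180000, 1.231752,
    0.7968750,  1.8987730, 3.564801, 2.078614,
    2.1724356, 10.4700474, 8.002192,       NA),
  nrow = 4, byrow = TRUE, dimnames = list(n = n_labels, g = g_labels))

# Speedup of foreach() %dopar% relative to foreach() %do%, 3 workers.
dopar_vs_do <- matrix(
  c(0.09803922, 1.142857, 1.875000, 2.164773,
    0.94202899, 1.363636, 2.702703, 2.689359,
    0.81012658, 1.429825, 2.951413, 2.602386,
    0.87239919, 1.182743, 1.548661,       NA),
  nrow = 4, byrow = TRUE, dimnames = list(n = n_labels, g = g_labels))

# Element-wise product: speedup of %dopar% relative to basic for().
dopar_vs_for <- do_vs_for * dopar_vs_do
round(dopar_vs_for, 2)
```

The NA cell stays NA in the product, as that combination was not timed.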

Using 7 'workers', the increase in speed due to iteration with the foreach() %do% construct was not as close to the results with three 'workers' as expected, though thrashing was still evident when the number of calculations was small. The increase due to using multiple cores maxed out around 5.5, below the theoretically perfect 7x speedup, and was not consistently high for all conditions. I'm not sure if this is system noise, or if some other constraint is influencing the results.

Speedup of foreach() %do% construct relative to basic for():

     n      g=1      g=10     g=100   g=1000
    10 0.400000 1.1111111 0.9210526 1.037190
   100 0.650000 0.8831169 1.1215881 1.199677
  1000 0.768116 1.7843360 3.5691298 2.051362
 10000 1.981686 8.8194254 8.2673038       NA

Speedup of foreach() %dopar% construct relative to foreach() %do% construct:

     n       g=1     g=10    g=100   g=1000
    10 0.8333333 1.285714 4.222222 3.751938
   100 0.9523810 1.452830 5.302632 5.516474
  1000 0.9409091 1.284257 3.123677 3.848393
 10000 0.8640463 1.073046 1.609020       NA
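One way to compare the 3-worker and 7-worker runs is parallel efficiency, the observed speedup divided by the number of workers. A minimal sketch, using the largest %dopar% vs %do% value from each table; the variable names are my own, and "ideal" here assumes one worker per available core:

```r
# Largest %dopar% vs %do% speedups read from the 3-worker and 7-worker tables.
best_speedup <- c(workers3 = 2.951413, workers7 = 5.516474)
workers <- c(workers3 = 3, workers7 = 7)

# Parallel efficiency: observed speedup divided by the ideal linear speedup.
efficiency <- best_speedup / workers
round(efficiency, 2)
```

Note that this is efficiency relative to the worker count, not to the number of physical cores in the machine.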

The real-world test was to time our residual pool calculations for about 1200 channels (80-150 depths recorded in each) on the same machine using 7 'workers'. This had previously taken 32 hours and 2 minutes, judging by the timestamps of the intermediate files created during calculation. With doSMP the calculations took 7 minutes and the results were identical. Nothing in the toy tests would have indicated we'd see these calculations sped up by a factor of 275. Since 275 is much larger than 7, this must be due to more than just making unused cores available. I suspected internal compilation, but a quick check of the docs does not support this conjecture. Does anyone have a better explanation?
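The arithmetic behind the factor of 275, as a short sketch (variable names are my own; the timings are the ones quoted above):

```r
previous_minutes <- 32 * 60 + 2   # 32 hours 2 minutes
smp_minutes <- 7                  # run time with doSMP
workers <- 7

# Observed speedup, and how far it exceeds a perfect 7x linear speedup.
observed_speedup <- previous_minutes / smp_minutes
excess_over_ideal <- observed_speedup / workers
round(c(observed = observed_speedup, excess = excess_over_ideal), 1)
```

So the observed speedup is roughly 39 times larger than what 7 perfectly used workers alone would explain.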

Thanks for your input,
curt

ps - Thanks to Revolution for releasing this package. They occasionally get kicked for their closed-source addon to R, but it's clear that their releases of packages like doSMP and foreach are important contributions to the community.

###### Toy test code follows:######
# Toy SMP

memory.limit(3000)
require(doSMP)
require(reshape2)
getDoParWorkers()
w<- startWorkers(workerCount=3)
registerDoSMP(w)

timeSMP <- function(g, n)
# g = number of groups to process
# n = size of each group.

{
  for(rep in 1:3) {

      times <- NULL
      dd <- data.frame(k=rep(1:g, n), x=runif(g*n))
      ddSplit <- split(dd, dd$k)
      tt<-system.time({
        dd2 <- foreach(e=names(ddSplit), .combine=rbind) %dopar% {   # SMP
                   elem <- ddSplit[[e]]
                   for (i in 1:nrow(elem)) {
                       elem$y[i] <- sqrt(elem$x[i])
                   }

                   elem
               }

      })
 
      times <- rbind(times,
                     as.data.frame(cbind(t(tt),g=g,n=n,method='SMPVectorized')))
      tt<-system.time({
        dd3 <- foreach(e=names(ddSplit), .combine=rbind) %do% {  # Single core
                   elem <- ddSplit[[e]]
                   for (i in 1:nrow(elem)) {
                       elem$y[i] <- sqrt(elem$x[i])
                   }

                   elem
               }
      })
      times <- rbind(times,
                     as.data.frame(cbind(t(tt),g=g,n=n,method='1CoreVectorized')))
      dd4<-NULL
      tt<-system.time({  # loop through list elements
               for (e in names(ddSplit)) {
                   elem <- ddSplit[[e]]
                   for (i in 1:nrow(elem)) {
                       elem$y[i] <- sqrt(elem$x[i])
                   }

                   dd4 <- rbind(dd4, elem)
               }
      })
      times <- rbind(times,
                     as.data.frame(cbind(t(tt),g=g,n=n,method='unvectorized')))

      write.table(times, file='c:/r/dosmpTest.csv', append=TRUE, row.names=FALSE, sep=',')  

  } # end of repetition loop  

}

summarizeTimes <- function(fname)
# Summarize timing results and display them.
{
  # read in results, format columns and make methods more 'variable-name friendly'.
  times <- read.csv(fname, stringsAsFactors=FALSE)
  # drop the repeated header rows written by each appending write.table() call
  times <- subset(times, user.self != 'user.self', select=-c(user.child,sys.child))

  times$user.self <- as.numeric(times$user.self)
  times$sys.self <- as.numeric(times$sys.self)
  times$elapsed <- as.numeric(times$elapsed)
  times$g <- as.numeric(times$g)
  times$n <- as.numeric(times$n)

  # Summarize
  stats <- merge(aggregate(list(meanElapsed=times$elapsed)
                          ,list(g=times$g, n=times$n, method=times$method)
                          ,mean, na.rm=TRUE
                          )
                ,aggregate(list(meanSelf=times$user.self)
                          ,list(g=times$g, n=times$n, method=times$method)
                          ,mean, na.rm=TRUE
                          )
                ,by=c('g','n','method')
                )


  # transpose to wide
  mm <- melt(stats, id=c('g','n','method'))
  tstats <- dcast(mm, g + n ~ variable+method)
  tstats$speedup.elapsed1 <- tstats$meanElapsed_unvectorized / tstats$meanElapsed_1CoreVectorized
  tstats$speedup.elapsed3 <- tstats$meanElapsed_1CoreVectorized / tstats$meanElapsed_SMPVectorized

  speedupVectorizing <- dcast(tstats[c('g','n','speedup.elapsed1')], g~n, value_var='speedup.elapsed1')
  speedupSMP <- dcast(tstats[c('g','n','speedup.elapsed3')], g~n, value_var='speedup.elapsed3')

  return(list(vectoring=speedupVectorizing, smp=speedupSMP))
}

timeSMP(10,1)                 # make it thrash as much as possible
timeSMP(100,1)

timeSMP(1000,1)
timeSMP(10000,1)
#timeSMP(100000,1)           # too much memory
#timeSMP(1000000,1)          # too much memory
timeSMP(10,10)
timeSMP(100,10)
timeSMP(1000,10)
timeSMP(10000,10)
timeSMP(10,100)
timeSMP(100,100)
timeSMP(1000,100)
timeSMP(10000,100)
timeSMP(10,1000)
timeSMP(100,1000)
timeSMP(1000,1000)
timeSMP(10000,1000)

timeSMP(10,10000)
timeSMP(100,10000)
# The following take up too much memory, even with a 3GB memory limit.
#timeSMP(1000,5000)
#timeSMP(5000,100)
#timeSMP(5000,1000)
#timeSMP(5000,5000)

summarizeTimes('c:/r/dosmpTest.csv')
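
A note on why summarizeTimes() filters out rows where user.self equals 'user.self': each write.table() call with append=TRUE writes the column names again, so the CSV ends up with header rows mixed in with the data. A minimal base-R sketch of that effect, using a temporary file and made-up columns (both are assumptions for illustration, not the timing data above):

```r
# Hypothetical stand-in for the timing rows written in timeSMP().
f <- tempfile(fileext = ".csv")
d <- data.frame(elapsed = c(1.5, 2.5), method = c("a", "b"))

# Appending twice writes the header twice; R warns about this, so silence it.
for (i in 1:2) {
  suppressWarnings(
    write.table(d, file = f, append = TRUE, row.names = FALSE, sep = ",")
  )
}

# The second header comes back as a data row, so every column is character.
raw <- read.csv(f, stringsAsFactors = FALSE)

# Drop the stray header row and restore the numeric type.
clean <- subset(raw, elapsed != "elapsed")
clean$elapsed <- as.numeric(clean$elapsed)
unlink(f)
```

An alternative would be to write the column names only when the file does not yet exist, for example with col.names = !file.exists(f), which would make the filtering step unnecessary.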

-- 
Curt Seeliger, Data Ranger
Raytheon Information Services - Contractor to ORD
seeliger.curt_at_epa.gov
541/754-4638

Received on Tue 19 Apr 2011 - 20:02:17 GMT
