Re: [R] Parallel R

From: Martin Morgan <>
Date: Sat, 28 Jun 2008 23:36:08 -0700

"Juan Pablo Romero Méndez" <> writes:

> Hello,
> The problem I'm working on now requires operating on big matrices.
> I've noticed that there are some packages that allow running some
> commands in parallel. I've tried snow and NetWorkSpaces, without much
> success (they are far slower than the normal functions).

Do you mean like this?

> library(Rmpi)
> mpi.spawn.Rslaves(nsl=2) # dual core on my laptop
> m <- matrix(0, 10000, 1000)
> system.time(x1 <- apply(m, 2, sum), gcFirst=TRUE)
   user  system elapsed
  0.644   0.148   1.017
> system.time(x2 <- mpi.parApply(m, 2, sum), gcFirst=TRUE)
   user  system elapsed
  5.188   2.844  10.693

? (This is with Rmpi, a third alternative you did not mention; 'elapsed' time seems to be relevant here.)

The basic problem is that the overhead of dividing the matrix up and communicating between processes outweighs the already-efficient computation being performed.

One solution is to organize your code into 'coarse' grains, so the FUN in apply does (considerably) more work.
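
A minimal sketch of what 'coarse' grains could look like. Note the assumptions: it uses the 'parallel' package bundled with current R (not the Rmpi used above), and col_work is a made-up stand-in for a per-column task heavier than sum(). Each worker receives one block of columns and returns only its results, so there are few large tasks rather than many tiny ones.

```r
library(parallel)  # assumption: R's bundled 'parallel' package, not Rmpi

# mclapply() relies on forking, which is unavailable on Windows;
# fall back to a single worker there so the sketch still runs.
n_workers <- if (.Platform$OS.type == "windows") 1L else 2L

set.seed(1)
m <- matrix(rnorm(2000 * 40), 2000, 40)

# Hypothetical per-column task, somewhat heavier than sum()
col_work <- function(v) sum(sort(v)[1:100])

# Coarse grains: one contiguous block of column indices per worker
blocks <- splitIndices(ncol(m), n_workers)

# Each worker runs an ordinary apply() over its own block of columns
res <- mclapply(blocks,
                function(idx) apply(m[, idx, drop = FALSE], 2, col_work),
                mc.cores = n_workers)
x_par <- unlist(res)

# Sequential reference for comparison
x_seq <- apply(m, 2, col_work)
stopifnot(isTRUE(all.equal(x_par, x_seq)))
```

Whether this beats the sequential version depends on how expensive the per-column work really is; for something as cheap as sum() it probably will not.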

A second approach is to develop a better algorithm / use an appropriate R paradigm, e.g.,

> system.time(x3 <- colSums(m), gcFirst=TRUE)
   user  system elapsed
  0.060   0.000   0.088

(or even faster, x4 <- rep(0, ncol(m)) ;)
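
As a small check that the vectorized form is a drop-in replacement, here is a sketch on random data (the all-zero m above would hide any difference in the results). The matrix size and seed are arbitrary choices for illustration.

```r
set.seed(1)
m <- matrix(rnorm(1000 * 200), 1000, 200)

# apply() calls an R function once per column ...
x_apply <- apply(m, 2, sum)
# ... whereas colSums() performs the whole reduction in compiled code
x_vec <- colSums(m)

# Same answer, up to floating-point rounding
stopifnot(isTRUE(all.equal(x_apply, x_vec)))

# Related built-ins that replace apply(m, 1 or 2, sum or mean)
stopifnot(isTRUE(all.equal(apply(m, 1, sum), rowSums(m))))
stopifnot(isTRUE(all.equal(apply(m, 2, mean), colMeans(m))))
```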

A third approach, if your calculations make heavy use of linear algebra, is to build R with an optimized (for example, multi-threaded) BLAS library; see the R Installation and Administration guide.
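
A rough way to see whether this applies to you is to time an operation that is typically handed to the BLAS, such as a matrix cross-product. This is only a sketch: the matrix size and repeat count are arbitrary, and the BLAS entry of extSoftVersion() is assumed to exist only on reasonably recent R versions, so it is checked before use.

```r
set.seed(1)
a <- matrix(rnorm(500 * 200), 500, 200)

# Both expressions compute t(a) %*% a; the heavy lifting is
# typically done by whichever BLAS R was built or linked against
xp1 <- t(a) %*% a
xp2 <- crossprod(a)
stopifnot(isTRUE(all.equal(xp1, xp2)))

# Timing to compare across R builds (numbers vary by machine and BLAS)
print(system.time(for (i in 1:20) crossprod(a), gcFirst = TRUE))

# Report the BLAS in use, if this version of R exposes it
esv <- extSoftVersion()
if ("BLAS" %in% names(esv)) print(esv[["BLAS"]])
```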

A fourth possibility is to use Tierney's 'pnmath' library, mentioned in this thread.

Consult the README file for installation, which was not exactly trivial on my system. Only specific functions are parallelized, and only when the calculation is long enough to make it seem worthwhile.

> system.time(exp(m), gcFirst=TRUE)
   user  system elapsed
  0.108   0.000   0.106
> library(pnmath)
> system.time(exp(m), gcFirst=TRUE)
   user  system elapsed
  0.096   0.004   0.052

(elapsed time about 2x faster). Both BLAS and pnmath make much better use of resources, since they do not require multiple R instances.

None of these approaches would make a colSums faster -- the work is just too small for the overhead.


> My problem is very simple: it doesn't require any communication
> between parallel tasks, only that it divides the task symmetrically
> between the available cores. Also, I don't want to run the code in a
> cluster, just my multicore machine (4 cores).
> What solution would you propose, given your experience?
> Regards,
> Juan Pablo
> ______________________________________________
> mailing list
> PLEASE do read the posting guide
> and provide commented, minimal, self-contained, reproducible code.

Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793

Received on Sun 29 Jun 2008 - 06:40:25 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 10 Jul 2008 - 06:31:53 GMT.
