Re: [R] Parallel R

From: Juan Pablo Romero Méndez <jpablo.romero_at_gmail.com>
Date: Mon, 30 Jun 2008 00:18:07 -0500

Thanks!

It turned out that Rmpi was a good option for this problem after all.

Nevertheless, pnmath seems very promising, although it doesn't load on my system:

> library(pnmath)
Error in dyn.load(file, DLLpath = DLLpath, ...) :
  unable to load shared library '/home/jpablo/extra/R-271/lib/R/library/pnmath/libs/pnmath.so':
  libgomp.so.1: shared object cannot be dlopen()ed
Error: package/namespace load failed for 'pnmath'

I find it odd, because libgomp.so.1 is in /usr/lib, so R should find it.

  Juan Pablo

On Sun, Jun 29, 2008 at 1:36 AM, Martin Morgan <mtmorgan_at_fhcrc.org> wrote:
> "Juan Pablo Romero Méndez" <jpablo.romero@gmail.com> writes:
>
>> Hello,
>>
>> The problem I'm working on now requires operating on big matrices.
>>
>> I've noticed that there are some packages that allow some
>> commands to run in parallel. I've tried snow and NetWorkSpaces, without much
>> success (they are far slower than the normal functions).
>
> Do you mean like this?
>
>> library(Rmpi)
>> mpi.spawn.Rslaves(nsl=2) # dual core on my laptop
>> m <- matrix(0, 10000, 1000)
>> system.time(x1 <- apply(m, 2, sum), gcFirst=TRUE)
>    user  system elapsed
>   0.644   0.148   1.017
>> system.time(x2 <- mpi.parApply(m, 2, sum), gcFirst=TRUE)
>    user  system elapsed
>   5.188   2.844  10.693
>
> ? (This is with Rmpi, a third alternative you did not mention;
> 'elapsed' time seems to be relevant here.)
>
> The basic problem is that the overhead of dividing the matrix up and
> communicating between processes outweighs the already-efficient
> computation being performed.
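To put that overhead in numbers, here is a short R sketch of the data involved in the example above. It assumes (this is not confirmed in the thread) that mpi.parApply ships each slave its share of the columns, so roughly the whole matrix is serialized and sent between processes, while a column sum performs only one addition per element received.

```r
## The matrix from the timings above: 10000 rows x 1000 columns of doubles.
m <- matrix(0, 10000, 1000)

## Raw payload: one 8-byte double per cell.
n_bytes <- nrow(m) * ncol(m) * 8
n_bytes / 1e6   # 80 (megabytes)

## object.size() reports the same payload plus a small header;
## "Mb" here means units of 1024^2 bytes.
print(object.size(m), units = "Mb")
```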
>
> One solution is to organize your code into 'coarse' grains, so the FUN
> in apply does (considerably) more work.
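A minimal sketch of the coarse-grain idea, written with the parallel package that ships with later versions of R. That package did not exist when this thread was written, so treat its availability as an assumption about your setup. The helper coarse_col_sums is hypothetical: instead of one task per column, it cuts the columns into one contiguous block per worker, so each worker does a whole block of work per round trip. mclapply forks the R process, which assumes a Unix-alike; on Windows mc.cores must be 1.

```r
library(parallel)

## Hypothetical helper: one task per worker, not one task per column.
coarse_col_sums <- function(m, n_workers = 2L) {
  n_workers <- min(n_workers, ncol(m))
  ## splitIndices() cuts 1..ncol(m) into n_workers contiguous, near-equal blocks
  blocks <- splitIndices(ncol(m), n_workers)
  ## each forked worker sums a whole block of columns (a "coarse grain")
  partial <- mclapply(blocks,
                      function(j) colSums(m[, j, drop = FALSE]),
                      mc.cores = n_workers)
  ## blocks are in column order, so concatenating restores the full result
  unlist(partial, use.names = FALSE)
}

m <- matrix(runif(10000 * 100), 10000, 100)
x_par <- coarse_col_sums(m, 2L)
```

For a task as cheap as a column sum this will still not beat a plain colSums(m); the point is only the shape of the decomposition, which pays off when each block carries substantial work.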
>
> A second approach is to develop a better algorithm / use an
> appropriate R paradigm, e.g.,
>
>> system.time(x3 <- colSums(m), gcFirst=TRUE)
>    user  system elapsed
>   0.060   0.000   0.088
>
> (or even faster, x4 <- rep(0, ncol(m)) ;)
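Before relying on the vectorized call, a quick check that it is a drop-in replacement for the apply loop. The matrix here is smaller and random (an assumption, so the comparison is not trivially all zeros), and the timings will vary by machine.

```r
## A smaller random matrix than the thread's all-zero one.
set.seed(42)
m <- matrix(runif(10000 * 200), 10000, 200)

## apply() calls sum() once per column at the R level.
t_apply <- system.time(x1 <- apply(m, 2, sum), gcFirst = TRUE)

## colSums() does the same reduction in compiled code.
t_colsums <- system.time(x3 <- colSums(m), gcFirst = TRUE)

## Same answer up to floating-point tolerance.
stopifnot(isTRUE(all.equal(x1, x3)))

## user / system / elapsed seconds for each version (machine-dependent).
rbind(apply = t_apply[1:3], colSums = t_colsums[1:3])
```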
>
> A third approach, if your calculations make heavy use of linear
> algebra, is to build R against an optimized, multithreaded BLAS
> library; see the R Installation and Administration manual.
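Linear-algebra work only benefits from a faster BLAS when it is written as matrix operations. A minimal sketch, assuming a Gram-matrix product is representative of your workload: crossprod(X) computes t(X) %*% X and normally passes the multiplication to the BLAS, so swapping in an optimized, multithreaded BLAS can speed it up without touching the R code. Timings are machine-dependent.

```r
set.seed(1)
X <- matrix(rnorm(2000 * 300), 2000, 300)

## Both lines compute the 300 x 300 Gram matrix t(X) %*% X.
## crossprod() avoids forming t(X) explicitly; the product itself
## is normally computed by the BLAS that R is linked against.
system.time(G1 <- crossprod(X))
system.time(G2 <- t(X) %*% X)

## Same result up to floating-point tolerance.
stopifnot(isTRUE(all.equal(G1, G2)))
```

In recent versions of R, sessionInfo() reports which BLAS library the running R is linked against, which is a quick way to confirm that a rebuild took effect.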
>
> A fourth possibility is to use Tierney's 'pnmath' library mentioned in
> this thread
>
> https://stat.ethz.ch/pipermail/r-help/2007-December/148756.html
>
> The README file needs to be consulted for the not-exactly-trivial (on
> my system) task of installing the package. Specific functions (such as
> exp, below) are parallelized, but only when the vector is long enough
> for that to seem worthwhile.
>
>> system.time(exp(m), gcFirst=TRUE)
>    user  system elapsed
>   0.108   0.000   0.106
>> library(pnmath)
>> system.time(exp(m), gcFirst=TRUE)
>    user  system elapsed
>   0.096   0.004   0.052
>
> (elapsed time about 2x faster: 0.052 vs. 0.106 seconds). Both BLAS and
> pnmath make much better use of resources, since they work inside a
> single R process rather than requiring multiple R instances.
>
> None of these approaches would make colSums faster; the work is just
> too small to justify the overhead.
>
> Martin
>
>> My problem is very simple: it doesn't require any communication
>> between parallel tasks, only that the task be divided symmetrically
>> among the available cores. Also, I don't want to run the code on a
>> cluster, just on my multicore machine (4 cores).
>>
>> What solution would you propose, given your experience?
>>
>> Regards,
>>
>> Juan Pablo
>>
>> ______________________________________________
>> R-help_at_r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> --
> Martin Morgan
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M2 B169
> Phone: (206) 667-2793
>

Received on Mon 30 Jun 2008 - 05:22:44 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Mon 30 Jun 2008 - 15:01:38 GMT.
