Re: [R] Parallel R

From: Luke Tierney <luke_at_stat.uiowa.edu>
Date: Mon, 30 Jun 2008 09:48:25 -0500 (CDT)

On Mon, 30 Jun 2008, Juan Pablo Romero Méndez wrote:

> Thanks!
>
> It turned out that Rmpi was a good option for this problem after all.

To help with improving snow, I'd be interested to hear more about why Rmpi worked for you but snow did not.

>
> Nevertheless, pnmath seems very promising, although it doesn't load on my system:
>
>
>> library(pnmath)
> Error in dyn.load(file, DLLpath = DLLpath, ...) :
> unable to load shared library
> '/home/jpablo/extra/R-271/lib/R/library/pnmath/libs/pnmath.so':
> libgomp.so.1: shared object cannot be dlopen()ed
> Error: package/namespace load failed for 'pnmath'
>
>
> I find it odd, because libgomp.so.1 is in /usr/lib, so R should find it.

Could you tell us the OS and gcc version you are using? We are starting to look at folding this into base R and this may help with figuring out configuration issues.
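For anyone else reporting this, the details asked for above could be collected from within R roughly as follows. This is only a sketch: the helper name report_build_info is made up for illustration, and it assumes gcc is on the PATH (if it is not, the gcc field simply comes back as NA).

```r
# Hypothetical helper (not part of R or pnmath): gathers the OS, platform,
# R version and gcc version that are useful in a bug report.
report_build_info <- function() {
  # First line of `gcc --version`, or NA if gcc cannot be run.
  gcc <- tryCatch(
    suppressWarnings(system2("gcc", "--version", stdout = TRUE, stderr = TRUE))[1],
    error = function(e) NA_character_
  )
  list(
    r_version = R.version.string,
    platform  = R.version$platform,
    os        = Sys.info()[["sysname"]],
    gcc       = gcc
  )
}

str(report_build_info())
```

Pasting the output of str() into a reply, together with the Linux distribution name, should be enough to start with.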

The error probably means what it says: libgomp.so is found but can't be used with dlopen. Early versions of libgomp included an "optimization" that meant libgomp.so could only be used if it was linked at compile time, so that it got loaded by the shared library manager at program startup. It could not be loaded at runtime with dlopen. This has resulted in a number of complaints because it makes libgomp unusable in embedded settings. Many Linux distributions seem to have patched this, including current Fedora and RHEL, but I suspect it will continue to arise from time to time. If you build R from source you can work around this by linking R with -lgomp (though this doesn't help with embedded uses of R). Here are some relevant threads on this that Brian Ripley tracked down a while back:

http://gcc.gnu.org/ml/gcc-help/2007-09/msg00050.html
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28482
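One way to check whether a given libgomp build has this restriction, independently of pnmath, is to try loading it directly with dyn.load, which goes through dlopen on Unix-alikes. The sketch below assumes the library path from the quoted report; can_dlopen is a made-up helper name, not something R provides.

```r
# Hypothetical helper: returns TRUE if the shared object can be loaded at
# run time, FALSE (after printing the loader's message) if it cannot.
can_dlopen <- function(path) {
  tryCatch({
    dyn.load(path)   # on success the library stays loaded in this session
    TRUE
  }, error = function(e) {
    message(conditionMessage(e))
    FALSE
  })
}

# Path taken from the quoted report; adjust it for your system.
can_dlopen("/usr/lib/libgomp.so.1")
```

If this prints the same "cannot be dlopen()ed" message, the problem lies in the installed libgomp rather than in how pnmath was built.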

Another option for trying out the current parallel nmath code is pnmath0, available from the same place as pnmath. This uses raw pthreads rather than OpenMP.
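Something like the following could be used to see whether pnmath0 makes a difference on a given machine. It reuses the matrix size from Martin's example quoted below, and it assumes pnmath0 has already been installed by hand; if it has not, the parallel timing is skipped.

```r
m <- matrix(0, 10000, 1000)      # same dimensions as in the quoted example
print(system.time(exp(m)))       # serial baseline

# Assumption: pnmath0 is installed; it is not loaded unless available.
if (requireNamespace("pnmath0", quietly = TRUE)) {
  library(pnmath0)
  print(system.time(exp(m)))     # same call, now using the threaded code
} else {
  message("pnmath0 is not installed; skipping the parallel timing")
}
```

The 'elapsed' column is the one to compare; 'user' time can go up even when the wall-clock time goes down.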

Best,

luke

>
>
> Juan Pablo
>
>
> On Sun, Jun 29, 2008 at 1:36 AM, Martin Morgan <mtmorgan@fhcrc.org> wrote:
>> "Juan Pablo Romero Méndez" <jpablo.romero_at_gmail.com> writes:
>>
>>> Hello,
>>>
>>> The problem I'm working on now requires operating on big matrices.
>>>
>>> I've noticed that there are some packages that allow running some
>>> commands in parallel. I've tried snow and NetWorkSpaces, without much
>>> success (they are far slower than the normal functions)
>>
>> Do you mean like this?
>>
>>> library(Rmpi)
>>> mpi.spawn.Rslaves(nsl=2) # dual core on my laptop
>>> m <- matrix(0, 10000, 1000)
>>> system.time(x1 <- apply(m, 2, sum), gcFirst=TRUE)
>> user system elapsed
>> 0.644 0.148 1.017
>>> system.time(x2 <- mpi.parApply(m, 2, sum), gcFirst=TRUE)
>> user system elapsed
>> 5.188 2.844 10.693
>>
>> ? (This is with Rmpi, a third alternative you did not mention;
>> 'elapsed' time seems to be relevant here.)
>>
>> The basic problem is that the overhead of dividing the matrix up and
>> communicating between processes outweighs the already-efficient
>> computation being performed.
>>
>> One solution is to organize your code into 'coarse' grains, so the FUN
>> in apply does (considerably) more work.
>>
>> A second approach is to develop a better algorithm / use an
>> appropriate R paradigm, e.g.,
>>
>>> system.time(x3 <- colSums(m), gcFirst=TRUE)
>> user system elapsed
>> 0.060 0.000 0.088
>>
>> (or even faster, x4 <- rep(0, ncol(m)) ;)
>>
>> A third approach, if your calculations make heavy use of linear
>> algebra, is to build R with a vectorized BLAS library; see the R
>> Installation and Administration guide.
>>
>> A fourth possibility is to use Tierney's 'pnmath' library mentioned in
>> this thread
>>
>> https://stat.ethz.ch/pipermail/r-help/2007-December/148756.html
>>
>> The README file needs to be consulted for the not-exactly-trivial (on
>> my system) task of installing the package. Specific functions are
>> parallelized, provided the length of the calculation makes it seem
>> worth-while.
>>
>>> system.time(exp(m), gcFirst=TRUE)
>> user system elapsed
>> 0.108 0.000 0.106
>>> library(pnmath)
>>> system.time(exp(m), gcFirst=TRUE)
>> user system elapsed
>> 0.096 0.004 0.052
>>
>> (elapsed time about 2x faster). Both BLAS and pnmath make much better
>> use of resources, since they do not require multiple R instances.
>>
>> None of these approaches would make a colSums faster -- the work is
>> just too small for the overhead.
>>
>> Martin
>>
>>> My problem is very simple: it doesn't require any communication
>>> between parallel tasks, only that the task is divided evenly
>>> between the available cores. Also, I don't want to run the code on a
>>> cluster, just on my multicore machine (4 cores).
>>>
>>> What solution would you propose, given your experience?
>>>
>>> Regards,
>>>
>>> Juan Pablo
>>>
>>> ______________________________________________
>>> R-help_at_r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>> --
>> Martin Morgan
>> Computational Biology / Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N.
>> PO Box 19024 Seattle, WA 98109
>>
>> Location: Arnold Building M2 B169
>> Phone: (206) 667-2793
>>

-- 
Luke Tierney
Chair, Statistics and Actuarial Science
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:      luke_at_stat.uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu



Received on Mon 30 Jun 2008 - 14:52:29 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Mon 30 Jun 2008 - 15:01:38 GMT.
