Re: [R] parallel computing

From: McGehee, Robert <Robert.McGehee_at_geodecapital.com>
Date: Fri 26 May 2006 - 03:57:19 EST


Moreno,
Since much of my processor time is spent on basic linear algebra operations (matrix inversion, quadratic programming, etc.), I recently recompiled R using a BLAS implementation (ATLAS) tuned for parallel processing. The speed improvement for linear algebra operations was significant on multi-processor machines.

For example, using:
N <- 1000  # matrix dimension; vary this to compare small and large cases
system.time(x <- replicate(10, matrix(rnorm(N^2), N, N) %*% matrix(rnorm(N^2), N, N)))

I benchmarked speed improvements of 10-20% where N is small (10-100) and speed improvements of up to 6x (e.g. 8 seconds vs 48 seconds) when N is large (1000+).
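
To repeat this kind of comparison across several sizes, something like the following sketch may help. The function name time_matmul and the sizes used are hypothetical, not part of the original benchmark. Unlike the one-liner above, it builds the matrices before starting the clock, so the time spent in rnorm() (which a faster BLAS does not affect) is left out of the measurement.

```r
# Time `reps` multiplications of two N x N matrices.
# The matrices are generated once, outside the timed region, so only the
# BLAS-backed %*% call is measured.
time_matmul <- function(N, reps = 10) {
  A <- matrix(rnorm(N^2), N, N)
  B <- matrix(rnorm(N^2), N, N)
  timing <- system.time(for (i in seq_len(reps)) A %*% B)
  unname(timing["elapsed"])
}

sizes <- c(10, 100, 300)               # illustrative sizes only
elapsed <- sapply(sizes, time_matmul)  # elapsed seconds for each size
print(data.frame(N = sizes, seconds = elapsed))
```

Running the same script before and after swapping the BLAS gives two tables that can be compared row by row.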

So for users with lots of linear algebra calculations interested in parallel processing, I'd recommend always starting with (re-)compiling a customized BLAS, if they have not done so already. ATLAS and GOTO are the two most common BLAS implementations that I know of.

As for true parallel processing, I have not yet tried the aforementioned R packages, but I did write an internal package for running very large simulations in parallel, in which a simple script is re-run on multiple data sets. In this setup I stored each data set in a different numbered directory. The R script goes through each directory, in order, looking for a flag.txt file. If no such file exists, the process puts a flag.txt in that directory, indicating that the directory is in use, and starts processing the data. This allows multiple processors/computers to work on very large simulations in parallel without duplicating work. At one point I was able to muster 15-20 CPUs from spare Windows and Linux boxes to reduce the simulation time from days to hours. Such a system would also be easy to re-create without setting up MPI/PVM if your simulation / project can be divided up in a similar way.
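
A minimal sketch of the flag-file scheme described above, for anyone who wants to re-create it. This is not the internal package itself: the function names claim_next_dir and run_worker, the contents written to flag.txt, and the zero-padded directory names in the demo are all assumptions made for illustration.

```r
# Return the first directory under `root` that has no flag.txt, after
# creating flag.txt in it to mark it as in use. Returns NULL when every
# directory has already been claimed.
claim_next_dir <- function(root) {
  dirs <- sort(list.dirs(root, full.names = TRUE, recursive = FALSE))
  for (d in dirs) {
    flag <- file.path(d, "flag.txt")
    if (!file.exists(flag)) {
      # Record which host and process claimed the directory (assumed format).
      writeLines(paste(Sys.info()[["nodename"]], Sys.getpid()), flag)
      return(d)
    }
  }
  NULL
}

# Keep claiming and processing directories until none are left.
# `process_dir` is a user-supplied function taking a directory path.
run_worker <- function(root, process_dir) {
  done <- character(0)
  while (!is.null(d <- claim_next_dir(root))) {
    process_dir(d)
    done <- c(done, d)
  }
  done
}

# Demo on a temporary tree with three empty "data set" directories.
root <- tempfile("sim_")
dir.create(root)
for (i in 1:3) dir.create(file.path(root, sprintf("%03d", i)))
processed <- run_worker(root, function(d) invisible(NULL))
print(basename(processed))
```

Each machine would run the same run_worker() call against a shared directory. Note that the check for flag.txt and its creation are two separate steps, so two workers arriving at the same moment could both claim a directory; with long-running jobs this window is small, but it is not closed in this sketch.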

Cheers,
Robert

-----Original Message-----
From: r-help-bounces@stat.math.ethz.ch
[mailto:r-help-bounces@stat.math.ethz.ch] On Behalf Of Martin Morgan
Sent: Thursday, May 25, 2006 1:17 PM
To: mb7312@libero.it
Cc: r-help
Subject: Re: [R] parallel computing

Hi Moreno --

snow provides an easy interface to simple parallel types of calculations (e.g., lapply in parallel). I quickly wanted to have more direct control over how parallel computations were calculated, and have been using Rmpi. Though in principle snow and Rmpi are 'easy' to use, I found that they actually require a certain amount of understanding about R objects and evaluation, and the underlying communication library (MPI, or PVM).
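
As a rough illustration of the "lapply in parallel" style, here is a small sketch. It assumes the parallel package that ships with current versions of R, which took over snow's cluster interface (with snow itself, the equivalent calls would be loaded via library(snow) instead). The worker count of 2 and the toy squaring function are arbitrary choices for the example.

```r
library(parallel)

# Start two local worker processes (socket cluster).
cl <- makeCluster(2)

# Apply the function to each element of 1:4, spreading the work over the
# workers; the cluster is shut down even if the computation fails.
squares <- tryCatch(
  parLapply(cl, 1:4, function(i) i^2),
  finally = stopCluster(cl)
)

print(unlist(squares))
```

The function passed to parLapply is shipped to separate R processes, so any data or packages it needs must be available there too; this is one place where the understanding of R objects and evaluation mentioned above comes in.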

Hope that helps,

Martin

"mb7312@libero.it" <mb7312@libero.it> writes:

> Dear R users,
>
> I have access to a Sun cluster with multiple processors, a lot of
> RAM and with RedHat installed. I want to take advantage of its
> power for a very time-consuming R routine.
>
> Which package should I use? I know there are snow, snowFT and
> other packages. Which is the best for my purpose? Does anyone have
> experience with this?
>
> Thanks in advance.
>
> Moreno
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html



https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html Received on Fri May 26 04:04:28 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Fri 26 May 2006 - 06:10:22 EST.
