Re: [R] parallel computing

From: McGehee, Robert <>
Date: Fri 26 May 2006 - 03:57:19 EST

As much of my processor time is often spent doing basic linear algebra operations (matrix inversion, quadratic programming, etc.), I recently recompiled R using a BLAS implementation (ATLAS) tuned for parallel processing. The speed improvement for linear algebra operations was significant on multiprocessor machines.

For example, using:
system.time(x <- replicate(10, matrix(rnorm(N^2), N, N) %*% matrix(rnorm(N^2), N, N)))

I benchmarked speed improvements of 10-20% where N is small (10-100) and speed improvements of up to 6x (e.g. 8 seconds vs 48 seconds) when N is large (1000+).
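
To repeat this kind of comparison, a minimal sketch along the following lines could be run once under each BLAS build. The helper name bench_matmul and the chosen sizes are illustrative assumptions, not the exact script behind the numbers above. As in the one-liner, the timing includes generating the random matrices.

```r
# Time 'reps' products of two random N x N matrices.
# 'bench_matmul' is a hypothetical helper name used only for this sketch.
bench_matmul <- function(N, reps = 10) {
  system.time(
    for (i in seq_len(reps)) {
      a <- matrix(rnorm(N^2), N, N)
      b <- matrix(rnorm(N^2), N, N)
      a %*% b
    }
  )[["elapsed"]]
}

# Sizes are arbitrary; add 1000 or more to see the large-N behaviour.
sizes <- c(10, 100, 500)
timings <- vapply(sizes, bench_matmul, numeric(1))
print(data.frame(N = sizes, seconds = timings))
```

Running the same script against the reference BLAS and the tuned BLAS and dividing the two timing columns gives the speed-up per matrix size.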

So for users with lots of linear algebra calculations interested in parallel processing, I'd recommend always starting with (re-)compiling a customized BLAS, if they have not done so already. ATLAS and GOTO are the two most common BLAS implementations that I know of.

As far as true parallel processing goes, I have not yet tried the aforementioned R packages, but I did code up an internal package for running very large simulations in parallel, in which a simple script is re-run on multiple data sets. In this example I stored each data set in a different numbered directory. The R script goes through each directory, in order, looking for a flag.txt file. If no such file exists, the process puts a flag.txt in that directory, indicating that the directory is in use, and starts processing the data. This allows multiple processors/computers to work on very large simulations in parallel without duplicating work.

At one point I was able to muster up 15-20 CPUs from spare Windows and Linux boxes to reduce the simulation time from days to hours. Such a system would also be easy to re-create without setting up MPI/PVM if your simulation / project can be divided up in a similar way.
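
The flag-file scheme could be sketched roughly as follows. This is not the internal package itself: the names run_worker and process_fun, and the assumption that directory names are plain integers, are made up for illustration. Note also that checking for the flag and then creating it is not atomic, so on a shared file system two workers could occasionally claim the same directory.

```r
# Walk the numbered data directories under 'root' and process each unclaimed one.
# 'run_worker' and 'process_fun' are hypothetical names for this sketch.
run_worker <- function(root, process_fun) {
  dirs <- list.dirs(root, full.names = TRUE, recursive = FALSE)
  # Assumes directory names are plain integers ("1", "2", ...).
  dirs <- dirs[order(as.integer(basename(dirs)))]
  done <- character(0)
  for (d in dirs) {
    flag <- file.path(d, "flag.txt")
    if (file.exists(flag)) next  # already claimed by some worker
    # Claim the directory. Check-then-create is not atomic, so a rare
    # double claim is possible when many workers start at once.
    writeLines(paste(Sys.info()[["nodename"]], Sys.getpid()), flag)
    process_fun(d)
    done <- c(done, d)
  }
  done  # directories this worker processed
}

# Small local demonstration with three empty data directories.
root <- tempfile("sim_")
for (i in 1:3) dir.create(file.path(root, as.character(i)), recursive = TRUE)
first  <- run_worker(root, function(d) invisible(NULL))
second <- run_worker(root, function(d) invisible(NULL))
```

In the demonstration the first call claims all three directories, and the second call finds every flag.txt already in place and does nothing. On a real setup each machine would run the same script against a shared directory tree.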


-----Original Message-----
On Behalf Of Martin Morgan
Sent: Thursday, May 25, 2006 1:17 PM
Cc: r-help
Subject: Re: [R] parallel computing

Hi Moreno --

snow provides an easy interface to simple parallel types of calculations (e.g., lapply in parallel). I quickly wanted to have more direct control over how parallel computations were calculated, and have been using Rmpi. Though in principle snow and Rmpi are 'easy' to use, I found that they actually require a certain amount of understanding about R objects and evaluation, and the underlying communication library (MPI, or PVM).
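
As a rough illustration of "lapply in parallel", here is a minimal sketch using the snow-style interface (makeCluster, parLapply, stopCluster). It loads the parallel package bundled with current versions of R, which provides functions of the same names as snow. The two local workers and the toy squaring function are assumptions for the example; a real cluster would need its own host and transport settings.

```r
library(parallel)

# Start two local worker processes (the worker count is an arbitrary choice).
cl <- makeCluster(2)

# Apply the function to each element, spreading the work across the workers.
squares <- parLapply(cl, 1:8, function(i) i^2)

# Always shut the workers down when finished.
stopCluster(cl)

print(unlist(squares))
```

The function passed to parLapply is evaluated on the workers, so any data or packages it relies on must be available there as well, which is part of the understanding of R objects and evaluation mentioned above.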

Hope that helps,


"" <> writes:

> Dear R users,
> I have access to a Sun cluster with multiple processors, a lot of
> RAM, and RedHat installed. I want to take advantage of its
> power for a very time-consuming R routine.
> Which package should I use? I know there are snow, snowFT, and
> other packages. Which is the best for my purpose? Does anyone have
> experience with this?
> Thanks in advance.
> Moreno
> ______________________________________________
> mailing list
> PLEASE do read the posting guide!

Received on Fri May 26 04:04:28 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Fri 26 May 2006 - 06:10:22 EST.
