[R] Viable Approach to Parallel R?

From: Lewis, Daniel (IS Consultant) <DLewis_at_consultantemail.com>
Date: Mon, 11 Feb 2008 13:09:00 -0500


We are researching approaches to parallel R with the end goal of running R in a distributed manner on a Linux cluster. We expect, of course, to do some work decomposing our problems to be task-parallel or data-parallel, but we wouldn't mind getting an initial boost by working with "embarrassingly parallel" code sections and one of the approaches below.

Incidentally, our environment includes R 2.6.1, RHEL 5.1, Solaris 10, SGE (Sun Grid Engine), and OpenMPI 1.2.4 (SunHPC 7.1).

In researching previous work, the most promising approaches seem to be:

A. Snow (with Rmpi or Rpvm), as described in http://www.r-project.org/useR-2006/Slides/Harrington+Salibian-Barrera.pdf from the 2006 R User Conference

It is my understanding that this approach is viable, and works with OpenMPI 1.2.4. Is anyone using this method with good results?
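
For reference, an embarrassingly parallel section under this approach might look like the minimal sketch below. It assumes the snow package is installed and uses a socket cluster on the local machine so that it runs without MPI; the function simulate_one, the worker count, and the seeds are hypothetical and chosen only for illustration. With Rmpi installed, type = "MPI" would be the counterpart.

```r
library(snow)

# Hypothetical task: each call is independent of the others,
# which is what makes the loop embarrassingly parallel.
simulate_one <- function(seed) {
  set.seed(seed)
  mean(rnorm(1000))
}

# Two socket workers on the local machine (an assumption for illustration);
# makeCluster(2, type = "MPI") would go through Rmpi instead.
cl <- makeCluster(2, type = "SOCK")

# Distribute the seeds 1..8 across the workers and collect the results.
results <- parSapply(cl, 1:8, simulate_one)

stopCluster(cl)

# The same computation run sequentially, for comparison.
sequential <- sapply(1:8, simulate_one)
print(all.equal(results, sequential))
```

Because each task sets its own seed, the parallel and sequential runs can be compared directly, which is a convenient sanity check before moving to a real cluster.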

B. taskpR, RScaLAPACK, pMatrix

I read a paper (http://sdm.lbl.gov/sdmcenter/projects/SDM.center.parallel.r.2-pager.4.doc) coming out of ORNL, describing what they call "parallel R", which included taskpR, RScaLAPACK, and pMatrix. I notice that taskpR is no longer available in "contrib", nor is pMatrix.

An old link indicates the packages are available at http://www.ASPECT-SDM.org/Parallel-R but that site displays a notice that the server is migrating. Has this work been discontinued? Is anyone using this? I see RScaLAPACK is still available; from reading the above, it seems it was bundled with taskpR. Does it function without the other components? (Guess I'll try it and find out :)

C. Sleigh & "NetworkSpaces"

I see that SCAI (Scientific Computing Associates) offers a parallel R package based on something they call NetworkSpaces and "Sleigh" (inspired by Snow). They sell services around the product, but it is open source. They also have an enhanced version that they sell and support; see http://www.lindaspaces.com/hp/BenchmarksWithCharts.pdf. Has anyone investigated this approach or its open source components?

TIA for any information, direction, suggestions, and if I've missed any other approaches please advise.

Dan Lewis


R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Mon 11 Feb 2008 - 18:14:10 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Mon 11 Feb 2008 - 19:30:14 GMT.
