Re: [R] embarrassingly parallel problem - simple loop solution

From: Martin Morgan <mtmorgan_at_fhcrc.org>
Date: Thu, 10 Jul 2008 18:51:12 -0700

Hi Chris --

"Chris Gaiteri" <gaiteri_at_gmail.com> writes:

> I have an "embarrassingly parallel" routine that I need to run 24000^2/2
> times (based on some microarray data). All I really need to do is
> parallelize a nested for-loop. But I haven't found a clear list of what
> packages/commands I'd need to do this. I've got a dual quad core xeon

Any of snow / Rmpi / nws / rpvm (the first has system requirements, the other three additional software requirements) provides the basic embarrassingly parallel functionality via variants of lapply, e.g., mpi.parLapply in Rmpi.
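
As a rough sketch of what "variants of lapply" means for a nested for-loop: the outer loop index becomes the list that lapply walks over, and the inner loop moves into the function. The names pair_stat and row_task, and the toy matrix x, are hypothetical stand-ins (the original post does not say what the per-pair routine is); here it is assumed to be a correlation between two rows. This version is serial, to show only the shape of the code.

```r
# Hypothetical stand-in for the real per-pair routine (assumed: correlation).
pair_stat <- function(i, j, x) cor(x[i, ], x[j, ])

set.seed(1)
x <- matrix(rnorm(6 * 10), nrow = 6)   # toy data: 6 "genes" x 10 samples
n <- nrow(x)

# One task per outer-loop index i; each task runs the inner loop over j > i.
row_task <- function(i, x) {
  js <- seq_len(nrow(x))
  js <- js[js > i]
  vapply(js, function(j) pair_stat(i, j, x), numeric(1))
}

# Serial version; res[[i]] holds the statistics for the pairs (i, j) with j > i.
res <- lapply(seq_len(n - 1), row_task, x = x)
```

Because each call to row_task depends only on its own arguments, the lapply call is the piece that a parallel variant would replace.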

Vectorized ATLAS (search for ATLAS in the R Installation and Administration Guide) and the experimental package pnmath (see, for instance, the thread (oops, pun) that started in June with the subject "Parallel R") provide parallelism at a finer grain, i.e., at the level of linear algebra (ATLAS) or R's math library (pnmath).

> system running RHEL5, so if I could use hyperthreading to increase the
> number of (virtual) nodes that would be great too.

The snow-like solutions allow you to launch as many instances of R as you like (e.g., one per CPU); each operates quasi-independently. Each instance of R uses its own memory, and for big memory problems this might limit the number of instances per machine.

ATLAS / pnmath make much better use of resources and work without code modification. But these solutions only provide benefit when the calculations are appropriately numerical; many calculations are not formulated in a way that would take advantage of this.
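
To illustrate what "appropriately numerical" might look like, here is a small sketch that assumes, purely for illustration, that the per-pair routine is a Pearson correlation between rows of a matrix (the original post does not say what it is). In that case the whole set of pairs can be written as a single matrix product, which R hands to whatever BLAS it is linked against. The matrix x is toy data.

```r
set.seed(1)
x <- matrix(rnorm(6 * 10), nrow = 6)   # toy data: 6 rows ("genes") x 10 samples

# Center and scale each row (scale() works on columns, hence the transposes).
z <- t(scale(t(x)))

# All pairwise row correlations as one matrix product.
r <- tcrossprod(z) / (ncol(x) - 1)

# r[i, j] is the correlation between rows i and j; the upper triangle holds
# the n * (n - 1) / 2 distinct pairs.
pairs <- r[upper.tri(r)]
```

A routine that has to be evaluated pair by pair in interpreted R code would not be helped in this way.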

A recent post from Prof. Ripley also mentions the benefits that come from building R with compiler flags tuned to your chip, but I'm not able to locate the thread at the moment.

If you're coming at this from scratch, on a Linux-based system, then snow is probably the easiest to get going, using 'socket'-based clusters. I use Rmpi and, to a lesser extent, pnmath, both at least in part because I'm interested in the C-level implementations (MPI and OpenMP, respectively).
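
A minimal sketch of a socket-based cluster follows. It uses the parallel package that ships with current versions of R and incorporates snow's interface; with snow itself the corresponding call would be makeCluster(2, type = "SOCK"). The worker count of 2, the toy matrix x, and the function row_task are illustrative assumptions, with correlation again standing in for the real per-pair routine.

```r
library(parallel)  # bundled with current R; interface derived from snow

set.seed(1)
x <- matrix(rnorm(4 * 10), nrow = 4)   # toy data: 4 rows x 10 samples

# Self-contained task: uses only its arguments and base R functions, so
# nothing else needs to be exported to the workers.
row_task <- function(i, x) {
  js <- seq_len(nrow(x))
  js <- js[js > i]
  vapply(js, function(j) cor(x[i, ], x[j, ]), numeric(1))
}

cl <- makeCluster(2)   # 2 worker R processes; the default type is a socket cluster
res <- parLapply(cl, seq_len(nrow(x) - 1), row_task, x = x)
stopCluster(cl)        # always shut the workers down when finished
```

Each worker is a separate R process with its own copy of x, so the number of workers that fit on one machine depends on how much memory each copy needs.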

Martin

> Appreciate the help.
>
> Chris
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793

Received on Fri 11 Jul 2008 - 01:59:04 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 11 Jul 2008 - 02:31:59 GMT.
