Re: [Rd] Cell or PS3 Port

From: Ed Knutson <ed_at_sixfoursystems.com>
Date: Fri, 02 Nov 2007 11:51:41 -0500

The main core of the Cell (the PPE) uses IBM's version of hyperthreading to expose two logical CPUs to the OS, so code that is "simply" multi-threaded should still see an advantage. In addition, IBM provides an SDK that includes workflow management as well as libraries supporting common linear algebra and other math functions on the sub-processors (called SPEs). They also provide an interface to a hardware RNG as well as three software generators (two pseudo-random, one quasi-random) coded for the SPE.

Each SPE has its own small, local memory store and communicates with main memory through a DMA queue. The problem, then, is breaking each task into units small enough to offload to an SPE. My initial direction will be to set up a rudimentary workflow manager: when an optimized function is encountered, a sufficient number of SPE threads will be spawned, and the main thread will block until all results are returned. As for the optimized functions, I intend to start with the ones that already have an analogous implementation in the IBM math libraries.
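Not Cell-specific, but the fork/join shape of that workflow manager can be sketched with plain pthreads (on the Cell, the SPE thread creation and DMA transfers from IBM's SDK would take the place of pthread_create; all names below are illustrative, not from the SDK):

```c
#include <pthread.h>

#define NWORKERS 4   /* stand-in for the number of SPE threads */

/* Work unit handed to each worker: a slice of the input vector. */
typedef struct {
    const double *in;
    double *out;
    int lo, hi;
} slice_t;

/* Worker: apply the "optimized function" (here just x*x) to its slice. */
static void *worker(void *arg) {
    slice_t *s = (slice_t *)arg;
    for (int i = s->lo; i < s->hi; i++)
        s->out[i] = s->in[i] * s->in[i];
    return NULL;
}

/* Rudimentary workflow manager: split the task, spawn the workers,
 * then block the main thread until every result is in. */
void run_parallel(const double *in, double *out, int n) {
    pthread_t tid[NWORKERS];
    slice_t sl[NWORKERS];
    int chunk = (n + NWORKERS - 1) / NWORKERS;
    for (int w = 0; w < NWORKERS; w++) {
        sl[w].in = in;
        sl[w].out = out;
        sl[w].lo = w * chunk;
        sl[w].hi = (w + 1) * chunk < n ? (w + 1) * chunk : n;
        pthread_create(&tid[w], NULL, worker, &sl[w]);
    }
    for (int w = 0; w < NWORKERS; w++)
        pthread_join(tid[w], NULL);   /* wait for all results */
}
```

On the Cell the interesting part is the piece this sketch glosses over: each slice would have to be DMA'd into the SPE's local store and the results DMA'd back out, which is what makes the unit-size question above matter.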

MPI has been employed by some Cell developers to let multiple SPEs working on sections of the same task communicate with each other. I like this approach, since it lays the groundwork for clustering multiple Cell (or really any) processors.

Luke Tierney wrote:
> I have been experimenting with ways of parallelizing many of the
> functions in the math library. There are two experimental packages
> available in http://www.stat.uiowa.edu/~luke/R/experimental: pnmath,
> based on OpenMP, and pnmath0, based on basic pthreads. I'm not sure
> to what degree the approach there would carry over to GPUs or Cell
> where the additional processors are different from the main processor
> and may not share memory (I forget how that works on Cell).
>
> The first issue is that you need some modifications to some of the
> functions to ensure they are thread-safe. For the most part these are
> minor; a few functions would require major changes and I have not
> tackled them for now (Bessel functions, wilcox, signrank I believe).
> RNG functions are also not suitable for parallelization given the
> dependence on the sequential underlying RNG.
>
> It is not too hard to get parallel versions to use all available
> processor cores. The challenge is to make sure that the parallel
> versions don't run slower than the serial versions. They may if the
> amount of data is too small. What is too small for each function
> depends on the OS and the processor/memory architecture; if memory is
> not shared this gets more complicated still. For some very simple
> functions (floor, ceiling, sign) I could not see any reliable benefit
> of parallelization for reasonable data sizes on the systems I was
> using so I left those alone for now.



R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Received on Fri 02 Nov 2007 - 16:55:37 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 02 Nov 2007 - 17:30:25 GMT.
