Re: [R] R and Openmpi

From: Dirk Eddelbuettel <edd_at_debian.org>
Date: Sat, 31 May 2008 13:23:37 -0500

Paul,

On 30 May 2008 at 15:47, Paul Hewson wrote:
| Hello,
|
| We have R working with Rmpi/openmpi, but I'm a little worried. Specifically, (a) the -np flag doesn't seem to override the hostfile (it works fine with fortran hello world) and (b) I appear to have twice as many processes running as I think I should.
|
| Rmpi version 0.5.5
| Openmpi version 1.1

That's old. Open MPI 1.2.* fixed and changed a lot of things. I am happy with 1.2.6, the default on Debian.

| Viglen HPC with (effectively) 9 blades and 8 nodes on each blade.
| myhosts file contains details of the 9 blades, but specifies that there are 4 slots on each blade (to make sure I leave room for other users).
|
| When running mpirun -bynode -np 2 -hostfile myhosts R --slave --vanilla task_pull.R
|
| 1. I get as many R slaves as there are slots defined in my myhosts file (there are 36 slots defined, and I get 36 slaves, regardless of the setting of -np). The master goes on the first machine in the myhosts file.
| 2. The .Rout file confirms that I have 1 comm with 1 master and 36 slaves
| 3. When I top each blade it indicates that there are in fact 8 processes running on each blade and
| 4. When I pstree each blade it indicates that there are two orted processes, each with 4 subprocesses.

You never showed us task_pull.R ... And while I readily acknowledge that this can be tricky, why don't you experiment with a simple setting? Consider this token littler [1] invocation (or use Rscript if you prefer / have only that):

  edd_at_ron:~> r -e'library(Rmpi); cat("Hello rank", mpi.comm.rank(0), "size", mpi.comm.size(0), "on", mpi.get.processor.name(), "\n")'
  Hello rank 0 size 1 on ron
  edd_at_ron:~>

So without an outer mpirun (or orterun as the Open MPI group now calls it) we get one instance. Makes sense.

Now with two hosts defined on the fly, and two instances each:

  edd_at_ron:~> orterun -n 4 -H ron,joe r -e'library(Rmpi); cat("Hello rank", mpi.comm.rank(0), "size", mpi.comm.size(0), "on", mpi.get.processor.name(), "\n")'
  Hello rank 0 size 4 on ron
  Hello rank 2 size 4 on ron
  Hello rank 3 size 4 on joe
  Hello rank 1 size 4 on joe
  edd_at_ron:~>

Adding '-bynode' and using '-np 4' instead of '-n 4' does not change anything.  
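
For anyone without littler, the same check can be wrapped in a small shell function around Rscript. This is only a minimal sketch, assuming Open MPI, R and Rmpi are installed on every host; the function name hello_mpi is made up for illustration, and ron and joe are simply the hosts from the example above.

```shell
#!/bin/sh
# R expression from the littler example: print rank, communicator size and host.
R_EXPR='library(Rmpi); cat("Hello rank", mpi.comm.rank(0), "size", mpi.comm.size(0), "on", mpi.get.processor.name(), "\n")'

# hello_mpi N HOSTS: start N copies of the expression across HOSTS
# (a comma-separated list), using Rscript instead of littler's 'r'.
# The name hello_mpi is hypothetical, not part of any package.
hello_mpi() {
    orterun -n "$1" -H "$2" Rscript -e "$R_EXPR"
}

# Example call (needs Open MPI, R and Rmpi on both hosts):
# hello_mpi 4 ron,joe
```

If the reported size does not match the number passed as the first argument, that points at the MPI setup rather than at the R script.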

| From the point of view of getting a job done this ***seems*** OK (it's running very quickly), but it doesn't seem quite right - given I'm sharing the machine with other users and so on. Is there something I've missed in the usage of mpirun with R/Rmpi?

I cannot quite determine from what you said here what your objective is. What exactly are you trying to do that you are not getting done? Using fewer instances? Maybe that is in fact an Open MPI 1.2.* versus 1.1.* issue.
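
If the question is how many processes the hostfile alone accounts for, a quick slot count may help. The following is a sketch under assumptions: I have not seen your myhosts file, so the blade names are invented and the layout is assumed to be the usual "hostname slots=N" lines.

```shell
#!/bin/sh
# Hypothetical hostfile: 9 blades with 4 slots each (names are made up).
hostfile=$(mktemp)
for i in 1 2 3 4 5 6 7 8 9; do
    echo "blade$i slots=4"
done > "$hostfile"

# Sum the slots= values, skipping blank lines and comment lines.
total=$(awk '!/^[ \t]*(#|$)/ {
    for (i = 2; i <= NF; i++)
        if ($i ~ /^slots=/) { split($i, kv, "="); sum += kv[2] }
} END { print sum + 0 }' "$hostfile")

echo "slots declared: $total"
rm -f "$hostfile"
```

With 9 blades at 4 slots each this reports 36, the number of slaves you see, which suggests the slave count follows the hostfile rather than -np.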

One thing to note is that if you wrap all this in the excellent snow package by Tierney et al, then Open MPI's '-n' can always be one, as you determine from _within_ R how many nodes you want:

  edd_at_ron:~> orterun -bynode -np 1 -H ron,joe r -e'library(snow); cl <- makeCluster(4, "MPI"); res <- clusterCall(cl, function() Sys.info()["nodename"]); print(do.call(rbind, res))'
  Loading required package: utils
  Loading required package: Rmpi

          4 slaves are spawned successfully. 0 failed.
       nodename
  [1,] "joe"
  [2,] "ron"
  [3,] "joe"
  [4,] "ron"
  edd_at_ron:~>

Note the outer '-np 1' and the inner makeCluster(4, "MPI") to give you 4 slaves. If you use a larger '-np $N', you will get $N instances, each starting as many nodes as makeCluster asks for.
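
For anything longer than a one-liner, the same snow pattern is easier to keep in a script file. A minimal sketch, assuming snow and Rmpi are installed on all hosts; the file name snow_hello.R is arbitrary, and the stopCluster() and mpi.quit() calls at the end are my addition for a clean shutdown, not something the one-liner above did.

```shell
#!/bin/sh
# Write the snow example to a script file (the file name is arbitrary).
script=snow_hello.R
cat > "$script" <<'EOF'
library(snow)

# Ask snow for 4 MPI workers from inside R.
cl <- makeCluster(4, type = "MPI")

# Have every worker report the host it runs on.
res <- clusterCall(cl, function() Sys.info()["nodename"])
print(do.call(rbind, res))

# Shut the workers down and leave MPI cleanly.
stopCluster(cl)
mpi.quit()
EOF

# Launch a single master; snow spawns the workers itself.
# Uncomment on a machine with Open MPI, Rmpi and snow installed:
# orterun -n 1 -H ron,joe Rscript "$script"
```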

Hope this helps, Dirk

[1] Littler can be had via Debian / Ubuntu or from http://dirk.eddelbuettel.com/code/littler.html

-- 
Three out of two people have difficulties with fractions.

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Mon 02 Jun 2008 - 04:13:31 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Mon 02 Jun 2008 - 11:30:35 GMT.
