Re: [Rd] Rmpi_0.5-4 and OpenMPI questions

From: Dirk Eddelbuettel <edd_at_debian.org>
Date: Thu, 4 Oct 2007 06:49:47 -0500

On 4 October 2007 at 06:37, Luke Tierney wrote:
| > Yes, my bad. But it also hangs with argument count=3 (which I had tried, but
| > my mail was wrong.)
|
| Any chance the snow workers are picking up another version of Rmpi, eg
| a LAM one? Might happen if you have R_SNOW_LIB set and a Rmpi
| installed there. Otherwise starting with outfile=something may help.
| Let me know what you find out -- I'd like to make the snow
| configuration process more bullet-proof.

I generally don't have any of these environment variables set, so I'm not sure. I'll see what I can find.

| > | count=mpi.comm.size(0)-1 is used. If you start R alone, this will return
| > | count=0 since there is only one member (master). I do not know why snow
| > | did not use count=mpi.universe.size()-1 to find total nodes available.
| >
| > How would it know total nodes ? See below re hostfile.
| >
| > | Anyway after using
| > | cl=makeMPIcluster(count=3),
| > | I was able to run parApply function.
| > |
| > | I tried
| > | R -> library(Rmpi) -> library(snow) -> c1=makeMPIcluster(3)
| > |
| > | Also
| > | mpirun -host hostfile -np 1 R --no-save
| > | library(Rmpi) -> library(snow) -> c1=makeMPIcluster(3)
| > |
| > | Hao
| > |
| > | PS: hostfile contains all nodes info so in R mpi.universe.size() returns
| > | right number and will spawn to remote nodes.
| >
| > So we depend on a correct hostfile ? As I understand the Open MPI this is
| > deprecated:
| >
| > # This is the default hostfile for Open MPI. Notice that it does not
| > # contain any hosts (not even localhost). This file should only
| > # contain hosts if a system administrator wants users to always have
| > # the same set of default hosts, and is not using a batch scheduler
| > # (such as SLURM, PBS, etc.).
| >
| > I am _very_ interested in running Open MPI and Rmpi under slurm (which we
| > added to Debian as source package slurm-llnl) so it would be nice if this
| > could be rewritten not to require a hostfile, as this seems to be the direction upstream is
| > going.
|
| To work better with batch scheduling environments where spawning might
| be technically or politically problematic, I have been trying to improve
| the RMPISNOW script that can be used with LAM as
|
| mpirun -np 3 RMPISNOW
|
| and then either
|
| cl <- makeCluster() # no argument
|
| or
|
| cl <- makeCluster(2) # number of MPI processes - 1 (or fewer, I believe)
|
| (the default type for makeCluster becomes MPI in this case). This
| seems to work reasonably well in LAM and I think I can get it to work
| similarly in OpenMPI -- will try in the next day or so. Both LAM and
| OpenMPI provide environment variables so shell scripts can determine
| the mpirank, which is useful for getting --slave and output redirect
| to the workers. I haven't figured out anything analogous for
| MPICH/MPICH2 yet.

Yes, after a test run I also realized that I can't just ask Rmpi to work without a hostfile -- the node info must come from somewhere.
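
(For reference, a minimal Open MPI hostfile of the kind quoted above might look like the following; the hostnames and slot counts are of course made up:)

```
# one host per line; slots= gives the number of processes it may run
node01 slots=2
node02 slots=2
```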

That said, it still fails with a minimal slurm example using srun, i.e.

edd_at_ron:~> cat /tmp/rmpi.r
#!/usr/bin/env r
library(Rmpi)
library(snow)
cl <- makeMPIcluster(count=1)
print("Hello\n")

does not make it through makeMPIcluster either and just hangs if I do:

edd_at_ron:~> srun -N 1 /tmp/rmpi.r                                 
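
As a footnote on the rank-detection point above: a wrapper script in the spirit of RMPISNOW might decide master vs. worker roughly as follows. This is only a sketch; the environment variable names (LAMRANK for LAM, OMPI_COMM_WORLD_RANK for Open MPI) are assumptions and vary between MPI implementations and versions.

```shell
#!/bin/sh
# Sketch: pick the MPI rank out of the environment, defaulting to 0
# (master) when neither variable is set.  The variable names here are
# assumptions, not guaranteed to match any particular MPI release.
rank="${LAMRANK:-${OMPI_COMM_WORLD_RANK:-0}}"

if [ "$rank" -eq 0 ]; then
    # rank 0 becomes the interactive master
    echo "master: would exec R"
else
    # other ranks become workers, run non-interactively with output
    # redirected to a per-worker log so hangs can be diagnosed
    echo "worker $rank: would exec R --slave > /tmp/snow-worker-$rank.log 2>&1"
fi
```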

Dirk

-- 
Three out of two people have difficulties with fractions.

______________________________________________
R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Thu 04 Oct 2007 - 11:57:20 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 25 Oct 2007 - 11:37:10 GMT.
