Re: [Rd] Rmpi on CentOS (64bit)

From: Patrick Connolly <p_connolly_at_slingshot.co.nz>
Date: Fri, 05 Mar 2010 21:28:08 +1300

On Wed, 03-Mar-2010 at 01:57PM -0600, Dirk Eddelbuettel wrote:

[...]

|> You could try to suppress the probe for IB which we did (in the older 1.2.*
|> series of OpenMPI) via
|>
|> # Disable the use of InfiniBand
|> # btl = ^openib
|> btl = ^openib
|>
|> in /etc/openmpi/openmpi-mca-params.conf

CentOS has it in a completely different place. I tried that suggestion, but to no avail.

I looked into "... the availability of the interfaces in the dat.conf file" which the error message mentioned.

$ locate dat.conf

/etc/ofed/dat.conf
/etc/ofed/compat-dapl/dat.conf
/usr/share/man/man5/dat.conf.5.gz

(No such files appear in Fedora 11 and there seems to be no ill-effects).

Are we to assume that it's the second of those that is related to the message?

$ cat /etc/ofed/compat-dapl/dat.conf

OpenIB-cma u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "ib0 0" ""
OpenIB-cma-1 u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "ib1 0" ""
OpenIB-mthca0-1 u1.2 nonthreadsafe default libdaplscm.so.1 dapl.1.2 "mthca0 1" ""
OpenIB-mthca0-2 u1.2 nonthreadsafe default libdaplscm.so.1 dapl.1.2 "mthca0 2" ""
OpenIB-mlx4_0-1 u1.2 nonthreadsafe default libdaplscm.so.1 dapl.1.2 "mlx4_0 1" ""
OpenIB-mlx4_0-2 u1.2 nonthreadsafe default libdaplscm.so.1 dapl.1.2 "mlx4_0 2" ""
OpenIB-ipath0-1 u1.2 nonthreadsafe default libdaplscm.so.2 dapl.1.2 "ipath0 1" ""
OpenIB-ipath0-2 u1.2 nonthreadsafe default libdaplscm.so.2 dapl.1.2 "ipath0 2" ""
OpenIB-ehca0-2 u1.2 nonthreadsafe default libdaplscm.so.2 dapl.1.2 "ehca0 1" ""
OpenIB-iwarp u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "eth2 0" ""


$ cat /etc/ofed/dat.conf

ofa-v2-ib0 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "ib0 0" ""
ofa-v2-ib1 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "ib1 0" ""
ofa-v2-mthca0-1 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mthca0 1" ""
ofa-v2-mthca0-2 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mthca0 2" ""
ofa-v2-mlx4_0-1 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mlx4_0 1" ""
ofa-v2-mlx4_0-2 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mlx4_0 2" ""
ofa-v2-ipath0-1 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "ipath0 1" ""
ofa-v2-ipath0-2 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "ipath0 2" ""
ofa-v2-ehca0-2 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "ehca0 1"  ""
ofa-v2-iwarp u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "eth2 0" ""
$

Does that give anyone any clues as to what could be going on the message (which went like this)?

librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
libibverbs: Fatal: couldn't read uverbs ABI version. CMA: unable to open /dev/infiniband/rdma_cm



WARNING: Failed to open "OpenIB-cma" [DAT_INTERNAL_ERROR:]. This may be a real error or it may be an invalid entry in the uDAPL Registry which is contained in the dat.conf file. Contact your local System Administrator to confirm the availability of the interfaces in the dat.conf file.

Could there be a modification to Dirk's suggestion that might deal with it? (I'm making a last-ditch attempt to avoid using Fedora.)

It's hard to find anything much about CentOS and MPI -- or at least what people did to get it working. I found tales of people having difficulties with Fedora that I didn't have. I'm not much wiser than I began.

best

-- 
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.   
   ___    Patrick Connolly   
 {~._.~}                   Great minds discuss ideas    
 _( Y )_  	         Average minds discuss events 
(:_~*~_:)                  Small minds discuss people  
 (_)-(_)  	                      ..... Eleanor Roosevelt
	  
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.

______________________________________________
R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Fri 05 Mar 2010 - 08:31:20 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 05 Mar 2010 - 11:50:57 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-devel. Please read the posting guide before posting to the list.

list of date sections of archive