[Pkg-openmpi-maintainers] Bug#584699: programs freeze on first MPI op. when run on multihomed IPv6 hosts

Ivan Shmakov ivan at main.uusia.org
Mon Dec 20 18:27:48 UTC 2010


>>>>> Manuel Prinz <manuel at debian.org> writes:

 > thanks for the report! I also took this upstream, but unfortunately
 > neither upstream nor I can reproduce the bug since we do not have
 > multi-homed IPv6 hosts available for testing.

	“Fortunately,” it appears that you don't need one, as the
	problem apparently arises on multi-IPv4-homed hosts as well.

	To work around the problem, I tried both the combination of the

    --mca oob_tcp_disable_family 6 \
    --mca btl_tcp_disable_family 6 \

	options and building the package without IPv6 support:

--- openmpi-1.4.2/debian/rules
+++ openmpi-1.4.2/debian/rules
@@ -57,6 +57,7 @@
 			--includedir=\$${prefix}/lib/openmpi/include	\
 			--with-devel-headers \
 			--enable-heterogeneous \
+			--disable-ipv6 \
 			$(TORQUE)
 
 # Thread support disabled because it's broken, see bug #435581

	To my surprise, neither helped!

	Then, however, I noticed that the system is IPv4-multihomed
	as well:

$ ip -4 
…
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 1000
    inet 192.168.57.XX/24 scope global eth0
    inet 192.168.57.ZZ/24 scope global eth0
…
$ 

	As soon as I removed one of the addresses (with
	# ip addr del), the problem was gone.  (That is, as long as
	IPv6 is turned off; I cannot drop the extra IPv6 addresses on
	that host without running into issues.)
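
	To check whether a host is in this state, something like the
	following sketch may help.  It counts global IPv4 addresses per
	interface in the one-line output of ‘ip -4 -o addr show’.  The
	function name ‘multihomed_ipv4’ is made up for this example,
	and the field positions assume the usual iproute2 output
	format.

```shell
# Hypothetical helper: print each interface that carries more than
# one global IPv4 address, followed by the address count.
# Reads the one-line-per-address output of "ip -4 -o addr show" on
# stdin (assumed fields: index, interface name, "inet", address, ...).
multihomed_ipv4 () {
    awk '$3 == "inet" && /scope global/ { n[$2]++ }
         END { for (i in n) if (n[i] > 1) print i, n[i] }'
}

# Usage on a live system (requires iproute2):
#   ip -4 -o addr show | multihomed_ipv4
```

	An empty result would mean that no interface has more than one
	global IPv4 address.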

	To reproduce the problem, one can try, e.g. (assuming A.B.C.D
	is an unused address in the network, MASK is the netmask, and
	ethN is the network interface):

root# ip addr add A.B.C.D/MASK dev ethN 
root# 

$ mkdir -- test 
$ cd test/ 
$ cp -- /usr/share/doc/hpcc/examples/_hpccinf.txt hpccinf.txt 
$ rm -f -- hpccoutf.txt 
$ mpirun.openmpi \
      --mca btl_base_verbose 30 \
      --mca oob_tcp_debug 1 \
      --mca oob_tcp_disable_family 6 \
      --mca btl_tcp_disable_family 6 \
      hpcc \
      < /dev/null 

	While normally this would create ‘hpccoutf.txt’ almost
	immediately, the problem being discussed makes ‘hpcc’ hang
	before it tries to open (create) the file.

	Removing the extra IP addresses should eliminate the problem.
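
	Since the symptom is a hang rather than an error, the check
	could be automated roughly as follows.  This is only a sketch:
	‘check_hang’ is a hypothetical helper, it relies on timeout(1)
	from GNU coreutils (which exits with status 124 when the time
	limit is hit), and the 60-second limit in the usage comment is
	an arbitrary choice.

```shell
# check_hang SECONDS FILE COMMAND...   (hypothetical helper)
# Runs COMMAND under coreutils timeout(1) and reports whether FILE
# was created before the command exited or was killed.
check_hang () {
    limit=$1
    outfile=$2
    shift 2
    rm -f -- "$outfile"
    timeout "$limit" "$@" < /dev/null > /dev/null 2>&1
    status=$?
    if [ -e "$outfile" ]; then
        echo "ok: $outfile created"
    elif [ "$status" -eq 124 ]; then
        echo "hang: timed out after ${limit}s, $outfile not created"
    else
        echo "failed: exit status $status, $outfile not created"
    fi
}

# Usage, from the test/ directory prepared above:
#   check_hang 60 hpccoutf.txt mpirun.openmpi \
#       --mca oob_tcp_disable_family 6 \
#       --mca btl_tcp_disable_family 6 \
#       hpcc
```

	Note that a healthy run may also be killed by the time limit;
	the helper still reports “ok” in that case, since only the
	creation of the output file is checked.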

[…]

-- 
Long Happy Life.