[Pkg-openmpi-maintainers] Bug#572229: openmpi-checkpoint should depend on blcr-util
Fernando Tarlá Cardoso Lemos
fernandotcl at gmail.com
Tue Mar 2 14:37:15 UTC 2010
Package: openmpi-checkpoint
Version: 1.4.1-1
Severity: important
openmpi-checkpoint does not currently depend on blcr-util. However, ompi-restart segfaults unless blcr-util is installed (possibly an upstream bug; I reported it to the OpenMPI users mailing list but have not received a reply yet).
Although you can take checkpoints without blcr-util installed, restoring a checkpoint with ompi-restart fails.
Because of bug #572021, which I also reported, you'll need to compile OpenMPI from source to reproduce this bug (or fix #572021 first):
1) Install openmpi-bin and openmpi-checkpoint (or compile OpenMPI from source, if #572021 is not fixed yet), and make sure you don't have blcr-util installed.
2) Compile a simple MPI app (the typical "ring" app will do).
3) Run it like this, for example:
modprobe blcr
mpirun -np 4 -am ft-enable-cr ./ring
4) Take a checkpoint, for example:
ompi-checkpoint --term <PID OF THE MPIRUN PROCESS>
5) Try to restore the checkpoint:
ompi-restart ompi-global-snapshot-<PID>.ckpt
The expected result is that the ring app runs to completion. Instead, ompi-restart (or one of its children, I'm not sure which) segfaults.
Now install the package blcr-util and repeat this procedure. You'll see that ompi-restart does not segfault. This indicates that openmpi-checkpoint should depend on blcr-util.
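The steps above can be collected into one shell function, which makes it easier to repeat the run with and without blcr-util. This is only a sketch: the function name reproduce, the WARMUP variable and its 5-second default are made up for illustration, and it assumes the blcr module is already loaded (modprobe blcr, as root) and that ./ring runs long enough to be checkpointed.

```shell
#!/bin/sh
# Sketch of reproduction steps 3-5 as a single function.
# Assumptions (not from the tools themselves): the name "reproduce",
# the WARMUP variable, and the default delay of 5 seconds.
reproduce() {
    app=${1:-./ring}

    # Step 3: start the MPI job with checkpoint/restart support enabled.
    mpirun -np 4 -am ft-enable-cr "$app" &
    mpirun_pid=$!

    # Give the job some time to start before checkpointing it.
    sleep "${WARMUP:-5}"

    # Step 4: take a checkpoint and terminate the job.
    ompi-checkpoint --term "$mpirun_pid" || return 1
    wait "$mpirun_pid" || true

    # Step 5: try to restore the checkpoint and report how it ended.
    status=0
    ompi-restart "ompi-global-snapshot-$mpirun_pid.ckpt" || status=$?
    echo "ompi-restart exit status: $status"
    return "$status"
}
```

Calling reproduce ./ring once without blcr-util and once with it installed should show the difference. If ompi-restart itself is killed by SIGSEGV the shell reports status 139 (128 + 11); if one of its children crashes instead, the status may be something else.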
Proposed fix: Add blcr-util to Depends.
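Once a fixed package is built, one way to confirm that the dependency is actually there is to look at the Depends field of the installed package. The helper below is a sketch; the name depends_lists is made up, and it only does a simple textual match on package names (version constraints and architecture qualifiers are stripped, alternatives separated by "|" are treated like any other entry).

```shell
#!/bin/sh
# Return 0 if the Depends string in $1 names the package $2.
# "depends_lists" is a hypothetical helper name, not part of dpkg.
depends_lists() {
    printf '%s\n' "$1" \
        | tr ',|' '\n\n' \
        | sed 's/^ *//; s/[ (:].*$//' \
        | grep -qxF -- "$2"
}
```

For example:

```shell
if depends_lists "$(dpkg-query -W -f='${Depends}' openmpi-checkpoint)" blcr-util; then
    echo "openmpi-checkpoint depends on blcr-util"
else
    echo "blcr-util is missing from Depends"
fi
```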
Note that blcr-util provides binaries like cr_restart, cr_checkpoint, etc. One would think ompi-checkpoint/ompi-restart use only libcr directly, but that doesn't seem to be the case for ompi-restart.
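Until the dependency is added, a quick check before calling ompi-restart can tell whether the BLCR command-line tools are on PATH. This is a sketch under assumptions: the function name missing_blcr_tools is made up, and the list of tools (cr_checkpoint, cr_restart, cr_run) reflects what blcr-util is expected to ship, not a confirmed list of what ompi-restart actually invokes.

```shell
#!/bin/sh
# Print the BLCR command-line tools that are not found on PATH and
# return non-zero if any are missing.
# The tool list is an assumption about the contents of blcr-util.
missing_blcr_tools() {
    missing=""
    for tool in cr_checkpoint cr_restart cr_run; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            missing="$missing $tool"
        fi
    done
    # Drop the leading space before printing.
    echo "${missing# }"
    [ -z "$missing" ]
}
```

A possible use is: missing_blcr_tools >/dev/null || echo "install blcr-util before running ompi-restart".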
-- System Information:
Debian Release: squeeze/sid
APT prefers unstable
APT policy: (500, 'unstable')
Architecture: amd64 (x86_64)
Kernel: Linux 2.6.32-2-amd64 (SMP w/1 CPU core)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Versions of packages openmpi-checkpoint depends on:
ii libc6 2.10.2-6 Embedded GNU C Library: Shared lib
ii libcr0 0.8.2-9 Libraries to Checkpoint and Restar
ii libopenmpi1.3 1.4.1-1 high performance message passing l
ii openmpi-bin 1.4.1-1 high performance message passing l
openmpi-checkpoint recommends no packages.
openmpi-checkpoint suggests no packages.
-- no debconf information