[Pkg-openmpi-maintainers] Bug#572229: openmpi-checkpoint should depend on blcr-util

Fernando Tarlá Cardoso Lemos fernandotcl at gmail.com
Tue Mar 2 14:37:15 UTC 2010


Package: openmpi-checkpoint
Version: 1.4.1-1
Severity: important


openmpi-checkpoint does not currently depend on blcr-util. However, ompi-restart segfaults unless blcr-util is installed (possibly an upstream bug; I reported it to the OpenMPI users mailing list but have not received a reply yet).

Although you can take checkpoints without blcr-util installed, restoring a checkpoint with ompi-restart fails.

Because of bug #572021, which I also reported, you'll need to compile OpenMPI from source to verify this bug (or fix #572021 first). Steps to reproduce:

1) Install openmpi-bin and openmpi-checkpoint (or compile OpenMPI from source, if #572021 is not fixed yet), and make sure you don't have blcr-util installed.

2) Compile a simple MPI app (the typical "ring" app will do).

3) Run it like this, for example:

modprobe blcr
mpirun -np 4 -am ft-enable-cr ./ring

4) Take a checkpoint, for example:

ompi-checkpoint --term <PID OF THE MPIRUN PROCESS>

5) Try to restore the checkpoint:

ompi-restart ompi-global-snapshot-<PID>.ckpt

The expected result is that the ring app runs to completion. Instead, ompi-restart (or one of its children, I'm not sure which) segfaults.

Now install the blcr-util package and repeat the procedure. You'll see that ompi-restart no longer segfaults. This indicates that openmpi-checkpoint should depend on blcr-util.

Proposed fix: Add blcr-util to Depends.
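
A sketch of what this could look like in debian/control, assuming the package's stanza uses the usual substitution variables. Every field other than the added blcr-util is an assumption for illustration and is not copied from the actual package:

```
# Hypothetical stanza; only the trailing blcr-util is the proposed change.
Package: openmpi-checkpoint
Depends: ${shlibs:Depends}, ${misc:Depends}, openmpi-bin, blcr-util
```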

Note that blcr-util provides binaries such as cr_restart, cr_checkpoint, etc. One would think ompi-checkpoint and ompi-restart use only libcr directly, but that doesn't seem to be the case for ompi-restart.
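
Until the dependency is added, a pre-flight check along the following lines could catch the problem before ompi-restart is run. This is only a sketch: the helper name missing_tools is made up for illustration, and it assumes that the BLCR command-line tools (cr_checkpoint and cr_restart) being absent from PATH is what triggers the crash, which has not been confirmed.

```shell
#!/bin/sh
# Print each of the given commands that is not found on PATH, one per line.
# Returns 0 only if all of them are present.
missing_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            rc=1
        fi
    done
    return $rc
}

# Assumption: these are the blcr-util binaries that matter for a restart.
if ! missing=$(missing_tools cr_checkpoint cr_restart); then
    echo "missing BLCR tools: $missing (is blcr-util installed?)" >&2
fi
```

The check only looks at PATH, so it also works for a BLCR built from source rather than installed from the package.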


-- System Information:
Debian Release: squeeze/sid
  APT prefers unstable
  APT policy: (500, 'unstable')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.32-2-amd64 (SMP w/1 CPU core)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages openmpi-checkpoint depends on:
ii  libc6                         2.10.2-6   Embedded GNU C Library: Shared lib
ii  libcr0                        0.8.2-9    Libraries to Checkpoint and Restar
ii  libopenmpi1.3                 1.4.1-1    high performance message passing l
ii  openmpi-bin                   1.4.1-1    high performance message passing l

openmpi-checkpoint recommends no packages.

openmpi-checkpoint suggests no packages.

-- no debconf information





