[Pkg-openmpi-maintainers] Bug#572229: openmpi-checkpoint should depend on blcr-util

Fernando Lemos fernandotcl at gmail.com
Wed Mar 3 23:01:56 UTC 2010


On Tue, Mar 2, 2010 at 12:17 PM, Alan Woodland <alan.woodland at gmail.com> wrote:
<snip>
>>> I'm pretty sure it at least was the case that restart did just use
>>> libcr directly, without calling any of the utils, hence my not setting
>>> the dependency originally. I wonder if this is a bug or a feature now?
>>>
>>> Alan
>>>
>>
>> Yeah, I find it weird too. Here's my latest post to the OpenMPI users
>> mailing list:
>>
>> http://www.open-mpi.org/community/lists/users/2010/03/12199.php
>>
>> Maybe it does not use cr_* directly but blcr-util provides something
>> else that ompi-restart requires? Either way, I don't think
>> ompi-restart is supposed to segfault...
> blcr-util doesn't provide anything other than /usr/bin/cr_* and
> documentation. I hope it doesn't anyway, (and there's no surprises in:
> http://packages.debian.org/sid/amd64/blcr-util/filelist) it's designed
> such that an application which uses libcr directly doesn't have to
> pull in anything other than libcr.
>
>> If you prefer, I can run my tests with the openmpi-checkpoint after
>> #572021 is fixed and then report back whether or not blcr-util is
>> needed.
>
> I think I'd like to get to the bottom of why it segfaults (stacktrace).
> I'm unlikely to be able to commit significant amounts of time to this
> until after 23rd of March though. A full backtrace or possibly even
> just the final output from strace might be quite enlightening though.
>
> Alan
>

I got two replies on the users mailing list today:

http://www.open-mpi.org/community/lists/users/2010/03/12227.php
http://www.open-mpi.org/community/lists/users/2010/03/12230.php

So, yes, segfaulting is a bug, and now it is officially registered in
upstream's bug tracker. Also, ompi-restart does depend on cr_restart.
Quoting Joshua Hursey:

"Open MPI currently calls 'cr_restart' for each process it restarts,
exec'ed from the 'opal-restart' binary (LAM/MPI also used cr_restart
directly, in case anyone is interested). We use the internal library
interface for checkpoint, but not restarting at this time."

The two upstream tickets:

https://svn.open-mpi.org/trac/ompi/ticket/2329
https://svn.open-mpi.org/trac/ompi/ticket/2330

So I now believe that making openmpi-checkpoint depend on blcr-util is
really the right fix, at least for 1.4.1.

Regards,
