Bug#355637: [Pkg-mailman-hackers] Bug#355637: mailman: Stale lock files break administrative web interface

Shannon C. Dealy dealy at deatech.com
Tue Mar 7 19:20:10 UTC 2006


On Tue, 7 Mar 2006, Lionel Elie Mamane wrote:

> tags 355637 +upstream
>
> thank you for your bug report.
>
> On Mon, Mar 06, 2006 at 12:55:44PM -0800, Shannon Dealy wrote:
>
>> Under some circumstances (presumably mailman software or system
>> crashes), list-specific stale lock files are left in the directory
>> /var/lib/mailman/locks. This can permanently prevent administrative
>> login for that specific list until the lock file(s) are
>> removed. There appears to be no mechanism to clean up these stale
>> lock files, and restarting mailman or even rebooting the system does
>> not clean things up.  At the very least, restarting mailman should
>> clean up these stale lock files,
>
> What do you mean by "restarting mailman"? The only interpretation I
> can find is restarting the queue daemon (the effect of
> "/etc/init.d/mailman restart"). But there is still the Apache (or

This is what I meant.

> other http server) running mailman CGIs. I don't think that merely
> restarting the mailman queue daemon should summarily remove the lock
> files: Apache is still running, and may be running a Mailman CGI
> genuinely holding that lock for an operation.

I wasn't sure whether the CGI scripts manipulate things directly or do
their work through socket connections to the many daemon processes that
always seem to be running. However, there is presumably some form of lock
manager being used, and a restart (as you specified above) seems an
appropriate time for that lock manager to run and assess the validity of
all of the locks. It could either kill all CGIs in progress and wipe the
locks, or merely check that each lock belongs to an existing process and
is not too old (a hung CGI process), and then clean up accordingly.  The
main point here is that in the worst-case scenario, if something is
messed up, restarting mailman should clean it up.  The better solution is
of course active monitoring which fixes the problems as they occur rather
than requiring manual intervention.
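
A rough sketch of the "check that each lock belongs to an existing process
and is not too old" idea, in modern Python for a POSIX system. It is not
Mailman's actual lock code: it assumes each lock file simply contains the
PID of its holder, and the names MAX_AGE_SECONDS, pid_is_alive, is_stale
and sweep are made up for illustration.

```python
import os
import time

# Assumed cutoff for "too old"; this is not a Mailman setting.
MAX_AGE_SECONDS = 6 * 60 * 60


def pid_is_alive(pid):
    """Return True if a process with this PID currently exists (POSIX)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks existence, nothing is sent
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def read_owner_pid(path):
    """Return the PID stored in the lock file, or None if unreadable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def is_stale(path, now=None, max_age=MAX_AGE_SECONDS):
    """A lock is stale if it is older than max_age or its owner is gone."""
    now = time.time() if now is None else now
    try:
        age = now - os.path.getmtime(path)
    except OSError:
        return False  # the file vanished meanwhile; nothing to clean up
    if age > max_age:
        return True
    pid = read_owner_pid(path)
    # A lock with no readable PID is only ever removed by age, never here.
    return pid is not None and not pid_is_alive(pid)


def sweep(lock_dir, max_age=MAX_AGE_SECONDS):
    """Remove stale lock files from lock_dir; return the names removed."""
    removed = []
    for name in sorted(os.listdir(lock_dir)):
        path = os.path.join(lock_dir, name)
        if os.path.isfile(path) and is_stale(path, max_age=max_age):
            os.remove(path)
            removed.append(name)
    return removed
```

Something like sweep("/var/lib/mailman/locks") could then be run at restart
or periodically. Note that removing a lock purely because of its age is
only safe if the hung process holding it is also killed.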

The important thing here is that, simple as the problem was to fix once I
figured out what was going on, recognising, finding, and fixing a problem
of this sort is completely beyond the capabilities of the overwhelming
majority of people running this software, though hopefully this bug report
may help some people sort it out if this doesn't get fixed.  I did
shut down mailman and kill all active mailman CGI processes before
deleting everything in the /var/lib/mailman/locks directory.  Due to the
bug, however, one mailman process refused to shut down and had to be taken
out with a kill -9.

>> in particular what I assume is the master lock: listname.lock and
>> probably the actual source of my problems.  A better solution would
>> probably include actually checking the lock files periodically to
>> make sure they are still valid.
>
> Yes.
>
> You may be hit by something like
> http://mail.python.org/pipermail/mailman-developers/2006-January/018506.html
>
> Upstream doesn't seem very eager to track down that kind of issue :-(

That posting seems to imply that there is no central lock management
code (or that it is incompletely implemented).  A proper locking design
would normally release the lock automatically when the thread terminates,
unless it is released earlier or explicitly requested to be held longer
(perhaps for a daemon to clean up later, but this is usually a bad idea).
Even if that were implemented properly, there must still be some recovery
mechanism for power failures at inconvenient times and other "hard"
crashes of the software.
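
To illustrate what "automatically released" could look like, here is a
minimal sketch in modern Python. It is an assumption-laden example, not
how Mailman implements its locks: held_lock is a hypothetical helper, and
the lock is just a file created exclusively and removed when the block
exits, whether normally or through an exception.

```python
import contextlib
import os


@contextlib.contextmanager
def held_lock(path):
    """Create the lock file exclusively and remove it when the block exits."""
    # O_EXCL makes creation fail with FileExistsError if the lock is held.
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(str(os.getpid()))  # record the holder for later checks
        yield path
    finally:
        # Runs on normal exit and on exceptions, but not after kill -9
        # or a power failure, which is why a recovery pass is still needed.
        os.remove(path)
```

Usage would be along the lines of "with held_lock(some_path): ..." around
the operation that needs the lock.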

Unfortunately, I don't really know Python yet or have the time to look 
into this further.

FWIW.

Shannon C. Dealy      |               DeaTech Research Inc.
dealy at deatech.com     |          - Custom Software Development -
                       |    Embedded Systems, Real-time, Device Drivers
Phone: (800) 467-5820 | Networking, Scientific & Engineering Applications
    or: (541) 929-4089 |                  www.deatech.com



