[pkg-ntp-maintainers] Bug#711548: Bug#711548: Fragile handling of pidfile by /etc/init.d/ntp

Sergio Gelato sergio.gelato at astro.su.se
Sat Jun 8 21:13:22 UTC 2013


On Sat, 8 Jun 2013 19:51:12 +0200, Kurt Roeckx <kurt at roeckx.be> wrote:
> On Sat, Jun 08, 2013 at 02:45:32PM +0200, Sergio Gelato wrote:
>> On Fri, 7 Jun 2013 22:11:30 +0200, Kurt Roeckx <kurt at roeckx.be> wrote:
>> > But you started a new one, which wrote a PID file, and then it
>> > died because it detected that another ntpd was still running,
>> > and you really [only] want 1 running.  It probably shouldn't have
>> > written the pid file in that case.
>> 
>> I now have an instance of the problem occurring naturally on a squeeze
>> system (so the trigger mechanism isn't Ubuntu-only, one can't blame it
>> on upstart in this case), and I can confirm that it is associated with
>> attempts by the system to start two ntpd processes concurrently.
>> Arranging for the instance that loses the race not to have its PID
>> written to the file should be very helpful, I think.
>> 
>> Here are some relevant logs about the incident, lightly sanitised:
>> 
>> Jun  7 08:17:18 <HOST> dhclient: DHCPACK from <SERVERIP>
>> Jun  7 08:17:18 <HOST> ntpd[1576]: ntpd exiting on signal 15
>> Jun  7 08:17:20 <HOST> ntpd[1904]: ntpd 4.2.6p2@1.2194-o Sun Oct 17 13:35:13 UTC 2010 (1)
>> Jun  7 08:17:20 <HOST> ntpd[1905]: ntpd 4.2.6p2@1.2194-o Sun Oct 17 13:35:13 UTC 2010 (1)
> 
> So you're starting it twice at the same time?  Of course there is
> no PID file yet at the time the 2nd gets started.

A race, as I said. And the problem isn't so much that multiple instances
get started at the same time (only one of them will survive, at least for
typical configurations) but that the init script can't always find the
surviving one afterwards.
 
> Looking at the init script, "status" doesn't use the pid file
> currently.

I beg to differ. It calls status_of_proc, which is defined in
/lib/lsb/init-functions. status_of_proc in turn calls pidofproc, which has
    if [ ! "$specified" ]; then
        pidfile="/var/run/$base.pid"
    fi
and only falls back on /bin/pidof if the pidfile doesn't exist. This
matches what I've seen in testing (different behaviour in the case of a
missing pidfile vs. an existing one with incorrect contents). Verification
with strace is left as an exercise for the non-believer.
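
To make the lookup order concrete, here is a simplified sketch of the
decision as I read it. It is not the actual /lib/lsb/init-functions code:
pid_source is a made-up helper name, and it only reports which source
would be consulted.

```shell
#!/bin/sh
# Simplified sketch of the lookup order described above; NOT the real
# /lib/lsb/init-functions code. pid_source is a hypothetical helper.
pid_source() {
    base=$1
    # Default pidfile location when the caller did not specify one.
    pidfile=${2:-/var/run/$base.pid}
    if [ -r "$pidfile" ]; then
        # A readable pidfile wins, whatever its contents.
        echo "pidfile:$pidfile"
    else
        # Only without a pidfile is the process table searched.
        echo "pidof:$base"
    fi
}
```

The point is that a stale pidfile is still "a pidfile": its mere presence
keeps the pidof fallback from ever being reached.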

>  So it's just going to look at the processes.

No, it's going to "kill -0" the pid named in the pidfile, and return 0 if
that succeeds, 1 if there is no such process.
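
A minimal sketch of that check, under the assumption that the exit
statuses loosely follow the LSB "status" convention. check_pidfile is a
hypothetical helper, not a function from init-functions, and the real
code has more cases than this.

```shell
#!/bin/sh
# Sketch only: check_pidfile is a hypothetical helper, not an LSB function.
# Exit status (loosely following the LSB "status" convention):
#   0 = a process with that pid exists
#   1 = pidfile exists but no such process
#   3 = no pidfile (a real script could fall back on pidof here)
#   4 = pidfile is empty, status unknown
check_pidfile() {
    pidfile=$1
    [ -r "$pidfile" ] || return 3
    # read returns non-zero when the file lacks a trailing newline,
    # but still sets the variable, so its status is ignored here.
    read -r pid < "$pidfile" || true
    [ -n "$pid" ] || return 4
    # kill -0 sends no signal; it only tests whether the pid can be
    # signalled. It says nothing about WHICH program owns that pid, and
    # it also fails for another user's process when not run as root.
    if kill -0 "$pid" 2>/dev/null; then
        return 0
    fi
    return 1
}
```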

>  So
> I don't see how status is going to react differently than puppet.
> Note also how it did properly say it's running in your example,
> even when the PID file is wrong.

Only if there happens to be a running process with that pid. The
implementation of pidofproc in /lib/lsb/init-functions doesn't check that
the pid is that of an ntpd instance. The more common case, illustrated in
the second half of my example, is for "status" to return 1 ("program is
dead and /var/run pid file exists") if the pidfile is bogus.
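
For illustration, the missing check could look something like the sketch
below. This is my own assumption about one way to do it, not anything
init-functions provides: pid_is_program is a hypothetical helper, and it
relies on the Linux-specific /proc/<pid>/status file.

```shell
#!/bin/sh
# Sketch only: pid_is_program is a hypothetical helper. It relies on the
# Linux-specific /proc/<pid>/status file, whose "Name:" field holds the
# (possibly truncated) command name of the process.
pid_is_program() {
    pid=$1
    name=$2
    # No such process (or no /proc): treat as "not that program".
    [ -r "/proc/$pid/status" ] || return 1
    actual=$(sed -n 's/^Name:[[:space:]]*//p' "/proc/$pid/status")
    # Succeeds only when the pid exists AND belongs to the named command.
    [ "$actual" = "$name" ]
}
```

Called with the pid read from the pidfile and the name "ntpd", this
would reject a recycled pid that now belongs to some unrelated process.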


