[pkg-ntp-maintainers] Bug#711548: Fragile handling of pidfile by /etc/init.d/ntp

Sergio Gelato Sergio.Gelato at astro.su.se
Fri Jun 7 19:53:38 UTC 2013


Package: ntp
Version: 1:4.2.6.p5+dfsg-2

The current /etc/init.d/ntp cannot recover from a situation in which
/var/run/ntpd.pid exists but does not contain the correct PID for the
running daemon.

How to reproduce:

# pgrep -f ntpd
9219
# ps -f -p 9404
UID        PID  PPID  C STIME TTY          TIME CMD
root      9404  9395  0 20:38 pts/2    00:00:00 /bin/bash
# printf 9404 > /var/run/ntpd.pid
# invoke-rc.d ntp status
NTP server is running.
# echo $?
0
# invoke-rc.d ntp restart
Stopping NTP server: ntpd.
Starting NTP server: ntpd.
# echo $?
0
# pgrep -f ntpd
9219
# cat /var/run/ntpd.pid
9485# ps -f -p 9404
UID        PID  PPID  C STIME TTY          TIME CMD
root      9404  9395  0 20:38 pts/2    00:00:00 /bin/bash
# ps -f -p 9485
UID        PID  PPID  C STIME TTY          TIME CMD
#

The ntpd process was not restarted. The pidfile was overwritten with a PID
that does not correspond to any running process. The logs show that an
ntpd instance was launched with that PID but exited with
"unable to bind to wildcard address 0.0.0.0 - another process may be running - EXITING".

If one repeats the experiment now that /var/run/ntpd.pid points to a
nonexistent process, the messages and status codes are slightly different:

# invoke-rc.d ntp status
NTP server is not running ... failed!
invoke-rc.d: initscript ntp, action "status" failed.
# echo $?
1
# invoke-rc.d ntp restart
Stopping NTP server: ntpdstart-stop-daemon: warning: failed to kill 9485: No such process
.
Starting NTP server: ntpd.
# echo $?
0
# pgrep -f ntpd
9219
# cat /var/run/ntpd.pid
9611# 

It would seem that once the pidfile gets out of sync with reality the only
ways to recover are:
a) reboot (and hope the problem doesn't recur); or
b) rm /var/run/ntpd.pid; or
c) pgrep -f ntpd > /var/run/ntpd.pid; or
d) pkill ntpd && invoke-rc.d ntp start

This is forcing me to set hasstatus => false for the ntp service
in my puppet manifest, since the status functionality of this
init script cannot be relied upon. The stop/restart functionality isn't
any better: it can fail silently, as shown above.
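
For illustration, here is a minimal sketch of the kind of check that would
catch both cases shown in the transcripts (a pidfile naming an unrelated
process, and a pidfile naming no process at all). This is not the code in
/etc/init.d/ntp: the function name pidfile_matches is hypothetical, and
reading /proc/<pid>/comm assumes a Linux /proc filesystem.

```shell
#!/bin/sh
# Hypothetical helper, not part of the shipped init script.
# Usage: pidfile_matches PIDFILE NAME
# Returns 0 only if PIDFILE holds the PID of a live process whose
# command name (as reported by /proc/<pid>/comm, Linux-specific) is NAME.
pidfile_matches() {
    pidfile=$1
    name=$2
    pid=
    comm=

    [ -r "$pidfile" ] || return 1

    # read fails at EOF when the file has no trailing newline, but it
    # still fills in $pid, so accept that case too.
    read -r pid < "$pidfile" || [ -n "$pid" ] || return 1

    # Reject empty or non-numeric contents.
    case $pid in
        ''|*[!0-9]*) return 1 ;;
    esac

    # No such process: the pidfile is stale.
    [ -r "/proc/$pid/comm" ] || return 1

    # Process exists; check that it is the expected daemon.
    read -r comm < "/proc/$pid/comm" || return 1
    [ "$comm" = "$name" ]
}
```

With something along these lines, a status action could call
`pidfile_matches /var/run/ntpd.pid ntpd` and report a stale pidfile instead
of "running" when the PID belongs to, say, /bin/bash. Note that the kernel
truncates the comm field to 15 characters, which is not an issue for "ntpd".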

I'm not sure how one gets a corrupt pidfile under normal operation (a
race condition at boot, perhaps?) but I've seen it happen at least once
(on Ubuntu precise, shortly after a reboot).


