[Pkg-swan-devel] Bug#781209: postinst execution order bug confuses systemd

Faidon Liambotis paravoid at debian.org
Thu Mar 26 02:29:02 UTC 2015


Package: strongswan-starter
Version: 5.2.1-5
Severity: grave

strongswan-starter currently ships:
 - /etc/init.d/ipsec
 - /lib/systemd/system/strongswan.service

The latter contains Alias=ipsec.service and calls the ipsec binary with
--nofork, as an (implicit) Type=simple unit. This is all a bit confusing
at first but pretty sane in general, and the strongswan rename is a nice
move (and also consistent with Ubuntu).

The package's postinst, however, is buggy: it does not use
dh_installinit but calls invoke-rc.d ipsec manually. That would have been
fine, but invoke-rc.d ipsec is called *before* the
dh_systemd_enable/deb-systemd-helper bits.
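
To check an installed maintainer script for this ordering, something
like the following might do. It is only a sketch: check_postinst_order
is a hypothetical helper, not part of strongswan or debhelper, and it
does a naive text match (so a comment mentioning either command would
be counted as a call).

```shell
#!/bin/sh
# Hypothetical helper (not part of strongswan or debhelper): succeed only
# if the first deb-systemd-helper mention in a maintainer script comes
# before the first invoke-rc.d mention.
check_postinst_order() {
    script=$1

    # Line number of the first occurrence of each command; empty if absent.
    helper_line=$(grep -n 'deb-systemd-helper' "$script" | head -n 1 | cut -d: -f1)
    initrc_line=$(grep -n 'invoke-rc.d' "$script" | head -n 1 | cut -d: -f1)

    # Nothing to compare unless both commands are present.
    if [ -z "$helper_line" ] || [ -z "$initrc_line" ]; then
        return 0
    fi

    # Naive textual comparison: earlier line number means earlier call.
    [ "$helper_line" -lt "$initrc_line" ]
}
```

For example, assuming the usual dpkg location for installed maintainer
scripts:

    check_postinst_order /var/lib/dpkg/info/strongswan-starter.postinst \
        || echo "invoke-rc.d is called before deb-systemd-helper"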

This means that "invoke-rc.d ipsec start" runs before the systemd unit
is properly installed, which in turn confuses the hell out of systemd
(as, among other things, it expects a Type=simple unit), as evidenced by
the following commands run in sequence:

# apt-get install strongswan
[...]
# systemctl status strongswan
● strongswan.service - strongSwan IPsec IKEv1/IKEv2 daemon using ipsec.conf
   Loaded: loaded (/lib/systemd/system/strongswan.service; enabled)
   Active: active (running) since Thu 2015-03-26 00:50:42 UTC; 6min ago
   CGroup: /system.slice/ipsec.service
           ├─5150 /usr/lib/ipsec/starter --daemon charon
           └─5151 /usr/lib/ipsec/charon --use-syslog

[note how starter has been called without --nofork and there is a CGroup called
"ipsec.service", despite the unit being called "strongswan.service"]

# systemctl restart strongswan
# systemctl status strongswan
● strongswan.service - strongSwan IPsec IKEv1/IKEv2 daemon using ipsec.conf
   Loaded: loaded (/lib/systemd/system/strongswan.service; enabled)
   Active: inactive (dead) since Thu 2015-03-26 01:00:59 UTC; 2s ago
  Process: 5783 ExecStart=/usr/sbin/ipsec start --nofork (code=exited, status=0/SUCCESS)
 Main PID: 5783 (code=exited, status=0/SUCCESS)

Mar 26 01:00:59 curium systemd[1]: Started strongSwan IPsec IKEv1/IKEv2 daemon using ipsec.conf.
Mar 26 01:00:59 curium ipsec_starter[5783]: Starting strongSwan 5.2.1 IPsec [starter]...
Mar 26 01:00:59 curium ipsec_starter[5783]: charon is already running (/var/run/charon.pid exists) -- skipping daemon start
Mar 26 01:00:59 curium ipsec[5783]: Starting strongSwan 5.2.1 IPsec [starter]...
Mar 26 01:00:59 curium ipsec[5783]: charon is already running (/var/run/charon.pid exists) -- skipping daemon start
Mar 26 01:00:59 curium ipsec[5783]: starter is already running (/var/run/starter.charon.pid exists) -- no fork done

[note the inactive/dead after a restart!]

# ps aux |grep ipsec
root      5150  0.0  0.0  17144   968 ?        Ss   00:50   0:00 /usr/lib/ipsec/starter --daemon charon
root      5151  0.0  0.0 1275680 5416 ?        Ssl  00:50   0:00 /usr/lib/ipsec/charon --use-syslog

Those are lingering/orphan processes, unmanaged by systemd. This won't
happen every time: it's a race, but a reproducible one. I've managed to
recreate it 5 times here already, on two different servers. 19 times out
of 20, no process will stay behind; ipsec won't be running at all, which
is also a bug.

The remaining 1 time, though, the service stays out of systemd's control
and remains unmanageable; systemd thinks it's dead but it really is
running. This a) is confusing to the sysadmin, b) means that reloads will
fail, c) means that a package removal won't actually stop the daemons,
and d) means that tools such as puppet will try to restart it again and
again, failing each time.
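
A check along these lines could detect that orphaned state. It is a
sketch only: pid_alive and is_orphaned are hypothetical helpers, the
pid file path in the usage line is the one from the log output above,
and it has to run as root so that kill -0 is allowed to probe charon.

```shell
#!/bin/sh
# pid_alive PIDFILE: succeed if PIDFILE is readable and names a process
# that still exists (kill -0 sends no signal, it only probes).
pid_alive() {
    [ -r "$1" ] || return 1
    pid=$(cat "$1")
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# is_orphaned STATE PIDFILE: succeed if the unit state reported by systemd
# is anything other than "active" while the pid file still points at a
# live process. Transient states such as "activating" are flagged too.
is_orphaned() {
    state=$1
    pidfile=$2
    [ "$state" != "active" ] && pid_alive "$pidfile"
}
```

Usage, with the state taken from systemctl is-active:

    is_orphaned "$(systemctl is-active strongswan.service)" /var/run/charon.pid \
        && echo "charon is running outside systemd's control"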

More importantly, though, it triggers a secondary bug in systemd itself.
Continuing right from the execution path above:

# ipsec stop
Stopping strongSwan IPsec...
# grep systemd /var/log/syslog | tail -3
Mar 26 01:02:15 curium systemd[1]: Assertion 'path' failed at ../src/shared/cgroup-util.c:913, function cg_is_empty_recursive().  Aborting.
Mar 26 01:02:15 curium systemd[1]: Caught <ABRT>, dumped core as pid 6916.
Mar 26 01:02:15 curium systemd[1]: Freezing execution.
# systemctl status
^C

At that point, the system barely works; systemctl etc. are not
responding.

I'll be filing the latter separately against systemd. However,
strongswan's postinst is buggy nevertheless, and creates a situation
uncommon enough to trigger this cascading failure.

Regards,
Faidon


