Bug#736258: acpid won't stop, won't upgrade (systemd) - https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=736258

Sat Aug 2 21:55:42 BST 2014

On Sat, 02 Aug 2014, Russ Allbery wrote:
> Henrique de Moraes Holschuh <hmh at debian.org> writes:
> > Services being subject to a package update can be down for extended
> > amounts of time (think package update that requires manual intervention,
> > or even a full manual reconfiguration, etc), or may actually involve an
> > ABI break.  In both cases, you want that socket down.
> 
> Why?

Large timeouts can cause bad side effects in surprising ways.  If the socket
is down, you get an immediate connection refused reply, which short-circuits
the timeout.
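
A minimal sketch of that difference, assuming a plain TCP socket on
loopback.  It is not how systemd manages sockets; the helper names
(probe, compare_listening_vs_closed) are made up for illustration.  A
listener that is up but never accept()ed leaves the client waiting for its
own timeout, while a closed socket should fail the connect() right away.

```python
import socket
import time


def probe(port, timeout=0.5):
    """Connect to 127.0.0.1:port, send a request and wait for a reply.

    Returns (outcome, elapsed_seconds); outcome is "reply", "refused"
    or "timeout".
    """
    start = time.monotonic()
    try:
        with socket.create_connection(("127.0.0.1", port), timeout=timeout) as c:
            c.sendall(b"ping\n")
            c.recv(16)  # blocks until data arrives or the timeout expires
            outcome = "reply"
    except ConnectionRefusedError:
        outcome = "refused"
    except socket.timeout:
        outcome = "timeout"
    return outcome, time.monotonic() - start


def compare_listening_vs_closed(timeout=0.5):
    """Probe a port twice: socket up with no service, then socket down."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(5)
    port = listener.getsockname()[1]

    # Socket is up, but nothing ever calls accept(): the kernel completes
    # the handshake and the client waits for a reply that never comes.
    held_up = probe(port, timeout)

    # Socket is down: the connect() itself is rejected.
    listener.close()
    socket_down = probe(port, timeout)
    return held_up, socket_down
```

Calling compare_listening_vs_closed() returns the two (outcome, elapsed)
pairs, so the waiting time in each case can be compared directly.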

As this is a generic feature, we could be talking about openldap, or the
syslog daemon, or mysql, or a game server, or a DNS server...  The point is:
we don't know.

> Most upgrades will *not* be down for an extended period of time, so
> keeping the socket up is the right decision most of the time.  And if it

Agreed.  However, we need to be able to do *something* for those updates
that cannot be expected to be fast.  It is not a good idea to back oneself
into a corner on stuff like this, unless the cost of avoiding that corner is
really high.

> It seems to me like you get basically the behavior that you want if you
> leave the socket up and buffering.  But maybe I'm missing something?

Maybe *I* am the one missing something, as I have no knowledge of the
systemd specifics.  I fully expect someone that is knowledgeable in systemd
internals (either you or someone else) to set me straight if that's the
case.

But please don't rely on the socket backlog being full in your analysis: the
rate of connect() attempts might be low, causing the backlog to never get
full in the first place.  Also, unless systemd is setting that backlog queue
size manually, its size is unknown.
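
To illustrate the low-rate case, here is a small sketch under the same
loopback assumptions (connects_without_accept is a hypothetical helper, and
the backlog value of 8 is arbitrary).  A handful of connect() attempts
against a listener that never accept()s all complete, because the queue is
nowhere near full, so none of the clients is turned away.

```python
import socket


def connects_without_accept(attempts=3, backlog=8):
    """Count connect() calls that complete while nobody calls accept()."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    # Explicit backlog; the kernel may still cap or adjust the value.
    listener.listen(backlog)
    addr = listener.getsockname()

    clients = []
    completed = 0
    try:
        for _ in range(attempts):
            try:
                # The kernel finishes the handshake and queues the connection.
                clients.append(socket.create_connection(addr, timeout=1.0))
                completed += 1
            except OSError:
                pass  # refused or timed out: the queue was full
    finally:
        for c in clients:
            c.close()
        listener.close()
    return completed
```

With the defaults, every attempt is expected to complete, which is the
situation where each client is left waiting on its own timeout.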

Are we also talking about datagram sockets? Those can be even worse re.
timeout behaviour.  Also (and please correct me if I'm wrong), in the
ABI-break case, you risk messages in an unknown format arriving on the newly
updated service because the actual message content could get buffered.  This
is a failure mode probably never expected by the service's upstream.
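
A sketch of the buffering concern for the datagram case, assuming a UDP
socket on loopback that stays bound across the upgrade.  The "PROTO/1"
payload and the stale_datagram_demo name are invented for the example, and
this does not model what systemd actually does with the socket.

```python
import socket


def stale_datagram_demo():
    """Return the first datagram the "upgraded" service would read."""
    # Stands in for a socket that is kept bound while the service is down.
    held = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    held.bind(("127.0.0.1", 0))
    addr = held.getsockname()

    # A client sends an old-format message during the upgrade window.
    # sendto() succeeds; the kernel queues the datagram on the bound socket.
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.sendto(b"PROTO/1 status", addr)
    client.close()

    # The upgraded service (imagine it only speaks a newer format) starts
    # reading from the same socket and is handed the queued message.
    held.settimeout(1.0)
    data, _ = held.recvfrom(1024)
    held.close()
    return data
```

The value returned is the message that was sent before the new service
started, i.e. input in the old format delivered to the new code.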

FWIW, I agree fully that in the common case, it doesn't matter.  I do see
the usefulness of this feature, I do think it is worthwhile and that it
should be added, and this should be pretty clear already.

> > And you likely want it to have a configurable timeout that brings the
> > socket down when the service is not "allowed to reactivate" within that
> > timeout.  Maybe it is even already implemented like that in systemd.
> 
> Why not just let the kernel do this for you?

My reasoning is above.  It might be incorrect.

> > Please fix the immediate problem first, and come up with a patch to
> > disable the socket on "invoke-rc.d stop".
> 
> If that was an attempt to ask someone nicely to assist you with something
> that you think is important, you missed.

...

> anyone has any doubts about what you think should happen.  However, it
> turns out that I don't work for you, nor do the systemd maintainers, so
> you might want to try for a tone more appropriate for interacting with
> professional colleagues than a tone appropriate for ordering around your
> subordinates.

I apologise if I sounded pushy; it was not my intention.

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh