Bug#841662: libserver-starter-perl: test suite sometimes times out

Niko Tyni ntyni at debian.org
Wed Nov 2 22:02:51 UTC 2016


On Tue, Nov 01, 2016 at 09:24:06PM +0100, gregor herrmann wrote:
> On Sun, 30 Oct 2016 20:03:49 +0200, Niko Tyni wrote:
> 
> > Given it fails somewhat regularly on both ci.debian.net and
> > tests.reproducible-builds.org, possibly a faster machine would improve
> > the chances of reproducing it.  Just getting the log of 'strace -f
> > -olog prove -l t/01-starter.t' when it locks up would help tremendously,
> > but I ran it for two hours or so like that without a single lockup.
> 
> I failed as well on Sunday but today I succeeded.
> Attached is the output of
> 
> while :; do strace -f -olog prove -l t/04-starter-dir.t t/05-killolddelay.t t/06-autorestart.t || break ; done

Oh awesome, thanks! Note to self: next time ask for time stamps too
(strace -ttt or so). But this will do quite fine :)

The problem (at least the one visible in this trace) seems to be
related to Test::TCP. The test_tcp() call in t/06-autorestart.t
finds an empty port with Net::EmptyPort, then passes it to both the
client and the server code. The server starts up in a child process in
Test::TCP::start(), but gets EADDRINUSE when binding the listener socket
for some reason.  The parent process in Test::TCP::start() then hangs
in Net::EmptyPort::wait_port(), waiting for the server to start accepting
connections on the port before calling the client code, but always
getting ECONNREFUSED.

The Server::Starter tests should probably specify a max_wait parameter
to test_tcp(). That should fix at least these hangs, probably in
exchange for test failures.
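
To illustrate the trade-off, here is a minimal sketch of a bounded
wait loop. It is written in Python as a stand-in, not taken from
Test::TCP or Net::EmptyPort; the function name, its parameters and
the polling interval are assumptions made for the example only.

```python
import socket
import time


def wait_port(port, max_wait, host="127.0.0.1", interval=0.1):
    """Poll host:port until a connection succeeds or max_wait expires.

    Hypothetical helper for illustration only; it mirrors the idea of
    a bounded wait, not the actual Perl implementation.
    Returns True if something accepted a connection, False on timeout.
    """
    deadline = time.monotonic() + max_wait
    while True:
        try:
            # A successful connect means a listener is up.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Typically ECONNREFUSED while nothing listens on the port.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

With a finite max_wait the caller gets a False result it can turn
into a test failure, where an unbounded loop would keep retrying
for as long as the port keeps refusing connections.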

However, I'm not sure what causes the EADDRINUSE error.  Either the
kernel keeps the port reserved even after it got closed (Net::EmptyPort
finds a port by binding to one and then closing it immediately), or some
unrelated process steals the port in between, possibly for a non-listener
socket (hence ECONNREFUSED).

The latter explanation feels somewhat more plausible, particularly
as the hangs seem to happen more on busy hosts. This should be
easy-ish to demonstrate but I'm out of time for tonight.
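
A rough sketch of how the "stolen port" scenario could be reproduced
deterministically follows. It is Python rather than Perl, and it is
only a model of the hypothesis: the "thief" socket stands in for an
unrelated process, and all function and variable names are invented
for the example. It assumes Linux-like socket semantics and does not
set SO_REUSEADDR anywhere.

```python
import errno
import socket

HOST = "127.0.0.1"


def find_empty_port():
    """Bind to an ephemeral port, note its number, release it at once."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.bind((HOST, 0))
        return probe.getsockname()[1]


def demonstrate_race():
    """Return (server_errno, client_errno) for the stolen-port case."""
    port = find_empty_port()

    # Stand-in for an unrelated process that takes the port in the
    # gap between the probe and the server start. It binds the port
    # but never listens on it.
    thief = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    thief.bind((HOST, port))

    # The "server" now tries to bind the port it was handed.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        server.bind((HOST, port))
        server_errno = 0
    except OSError as exc:
        server_errno = exc.errno
    finally:
        server.close()

    # The "parent" probes the port the way a wait loop would.
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client_errno = client.connect_ex((HOST, port))
    client.close()

    thief.close()
    return server_errno, client_errno


if __name__ == "__main__":
    server_errno, client_errno = demonstrate_race()
    print("server bind:", errno.errorcode.get(server_errno, server_errno))
    print("client connect:", errno.errorcode.get(client_errno, client_errno))
```

If the hypothesis holds, the server bind should fail with EADDRINUSE
while the client connect fails with ECONNREFUSED, which is the same
pair of errors seen in the trace. This says nothing about whether
that is what actually happens on the busy hosts.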

I'm not totally convinced this is the same hang I was seeing in my
earlier investigations fwiw, but it's at least a step forward :)
-- 
Niko


