Bug#545626: libcache-memcached-perl: FTBFS: tests failed

Niko Tyni ntyni at debian.org
Wed Sep 9 20:05:13 UTC 2009


On Tue, Sep 08, 2009 at 06:46:50PM +0200, gregor herrmann wrote:
> On Tue, Sep 08, 2009 at 11:48:03AM +0200, Lucas Nussbaum wrote:

> > > #   Failed test 'Should return fast on retry'
> > > #   at t/05_reconnect_timeout.t line 28.

Lucas, how reproducible is this for you?

It could happen on a very loaded system; I finally triggered it by
repeatedly running the same test in a loop with 'nice 20' and loading
the CPU simultaneously with a separate busy loop ("perl -e 'print while 1'").
I doubt the Grid'5000 systems ever get this slow.
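
For anyone who wants to try the same thing, here is a rough sketch of that
kind of stress loop. The helper name stress_loop, the run count and the
'nice -n 19' spelling are my own choices for illustration, and the forked
busy loop is a stand-in for the separate "perl -e" process mentioned above.

```perl
use strict;
use warnings;

# Hypothetical helper: run @cmd $runs times while a forked child burns CPU,
# and return how many of the runs exited with a non-zero status.
sub stress_loop {
    my ($runs, @cmd) = @_;

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        1 while 1;    # child: busy loop, only stops when killed
    }

    my $failures = 0;
    for (1 .. $runs) {
        system(@cmd) == 0 or $failures++;
    }

    kill 'TERM', $pid;    # stop the CPU burner
    waitpid($pid, 0);
    return $failures;
}

# Example call (assumes it is run from the unpacked, built source tree):
# my $failed = stress_loop(100, qw(nice -n 19 perl -Iblib/lib -Iblib/arch
#                                  t/05_reconnect_timeout.t));
# print "$failed of 100 runs failed\n";
```

A non-zero count only says the failure can be provoked under load; it does
not explain the failure seen on Lucas's system.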

I note that the firewall on Lucas's system blocks an outgoing connection
( -j REJECT --reject-with icmp-port-unreachable ) during this test:

[22815.042142] IN= OUT=eth0 SRC=131.254.202.159 DST=192.0.2.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=45623 DF PROTO=TCP SPT=56798 DPT=11211 WINDOW=5840 RES=0x00 SYN URGP=0 

but I can't figure out the mechanism that makes the test fail in this way.

> > > #   Failed test 'OK'
> > > #   at t/100_flush_bug.t line 57.
> > > #          got: '0'
> > > #     expected: '1'
> > > # Looks like you failed 1 test of 7.

This is a race in the test: if the child server is slow enough, the
parent client will already have failed by the time the child calls
accept(), and the child will then block.

Adding 'sleep 1' in the child will make it fail reliably, and I suppose
adding something like 'sleep 2' in the parent should make it go away.
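
As an alternative to sleeps, a race like this can usually be closed with an
explicit handshake, where the parent waits for the child to say it is ready.
The following is only a generic sketch of that pattern, not the code in
t/100_flush_bug.t; the helper name handshake_roundtrip, the pipe and the
5 second timeouts are all assumptions made for the example.

```perl
use strict;
use warnings;
use IO::Socket::INET;
use POSIX ();

# Hypothetical helper: returns 1 if the child accepted the parent's
# connection, 0 otherwise. $child_delay simulates a slow child (seconds).
sub handshake_roundtrip {
    my ($child_delay) = @_;

    # Created before fork() only so that both sides know the port.
    my $listener = IO::Socket::INET->new(
        LocalAddr => '127.0.0.1',
        LocalPort => 0,            # let the kernel pick a free port
        Proto     => 'tcp',
        Listen    => 1,
    ) or die "listen failed: $!";
    my $port = $listener->sockport;

    pipe(my $ready_r, my $ready_w) or die "pipe failed: $!";

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        close $ready_r;
        sleep $child_delay;              # simulate a slow child
        syswrite($ready_w, "1");         # tell the parent we are about to accept
        close $ready_w;
        $listener->timeout(5);           # do not block forever in accept()
        my $conn = $listener->accept;
        POSIX::_exit($conn ? 0 : 1);     # skip END blocks in the child
    }

    close $ready_w;
    close $listener;
    sysread($ready_r, my $byte, 1);      # block until the child signals
    my $client = IO::Socket::INET->new(
        PeerAddr => '127.0.0.1',
        PeerPort => $port,
        Proto    => 'tcp',
        Timeout  => 5,
    );
    waitpid($pid, 0);
    return ($client && $? == 0) ? 1 : 0;
}
```

With this, the parent's wait scales with how slow the child actually is
instead of relying on a fixed guess like 'sleep 2'.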

> This uninitialized $proto sounds interesting.

Yeah, netbase is clearly missing. I don't think that really matters, as
IO::Socket::INET defaults to 'tcp' (and, since Perl 5.10.0, gets a hardcoded
number from the Socket module instead of using getprotobyname()).
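
To illustrate the difference between the two lookups, here is a small
sketch. It does not reproduce the internals of IO::Socket::INET; the helper
name tcp_proto_number is made up, and the fallback order is just an
assumption for the example.

```perl
use strict;
use warnings;
use Socket qw(IPPROTO_TCP);

# Hypothetical helper: look the number up in the protocols database
# (/etc/protocols, shipped by netbase on Debian) and fall back to the
# constant compiled into the Socket module when the lookup fails.
sub tcp_proto_number {
    my $from_db = getprotobyname('tcp');    # scalar context: protocol number
    return defined $from_db ? $from_db : IPPROTO_TCP;
}

printf "tcp protocol number: %d\n", tcp_proto_number();
```

Without netbase the getprotobyname() call would be expected to return
undef, and the Socket constant still gives a usable number.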

> If I add "netbase" to the build deps, it still builds fine on i386,
> and now fails differently on amd64 (no warnings, only one failure,
> fails the 'other' test in this file):

>  #   Failed test 'Expected pause while connecting'
>  #   at t/05_reconnect_timeout.t line 24.

Can you reproduce this reliably?  If so, the output of 

 strace  -T -e trace=network perl -Iblib/lib -Iblib/arch t/05_reconnect_timeout.t

might give a clue about what's happening.

> If I understand the purpose of the test correctly it's about the
> second test (the reconnect stuff), so I guess we can just skip the
> first one or check for "> 0" or something.

The first connect is supposed to time out because 192.0.2.1 is in the
TEST-NET block (192.0.2.0/24), which should be unreachable everywhere
(see RFC 3330). At that point, the host is marked dead for
20 + int(rand(10)) seconds and further connection attempts should take
a fast path.
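
The dead-host bookkeeping can be pictured roughly as follows. This is a
sketch of the idea only, not the actual Cache::Memcached code: the names
%dead_until, mark_dead and is_dead are invented, and the current time is
passed in explicitly to keep the example easy to follow.

```perl
use strict;
use warnings;

# Hypothetical state: host => epoch time until which it is considered dead.
my %dead_until;

# Mark $host dead for 20 + int(rand(10)) seconds, i.e. 20 to 29 seconds,
# counted from $now. Returns the time at which the host may be retried.
sub mark_dead {
    my ($host, $now) = @_;
    $dead_until{$host} = $now + 20 + int(rand(10));
    return $dead_until{$host};
}

# Fast path: true while the host is still inside its dead interval, so
# the caller can skip the connect attempt entirely.
sub is_dead {
    my ($host, $now) = @_;
    my $until = $dead_until{$host};
    return (defined $until && $now < $until) ? 1 : 0;
}
```

In these terms, the second test ('Should return fast on retry') expects the
retry to hit the is_dead() branch instead of waiting for another connect
timeout.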
-- 
Niko Tyni   ntyni at debian.org





More information about the pkg-perl-maintainers mailing list