[Babel-users] [BUG] Route "deadlocks" under load due to non-atomic kernel route updates

Dave Taht dave.taht at gmail.com
Thu Jun 16 02:35:05 UTC 2016


>     https://lab.nexedi.com/kirr/iproute2/blob/bd480e66/t/rtcache-torture
>     (also attached to this email)
>
> which reproduces the problem in several minutes just on one computer and
> retested it locally: I can reliably reproduce the issue on pristine
> Debian 3.16.7-ckt25-2 (on both Atom and Core2 notebooks) and on pristine
> 3.16.35 on Atom (compiled by me, since Debian kernel team has not yet
> uploaded 3.16.35 to Jessie).

I have been running this script on four different machines for hours
now without reproducing your bug on 4.4 or later kernels. It does
trigger on a 3.14 kernel. (It helps to do a killall fping6 before
exiting!)
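
If the pinger keeps getting left behind, something like the following
could replace the manual killall. This is only a sketch: stop_pinger
and pinger_pid are names made up for illustration, the sleep stands in
for the real flood ping, and the extra INT/TERM traps are there in case
the shell does not run the EXIT trap when interrupted.

```shell
#!/bin/sh
# Sketch only: remember the background pinger's PID and kill it on exit,
# instead of relying on a manual killall afterwards.
pinger_pid=

stop_pinger() {
    if [ -n "$pinger_pid" ]; then
        kill "$pinger_pid" 2>/dev/null || :   # ignore "no such process"
        wait "$pinger_pid" 2>/dev/null || :   # reap it; status is not interesting
        pinger_pid=
    fi
}

# Run the cleanup on normal exit, and turn INT/TERM into a normal exit
# so the EXIT trap fires in those cases as well.
trap stop_pinger EXIT
trap 'exit 130' INT
trap 'exit 143' TERM

# Stand-in for the flood ping (hypothetical); the real script would
# start its pinger here instead of sleep.
sleep 30 >/dev/null 2>&1 &
pinger_pid=$!
```

The only design choice here is to track the PID from $! directly rather
than asking the shell for its job list at exit time.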

So it does not seem to happen on 4.4 or later. At one level, I'm
relieved: one less babel bug to worry about in openwrt (now 4.4
based), although one of the platforms I work on is still stuck at
3.18, and the C2 is stuck at 3.14 (for now).

At another level I still really, really, really wanted atomic updates
in babel, and was clearing the decks to make a run at the right
netlink stuff when I decided to check whether your bug existed in
my kernels. :( Weirdly demotivating.
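
For illustration only: at the iproute2 level, a one-step update is
spelled "ip -6 route replace", which goes out as a single netlink
request rather than a delete followed by an add. The sketch below
contrasts the two. The names run, update_del_add, update_replace and
DRY_RUN are hypothetical, and whether babeld's netlink code can be
mapped onto the same request is an assumption, not something tested
here.

```shell
#!/bin/sh
# Sketch: update a route in one step instead of del + add.
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0
# and run as root to actually apply them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Two-step update: between the del and the add the /64 is absent, so
# lookups can fall through to a covering route (the unreachable /48 in
# the torture script's setup).
update_del_add() {
    run ip -6 route del "$1" dev "$2"
    run ip -6 route add "$1" dev "$2"
}

# One-step update: a single request, intended to leave no window in
# which the /64 is missing.
update_replace() {
    run ip -6 route replace "$1" dev "$2"
}

update_del_add 2222:3333:4444:5555::/64 dum0
update_replace 2222:3333:4444:5555::/64 dum0
```

With the default DRY_RUN=1 this prints the three ip commands and
touches nothing, which makes it safe to try on a box you care about.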


d@dancer:~/bin$ ssh root@pi3 uname -a
Linux pi3 4.4.12-v7+ #892 SMP Thu Jun 2 15:41:19 BST 2016 armv7l GNU/Linux
d@dancer:~/bin$ ssh root@pi2 uname -a
Linux pi2 4.4.12-v7+ #892 SMP Thu Jun 2 15:41:19 BST 2016 armv7l GNU/Linux
d@dancer:~/bin$ uname -a
Linux dancer 4.5.0-rc7-fqfi #1 SMP PREEMPT Mon Mar 7 16:04:17 PST 2016 x86_64 x86_64 x86_64 GNU/Linux

...

The odroid C2 has the bug.

d@dancer:~/bin$ ssh root@c2 uname -a
Linux c2 3.14.29-56 #1 SMP PREEMPT Wed Apr 20 12:15:54 BRT 2016 aarch64 aarch64 aarch64 GNU/Linux

BUG: Got unexpected unreachable route for 2226:3333:4444:5555::1:   # I'd changed the number
unreachable 2226:3333:4444:5555::1 from :: dev lo  src fd99::2  metric 0 \    cache  error -101

route table for root 2226:3333:4444::/48
---- 8< ----
unicast 2226:3333:4444:5555::/64 dev dum0  proto boot  scope global  metric 1024
unreachable 2226:3333:4444::/48 dev lo  proto boot  scope global  metric 1024  error -101
---- 8< ----

route for 2226:3333:4444:5555::1 (once again)
unreachable 2226:3333:4444:5555::1 from :: dev lo  src fd99::2  metric 0 \    cache  error -101 users 1 used 3
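
To sort boxes into "likely affected" and "probably fine", a quick check
along these lines might do. It is a sketch: kernel_ge is a made-up
helper, the release strings are the ones appearing in this thread, and
the 4.2 cutoff is Kirill's figure from the quoted text below, not
something verified independently here.

```shell
#!/bin/sh
# Hypothetical helper (not part of rtcache-torture): compare a kernel
# release string against a MAJOR.MINOR threshold.

# kernel_ge RELEASE MAJOR MINOR -> exit status 0 if RELEASE >= MAJOR.MINOR
kernel_ge() {
    k_rel=${1%%[!0-9.]*}      # drop suffixes such as "-v7+" or "-rc7-fqfi"
    k_major=${k_rel%%.*}      # text before the first dot
    k_rest=${k_rel#*.}        # text after the first dot
    k_minor=${k_rest%%.*}     # second numeric component
    [ "$k_major" -gt "$2" ] ||
        { [ "$k_major" -eq "$2" ] && [ "$k_minor" -ge "$3" ]; }
}

# Releases seen in this thread, plus whatever this machine is running.
for release in 4.4.12-v7+ 4.5.0-rc7-fqfi 3.14.29-56 3.16.35-mini64 "$(uname -r)"; do
    if kernel_ge "$release" 4 2; then
        echo "$release: >= 4.2, race not expected"
    else
        echo "$release: < 4.2, may hit the race"
    fi
done
```

Only the first two numeric components are compared, which is all the
4.2 cutoff needs.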


>
> It is always the same: the issue reproduces reliably in several minutes.
> And it looks like e.g.
>
>      ----- 8< ----
>      root@mini:/home/kirr/src/tools/net/iproute2/t# time ./rtcache-torture
>      PING 2222:3333:4444:5555::1(2222:3333:4444:5555::1) 56 data bytes
>      E.E.E.....E......E..E............E...E..
>      <more output from ping>
>
>      BUG: Linux mini 3.16.35-mini64 #14 SMP PREEMPT Sun Jun 12 19:41:09 MSK 2016 x86_64 GNU/Linux
>      BUG: Got unexpected unreachable route for 2222:3333:4444:5555::1:
>      unreachable 2222:3333:4444:5555::1 from :: dev lo  src 2001:67c:1254:20::1  metric 0 \    cache  error -101
>
>      route table for root 2222:3333:4444::/48
>      ---- 8< ----
>      unicast 2222:3333:4444:5555::/64 dev dum0  proto boot  scope global  metric 1024
>      unreachable 2222:3333:4444::/48 dev lo  proto boot  scope global  metric 1024  error -101
>      ---- 8< ----
>
>      route for 2222:3333:4444:5555::1 (once again)
>      unreachable 2222:3333:4444:5555::1 from :: dev lo  src 2001:67c:1254:20::1  metric 0 \    cache  error -101 users 1 used 4
>
>      real    0m49.938s
>      user    0m4.488s
>      sys     0m5.872s
>      ---- 8< ----
>
> The issue should not show itself with kernels >= 4.2, because there the
> lookup procedure does not take table lock twice, and /128 cache entries
> are not routinely created (they are created only upon PMTU exception).
>
> I'm running Debian testing on my development machine. Currently it has
> 4.5.5-1 (2016-05-29). I can confirm that /128 route cache entries are
> not created there just because a route was looked up.
>
> Kirill
>
>
> ---- 8< ---- (rtcache-torture)
> #!/bin/sh -e
> # torture for IPv6 RT cache, trying to hit the race between lookup,cache-add & route add
> # http://lists.alioth.debian.org/pipermail/babel-users/2016-June/002547.html
>
>
> tprefix=2222:3333:4444      # "whole-network" prefix for tests  /48
> tsubnet=$tprefix:5555       # subnetwork for which "to" route will be changed   /64
> taddr=$tsubnet::1           # test address on $tsubnet
>
> # setup for tests:
>
> # dum0 dummy device
> ip link del dev dum0 2>/dev/null || :
> ip link add dum0 type dummy
> ip link set up dev dum0
>
> # clean route table for tprefix with only unreachable whole-network route
> ip -6 route flush root $tprefix::/48
> ip -6 route add unreachable $tprefix::/48
> ip -6 route flush cache
>
> ip -6 route add $tsubnet::/64 dev dum0
>
>
> # put a lot of requests to rt/rtcache getting route to $taddr
> trap 'kill $(jobs -p)' EXIT
> rtgetter() {
>     # NOTE we cannot do this with `ip route get ...` in a loop, as `ip route
>     # get` first takes RTNL lock, and thus will be completely serialized with
>     # e.g. route add and del.
>     #
>     # Ping, like other ordinary connect/tx activity, works without RTNL held.
>     exec ping6 -n -f $taddr
> }
> rtgetter &
>
> # do route del/route add in busyloop;
> # after route add: check route get $addr is not unreachable
> while true; do
>     ip -6 route del $tsubnet::/64 dev dum0
>     ip -6 route add $tsubnet::/64 dev dum0
>     r=`ip -6 -d -o route get $taddr`
>     if echo "$r" | grep -q unreachable ; then
>         echo
>         echo
>         echo BUG: `uname -a`
>         echo BUG: Got unexpected unreachable route for $taddr:
>         echo "$r"
>         echo
>         echo "route table for root $tprefix::/48"
>         echo "---- 8< ----"
>         ip -6 -d -o route show root $tprefix::/48
>         echo "---- 8< ----"
>         echo
>         echo "route for $taddr (once again)"
>         ip -6 -d -o -s -s -s route get $taddr
>         exit 1
>     fi
> done



-- 
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org


