[Babel-users] the costs of periodic disassociation in conventional ap/sta mode

Dave Taht dave.taht at gmail.com
Fri Jun 17 03:05:31 UTC 2016


On Thu, Jun 16, 2016 at 3:46 PM, Juliusz Chroboczek
<jch at pps.univ-paris-diderot.fr> wrote:
>> In the new lab I ended up connecting up a bunch of machines in sta mode
>> over wpa [...] It doesn't help that I'm also trying to make a major
>> change in how wifi is queued underneath...
>
> You didn't mention that you did this while running a production network on
> a heterogeneous mixture of buggy and less-buggy hardware, with different
> kernel versions, different versions of the wireless drivers (some very
> experimental), and different versions of babeld (some badly obsolete, some
> experimental).  There's also a wireless bridge introducing further
> uncertainty, but you don't know where it is (it answers pings, but you
> cannot remember where you put it).
>
> Yeah, sounds like you, Dave ;-)

As always, thx for putting up with my s**t.

I should probably have a standard disclaimer that I usually do
all-up testing by default, because it was the fastest way to get to
the moon. :) Others work step by isolated step at honing their
subsystems; my mission tends to be that of a one-man "red team" at the
whole-system level, combining as many real-world variables as
possible to find the "unknown unknowns".

... and push out fixes or bug reports in the general direction of the
right people, in the hope they can be addressed and integrated before
the next round of massive testing. This lab is as complex as it is on
purpose; that might seem sick, or twisted, to some. There are plenty
of saner labs out there...

But I do, sometimes, actually break out the full scientific method.

Your reminder that I had *way too many variables* under test did cause
me to reboot and try some more stuff in isolation.

A) The periodic disassociation I was having appears to be a wifi
driver bug, not, for example, an interaction with Network Manager.
(Categorizing the effects of disassociation was still a good idea;
the behavior I was seeing is commonly reported in the field.)

[1325508.233042] wlp2s0: associated
[1325553.145589] wlp2s0: deauthenticating from 04:f0:21:1f:36:e2 by local choice (Reason: 3=DEAUTH_LEAVING) # reported elsewhere
[1325556.728971] wlp2s0: authenticate with 04:f0:21:1f:36:e2
[1325556.734118] wlp2s0: send auth to 04:f0:21:1f:36:e2 (try 1/3)
[1325556.736307] wlp2s0: authenticated

After some new devices associated, it started happening again.

I have a major refresh of that driver waiting for one last piece to land...
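To put a number on what each cycle costs, here is a rough C sketch that pulls the kernel timestamps out of dmesg lines like the ones above and measures the time from the deauth to the next "authenticated". It assumes the "[seconds.microseconds]" prefix shown in the log and this particular driver's message wording; the function name deauth_gap is made up for illustration. For the excerpt above it works out to about 3.59 s (1325556.736307 - 1325553.145589), and the excerpt doesn't show the re-association or how long babel then takes to recover.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: seconds between the first "deauthenticating" line and
   the next line reporting ": authenticated", or -1.0 if no such pair exists.
   Assumes each relevant line starts with a "[seconds.micros]" timestamp. */
static double deauth_gap(const char *const lines[], int n)
{
    double t_deauth = -1.0;

    for(int i = 0; i < n; i++) {
        double t;
        if(sscanf(lines[i], "[%lf]", &t) != 1)
            continue;                       /* continuation or untimed line */
        if(t_deauth < 0.0 && strstr(lines[i], "deauthenticating") != NULL)
            t_deauth = t;                   /* start of the outage */
        else if(t_deauth >= 0.0 && strstr(lines[i], ": authenticated") != NULL)
            return t - t_deauth;            /* re-authenticated */
    }
    return -1.0;
}
```

Note that "authenticate with" and "send auth" lines don't match ": authenticated", so only the final success line ends the measured gap.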

B) I think I have found a genuine, new-ish kernel bug by implementing
the IPv6 "atomic replace" code I'd tried to put together (having just
tried, today, to make it work for IPv6 only). I can hose the routing
table so badly that, after killing babel (and leaving some routes behind),

ip -6 route flush all

no longer flushes the whole table. After I update the build to
net-next plus Toke's airtime fairness patchset, I'll be in a position
to pursue that further on the APU2s and laptops.
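The atomic-replace code itself isn't shown in this post. One plausible shape of an IPv6 "atomic replace" on Linux is a single RTM_NEWROUTE request carrying NLM_F_CREATE | NLM_F_REPLACE, instead of deleting the old route and then adding the new one (which leaves a window with no route at all). The C sketch below only builds the netlink message; actually sending it would need a NETLINK_ROUTE socket and CAP_NET_ADMIN. The helper names are hypothetical, and RTPROT_STATIC is just a placeholder for whatever protocol number a routing daemon tags its routes with.

```c
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Hypothetical helper: append one route attribute to the message. */
static int add_rtattr(struct nlmsghdr *nh, size_t maxlen, unsigned short type,
                      const void *data, size_t len)
{
    size_t off = NLMSG_ALIGN(nh->nlmsg_len);
    if(off + RTA_SPACE(len) > maxlen)
        return -1;
    struct rtattr *rta = (struct rtattr *)((char *)nh + off);
    rta->rta_type = type;
    rta->rta_len = RTA_LENGTH(len);
    memcpy(RTA_DATA(rta), data, len);
    nh->nlmsg_len = off + RTA_SPACE(len);
    return 0;
}

/* Hypothetical: build (but do not send) a request asking the kernel to
   replace any existing route to dst/plen in one step.  Returns the message
   length, or -1 on a bad address or a too-small buffer. */
static int build_ipv6_replace(void *buf, size_t buflen, const char *dst,
                              int plen, const char *gw, int ifindex)
{
    struct in6_addr d, g;
    if(inet_pton(AF_INET6, dst, &d) != 1 || inet_pton(AF_INET6, gw, &g) != 1)
        return -1;
    if(buflen < NLMSG_SPACE(sizeof(struct rtmsg)))
        return -1;
    memset(buf, 0, buflen);

    struct nlmsghdr *nh = buf;
    nh->nlmsg_len = NLMSG_LENGTH(sizeof(struct rtmsg));
    nh->nlmsg_type = RTM_NEWROUTE;
    /* NLM_F_REPLACE is the "atomic" part: overwrite instead of delete+add. */
    nh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK | NLM_F_CREATE | NLM_F_REPLACE;

    struct rtmsg *rtm = NLMSG_DATA(nh);
    rtm->rtm_family = AF_INET6;
    rtm->rtm_dst_len = plen;
    rtm->rtm_table = RT_TABLE_MAIN;
    rtm->rtm_protocol = RTPROT_STATIC;   /* placeholder protocol value */
    rtm->rtm_scope = RT_SCOPE_UNIVERSE;
    rtm->rtm_type = RTN_UNICAST;

    if(add_rtattr(nh, buflen, RTA_DST, &d, sizeof(d)) < 0 ||
       add_rtattr(nh, buflen, RTA_GATEWAY, &g, sizeof(g)) < 0 ||
       add_rtattr(nh, buflen, RTA_OIF, &ifindex, sizeof(ifindex)) < 0)
        return -1;
    return (int)nh->nlmsg_len;
}
```

Whether a given kernel honours NLM_F_REPLACE for IPv6 routes the way you'd hope is exactly the kind of thing this lab is poking at, so treat the flags as the intent, not a guarantee.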

C) Your unicast patch looks good (less code, too!), and I'll give it
a shot on the systems where I have explicitly been turning powersave
off, after refreshing the kernels on everything.

I immediately slammed it onto the Pi 3 and took the Ethernet interface
down: I got an unreachable route for a few seconds (?), then the "right"
wifi interface picked it up. Too early to tell...

D) I have to go back and reflash the getchips, too, but I don't think
IPV6_SUBTREES has made it into their build yet, either. I'll also go
back and poke harder at the USB failover thing, with your prior patches
on the interface check and a few more printfs...
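Checking whether a given build actually has IPV6_SUBTREES (CONFIG_IPV6_SUBTREES, the kernel option for IPv6 source-address-based routing) comes down to reading the kernel config. A small C sketch of that check, assuming the config text is readable somewhere, e.g. /boot/config-<release> on many distributions, or a decompressed /proc/config.gz if the kernel was built with CONFIG_IKCONFIG_PROC. The function name config_option_enabled is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return 1 if the kernel config read from f sets
   option to "y", 0 if it is absent or not set, -1 on a read error. */
static int config_option_enabled(FILE *f, const char *option)
{
    char line[512];
    size_t n = strlen(option);

    while(fgets(line, sizeof(line), f) != NULL) {
        /* Match "OPTION=y" at the start of the line; comment lines such as
           "# OPTION is not set" start with '#' and never match.  Requiring
           '=' right after the name avoids matching longer option names. */
        if(strncmp(line, option, n) == 0 && line[n] == '=' && line[n + 1] == 'y')
            return 1;
    }
    return ferror(f) ? -1 : 0;
}
```

Usage: open the config with fopen() on whatever path your distribution uses (the one above is a placeholder) and call config_option_enabled(f, "CONFIG_IPV6_SUBTREES").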

It is looking increasingly likely that I won't get to shncpd before July 17th.

>> A) Powersave enabled caused stas to drop off the net by missing multicast.
>
> Noted.
>
>> good fixes for this problem include [...] having babel be aware it is in
>> powersave mode and using a bit of unicast, or something, to keep itself
>> alive.
>
> Patch attached.  You test, we speak.  Agree?
>
>> B) Network Manager triggered scans were devastating[1],
>
> Look, Dave, I'm really full of good will.  I once spent a week of my life
> looking at systemd, then I uninstalled it from all my machines.  I tried
> to do the same with NM, but gave up after a few days.
>
> I'll try again some day.  Just not now.
>
> -- Juliusz
>
> diff --git a/message.c b/message.c
> index fdc1999..6b4eefc 100644
> --- a/message.c
> +++ b/message.c
> @@ -1608,6 +1608,7 @@ send_ihu(struct neighbour *neigh, struct interface *ifp)
>      int ll;
>      int send_rtt_data;
>      int msglen;
> +    int rc;
>
>      if(neigh == NULL && ifp == NULL) {
>          struct interface *ifp_aux;
> @@ -1638,12 +1639,8 @@ send_ihu(struct neighbour *neigh, struct interface *ifp)
>      rxcost = neighbour_rxcost(neigh);
>      interval = (ifp->hello_interval * 3 + 9) / 10;
>
> -    /* Conceptually, an IHU is a unicast message.  We usually send them as
> -       multicast, since this allows aggregation into a single packet and
> -       avoids an ARP exchange.  If we already have a unicast message queued
> -       for this neighbour, however, we might as well piggyback the IHU. */
>      debugf("Sending %sihu %d on %s to %s.\n",
> -           unicast_neighbour == neigh ? "unicast " : "",
> +           "unicast ",
>             rxcost,
>             neigh->ifp->name,
>             format_address(neigh->address));
> @@ -1663,44 +1660,24 @@ send_ihu(struct neighbour *neigh, struct interface *ifp)
>         optional 10-bytes sub-TLV for timestamps (used to compute a RTT). */
>      msglen = (ll ? 14 : 22) + (send_rtt_data ? 10 : 0);
>
> -    if(unicast_neighbour != neigh) {
> -        start_message(ifp, MESSAGE_IHU, msglen);
> -        accumulate_byte(ifp, ll ? 3 : 2);
> -        accumulate_byte(ifp, 0);
> -        accumulate_short(ifp, rxcost);
> -        accumulate_short(ifp, interval);
> -        if(ll)
> -            accumulate_bytes(ifp, neigh->address + 8, 8);
> -        else
> -            accumulate_bytes(ifp, neigh->address, 16);
> -        if(send_rtt_data) {
> -            accumulate_byte(ifp, SUBTLV_TIMESTAMP);
> -            accumulate_byte(ifp, 8);
> -            accumulate_int(ifp, neigh->hello_send_us);
> -            accumulate_int(ifp, time_us(neigh->hello_rtt_receive_time));
> -        }
> -        end_message(ifp, MESSAGE_IHU, msglen);
> -    } else {
> -        int rc;
> -        rc = start_unicast_message(neigh, MESSAGE_IHU, msglen);
> -        if(rc < 0) return;
> -        accumulate_unicast_byte(neigh, ll ? 3 : 2);
> -        accumulate_unicast_byte(neigh, 0);
> -        accumulate_unicast_short(neigh, rxcost);
> -        accumulate_unicast_short(neigh, interval);
> -        if(ll)
> -            accumulate_unicast_bytes(neigh, neigh->address + 8, 8);
> -        else
> -            accumulate_unicast_bytes(neigh, neigh->address, 16);
> -        if(send_rtt_data) {
> -            accumulate_unicast_byte(neigh, SUBTLV_TIMESTAMP);
> -            accumulate_unicast_byte(neigh, 8);
> -            accumulate_unicast_int(neigh, neigh->hello_send_us);
> -            accumulate_unicast_int(neigh,
> -                                   time_us(neigh->hello_rtt_receive_time));
> -        }
> -        end_unicast_message(neigh, MESSAGE_IHU, msglen);
> +    rc = start_unicast_message(neigh, MESSAGE_IHU, msglen);
> +    if(rc < 0) return;
> +    accumulate_unicast_byte(neigh, ll ? 3 : 2);
> +    accumulate_unicast_byte(neigh, 0);
> +    accumulate_unicast_short(neigh, rxcost);
> +    accumulate_unicast_short(neigh, interval);
> +    if(ll)
> +        accumulate_unicast_bytes(neigh, neigh->address + 8, 8);
> +    else
> +        accumulate_unicast_bytes(neigh, neigh->address, 16);
> +    if(send_rtt_data) {
> +        accumulate_unicast_byte(neigh, SUBTLV_TIMESTAMP);
> +        accumulate_unicast_byte(neigh, 8);
> +        accumulate_unicast_int(neigh, neigh->hello_send_us);
> +        accumulate_unicast_int(neigh,
> +                               time_us(neigh->hello_rtt_receive_time));
>      }
> +    end_unicast_message(neigh, MESSAGE_IHU, msglen);
>  }
>
>  /* Send IHUs to all marginal neighbours */
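Why the patch can collapse the two branches: the IHU body it builds is the same either way; the only change is that it always goes into the neighbour's unicast buffer, so it no longer depends on a powersaving station catching a multicast frame. Here is a standalone C sketch of that body, mirroring the field order in the patch (AE, reserved byte, rxcost, interval, 8- or 16-byte address, optional 10-byte timestamp sub-TLV) with big-endian fields. It also spells out the interval arithmetic: assuming hello_interval is in milliseconds and the wire interval in centiseconds, (hello_interval * 3 + 9) / 10 is three hello intervals rounded up, so a 4000 ms hello gives 1200 cs. The IHU type (5) and AE values (2, 3) follow RFC 6126; the timestamp sub-TLV type 3 and all helper names are assumptions, not babeld's code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TLV_IHU 5               /* IHU TLV type, RFC 6126 */
#define AE_IPV6 2               /* full 16-byte IPv6 address */
#define AE_IPV6_LL 3            /* link-local: only the low 8 bytes are sent */
#define HYP_SUBTLV_TIMESTAMP 3  /* assumed value for the timestamp sub-TLV */

static size_t put16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8); p[1] = (uint8_t)v;
    return 2;
}

static size_t put32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24); p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);  p[3] = (uint8_t)v;
    return 4;
}

/* Hello interval (ms) -> IHU interval (cs): 3x, rounded up, as in the patch. */
static uint16_t ihu_interval(unsigned hello_interval_ms)
{
    return (uint16_t)((hello_interval_ms * 3 + 9) / 10);
}

/* Hypothetical encoder: write one IHU TLV (type, length, body) into out and
   return the number of bytes written. */
static size_t encode_ihu(uint8_t *out, int ll, uint16_t rxcost,
                         uint16_t interval, const uint8_t addr[16], int rtt,
                         uint32_t hello_send_us, uint32_t rtt_receive_us)
{
    size_t msglen = (ll ? 14 : 22) + (rtt ? 10 : 0);  /* body length */
    uint8_t *p = out;

    *p++ = TLV_IHU;
    *p++ = (uint8_t)msglen;
    *p++ = ll ? AE_IPV6_LL : AE_IPV6;
    *p++ = 0;                                         /* reserved */
    p += put16(p, rxcost);
    p += put16(p, interval);
    if(ll) { memcpy(p, addr + 8, 8); p += 8; }
    else   { memcpy(p, addr, 16);    p += 16; }
    if(rtt) {
        *p++ = HYP_SUBTLV_TIMESTAMP;
        *p++ = 8;                                     /* two 32-bit stamps */
        p += put32(p, hello_send_us);
        p += put32(p, rtt_receive_us);
    }
    return (size_t)(p - out);
}
```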



-- 
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org


