[Babel-users] QoS for system critical packets on wireless

Dave Taht dave.taht at gmail.com
Wed Jun 22 15:17:35 UTC 2011


The most useful result of the diffserv work I have been trying was the
observation that most packets fall into one of three buckets:

1) System control and 'MICE' packets, which are less than 1% of all
packets. In what I'm doing currently, mice include messages such as
ARP, NTP, UDP, and most of the ICMPv6 portion of the stack. Mice are
desperately needed for the network to continue to function.

2) Unresponsive streams and UDP (the share varies, but is generally
pretty low).
3) Responsive TCP streams (closer to 99%).
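The three buckets above could be expressed as a simple classifier. This is a minimal sketch, not the classifier used in these experiments: the parsed `proto`/`dport` fields are hypothetical inputs, and only two UDP ports are assumed to be mice (6696, Babel's IANA-assigned port, and 123 for NTP). All other UDP lands in the unresponsive bucket.

```python
# Assumed mice ports: 6696 is Babel's IANA-assigned UDP port, 123 is NTP.
MICE_UDP_PORTS = {6696, 123}

# Protocols treated as mice outright: ARP and ICMPv6 (neighbour discovery,
# router advertisements, ...) carry control traffic the network depends on.
MICE_PROTOCOLS = {"arp", "icmp6"}


def bucket(proto: str, dport: int = 0) -> str:
    """Place a packet into one of the three buckets.

    `proto` and `dport` are hypothetical, already-parsed header fields.
    """
    if proto in MICE_PROTOCOLS:
        return "mice"
    if proto == "udp" and dport in MICE_UDP_PORTS:
        return "mice"
    if proto == "tcp":
        return "responsive"    # TCP backs off on loss or ECN marks
    return "unresponsive"      # other UDP, and anything ignoring congestion
```

Keeping the mice set small and explicit is the point: the class stays under 1% of traffic, so giving it priority costs the elephants almost nothing.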

So applying some level of packet prioritization to babel appears to
make sense, so that it can do its work, but only AFTER getting the
excessive buffering under control (more on that further below).

At the moment I am arbitrarily classifying babel packets into the CS6
diffserv class, which in the end maps them into the 802.11e VI or VO
classes (VO being mildly more underused).
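How a CS6 mark could end up in an 802.11e class can be traced with a little bit arithmetic. This sketch assumes the precedence-bits rule that Linux's `cfg80211_classify8021d()` applied by default at the time (user priority = top three bits of the DS field); other stacks and newer kernels may map differently. The user-priority-to-access-category table is the standard 802.11e/WMM one. Under this assumption CS6 lands in VO.

```python
# IEEE 802.11e / WMM mapping from 802.1D user priority (0-7) to access category.
UP_TO_AC = {1: "BK", 2: "BK", 0: "BE", 3: "BE", 4: "VI", 5: "VI", 6: "VO", 7: "VO"}


def cs(n: int) -> int:
    """DSCP value of class selector CSn (RFC 2474): n in the top three DSCP bits."""
    return n << 3


def tos_byte(dscp: int) -> int:
    """The DSCP sits in the upper six bits of the IPv4 TOS / IPv6 Traffic Class byte."""
    return dscp << 2


def user_priority(tos: int) -> int:
    """Assumed classifier: mask off the ECN bits, keep the three precedence bits."""
    return (tos & 0xFC) >> 5


def access_category(dscp: int) -> str:
    return UP_TO_AC[user_priority(tos_byte(dscp))]
```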

I haven't the faintest idea whether this actually changes the window
in which frames are transmitted via multicast on wireless-b, g or n.
Theoretically, the 802.11e VI and VO classes get their own, shorter
contention windows in which to transmit, separate from BE/BK.

There is no need to use diffserv specifically: any prioritization that
lets these packets jump the internal queues would help, whatever the
classification scheme, be it outright tc queues or iptables marking.
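"Jumping the queues internally" amounts to a strict-priority scheduler, which is what Linux's `prio` qdisc provides across its bands. The sketch below is a toy model rather than a qdisc: it uses two bands, an assumed shared limit of 16 packets, and one extra rule that `prio` does not have (an arriving mouse evicts a bulk packet when the queue is full), so that elephants get shot instead of mice.

```python
from collections import deque


class StrictPriorityQueue:
    """Two-band strict-priority transmit queue (illustrative sketch).

    Mice are always dequeued before bulk traffic. When the shared limit is
    reached, an arriving mouse evicts a bulk packet instead of being dropped.
    """

    def __init__(self, limit: int = 16):
        self.limit = limit        # total packets held across both bands (assumed)
        self.mice = deque()
        self.bulk = deque()

    def __len__(self) -> int:
        return len(self.mice) + len(self.bulk)

    def enqueue(self, pkt, is_mouse: bool) -> bool:
        """Queue pkt; return False if it had to be dropped."""
        if len(self) >= self.limit:
            if is_mouse and self.bulk:
                self.bulk.pop()   # shoot an elephant: drop the newest bulk packet
            else:
                return False      # nothing to evict: tail-drop the arrival
        (self.mice if is_mouse else self.bulk).append(pkt)
        return True

    def dequeue(self):
        """Return the next packet to transmit, or None if the queue is empty."""
        if self.mice:
            return self.mice.popleft()
        if self.bulk:
            return self.bulk.popleft()
        return None
```

A babel update queued behind a full buffer of bulk data is transmitted next rather than after all of it, which is the routing-convergence benefit described in item C.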

(I am getting a spectrum analyzer shortly)

There were several (long term!!!) thoughts here:

A) Wireless devices are currently making heroic efforts (deep
buffering, exorbitant retries) to get packets through. Right now, a
big delay between transmission and reception is a better indicator of
congestion than actual packet loss is. By the time you see actual
packet loss, the network has often already collapsed completely.

B) Theoretically using a different 802.11e class reduces the heroism
by some unknown amount.

C) QoS, packet marking and prioritization of any sort make babel
control packets jump closer to the head of the internal queues of the
transmitting clients, thus speeding up the propagation of routing
changes. By all means, shoot the elephants, not the mice.

Once you do all this stuff, packet loss comes closer to being a
measure of actual problems in the air, instead of deep in the stack.

D) The ECN bits could be used to signal congestion on links that are
congested but not (yet) losing packets.
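Items A and D fit together: rising delay can be detected before any loss, and the congestion can then be signalled with an ECN mark instead of a drop. The sketch below is a hypothetical combination, not something babel or Linux implements as written. The window of 8 samples and the 50 ms threshold are invented tuning knobs; the codepoints are the RFC 3168 ones.

```python
from collections import deque

# RFC 3168 ECN codepoints: the low two bits of the IPv4 TOS / IPv6 Traffic Class byte.
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


class DelayMarker:
    """Flag congestion from rising delay (item A) and signal it with ECN (item D)."""

    def __init__(self, window: int = 8, threshold_ms: float = 50.0):
        self.base_ms = float("inf")          # lowest delay seen: assumed "empty queue" delay
        self.recent = deque(maxlen=window)   # last `window` delay samples
        self.threshold_ms = threshold_ms     # hypothetical tuning knob

    def congested(self, delay_ms: float) -> bool:
        self.base_ms = min(self.base_ms, delay_ms)
        self.recent.append(delay_ms)
        # Use the smallest recent sample so a single spike is not counted as
        # congestion; a constant clock offset between sender and receiver
        # cancels out in this difference.
        return min(self.recent) - self.base_ms > self.threshold_ms

    def handle(self, tos: int, delay_ms: float):
        """Return the TOS byte to forward, or None if the packet should be dropped."""
        if not self.congested(delay_ms):
            return tos
        if (tos & 0b11) == NOT_ECT:
            return None                      # not ECN-capable: loss is the only signal
        return (tos & ~0b11) | CE            # mark Congestion Experienced, keep the DSCP
```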

Moving back to item C, I would like to recommend that babel users on
Linux (at least) try the following:

Reduce txqueuelen on their Ethernet and wireless devices by a lot! I'm
using 4-16 at present.
Reduce driver buffering by a lot! I cut one driver from 512 packets
buffered to 3, which made VoIP feasible.
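Why buffer depth matters so much comes down to drain time: a new packet waits behind everything already queued. This is back-of-the-envelope arithmetic assuming full-size 1500-byte packets and an illustrative 1 Mbit/s link rate; it ignores MAC overhead, contention and retries, which only make wireless worse.

```python
def drain_time_ms(packets: int, rate_mbit: float, packet_bytes: int = 1500) -> float:
    """Time for a full FIFO of `packets` to drain at `rate_mbit` Mbit/s, in ms.

    This is how long a newly queued packet (a babel update, a VoIP frame)
    waits behind the buffer. Assumes full-size packets; ignores overheads.
    """
    bits = packets * packet_bytes * 8
    return bits / (rate_mbit * 1e6) * 1e3


if __name__ == "__main__":
    for depth in (512, 16, 3):
        print(f"{depth:4d} packets at 1 Mbit/s: {drain_time_ms(depth, 1.0):8.1f} ms")
```

Under these assumptions 512 packets is roughly 6.1 s of queue while 3 is 36 ms. The same arithmetic applies to DMA TX rings: 256 entries take about 31 ms to drain at 100 Mbit/s, versus about 3 ms at GigE.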

I note that reducing driver buffering currently reduces wireless-n
throughput in single-stream situations by a lot. However, in more
real-world scenarios it's hardly noticeable, and it can be fixed one
day, once we get better classification and feedback mechanisms. And by
all means, if you are using a wireless-n device on a mostly-g network,
excessive buffering hurts a lot.

For wired, we are also using ethtool to reduce the DMA TX ring sizes,
which are often set to values suited to GigE (64-256), to what is
optimal for the real world of far less than 100Mbit, which appears to
be in the range 4-16, where possible. The correct values need to be
derived from further experimentation.

Once these changes are made, QoS actually starts to have some effect
on overall network performance again. Without reducing buffer sizes
dramatically, it doesn't.

Apply QoS and packet prioritization to system-critical 'mice' packets
at the very least. Rate-limit them, but exempt them from being shot
down by other bandwidth control mechanisms. Few people are applying
QoS to IPv6 packets at all, and many ICMP messages (in addition to
babel's UDP multicast) should be prioritized.
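"Rate-limit them, but exempt them from being shot down" can be read as a token bucket guarding the priority class. In this sketch, traffic within the bucket keeps its priority and the excess is demoted to best effort rather than dropped; the demotion choice, the `TokenBucket` class and its parameters are assumptions for illustration (a policer that drops the excess would also fit the text). The injectable clock exists only to make the behaviour testable.

```python
import time


class TokenBucket:
    """Token bucket: `rate_pps` tokens per second, at most `burst` stored."""

    def __init__(self, rate_pps: float, burst: int, clock=time.monotonic):
        self.rate = rate_pps
        self.burst = burst
        self.tokens = float(burst)   # start full so an initial burst of mice passes
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill for the elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def classify_mouse(bucket: TokenBucket) -> str:
    """Mice within the limit stay in the priority class; the excess is demoted."""
    return "priority" if bucket.allow() else "best-effort"
```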

I'm pretty sure, based on the results gathered thus far, that this
will improve the quality of most mesh networks out there
and I'd love for more people to be trying these things on a wider
scale and let us know what the effects are.

I've also written elsewhere about the effect of multicast traffic on
wireless, and am trying hard to stop bridging GigE (1000Mbit) and
wireless (a, b, g, n) together wherever possible, as the huge
disparity between the multicast rates is not accounted for in any QoS
scheme available to date in Linux. Addressing large-scale multicast
usage effectively is going to take a great deal of work, and even ARP
can cause headaches.
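The size of that disparity is easy to estimate. This sketch assumes the wireless side sends bridged multicast at a 1 Mbit/s basic rate (common on 802.11b-compatible networks) and counts only serialization time, ignoring preambles, inter-frame spacing and contention, so real airtime is longer still.

```python
def frame_time_us(frame_bytes: int, rate_mbit: float) -> float:
    """Serialization time in microseconds (bits divided by Mbit/s gives microseconds)."""
    return frame_bytes * 8 / rate_mbit


def saturating_pps(frame_bytes: int, rate_mbit: float) -> float:
    """Frames per second that would, on their own, keep the medium 100% busy."""
    return 1e6 / frame_time_us(frame_bytes, rate_mbit)


if __name__ == "__main__":
    for name, rate in (("GigE", 1000.0), ("802.11 multicast at 1 Mbit/s", 1.0)):
        print(f"{name}: {frame_time_us(1500, rate):.0f} us/frame, "
              f"saturated at {saturating_pps(1500, rate):.0f} frames/s")
```

Under these assumptions a 1500-byte frame takes 12 us on GigE but 12 ms in the air, so roughly 84 multicast frames per second, a trickle on the wired side, would fill the wireless medium.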

-- 
Dave Täht
SKYPE: davetaht
US Tel: 1-239-829-5608
http://the-edge.blogspot.com


