[Babel-users] A few more comments on the BATMAN routing protocol

Mon Sep 1 12:02:42 UTC 2008

Hi,

On Mittwoch 27 August 2008, you wrote:
> >> My claim that there exist topologies in which average-case convergence
> >> of BATMAN is exponential in the number of hops has been confirmed by
> >> two of the BATMAN developers.  I still believe this to be a very
> >> significant flaw of BATMAN.
> >
> > Packet loss increases also the count of OGMs that trigger a route switch
> > decreases.
>
> I realise that.  The trouble is that the time needed to reconverge is
> proportional to the product of the link qualities, and not to the sum
> of the link qualities, as is the case in Babel and in OLSR.
>
> > Now let's come to the worst case. Again two paths that are
> > non-overlapping. One is terrible, 99% loss. The other is perfect, no
> > loss.
>
> As I mentioned in my point 3, the packet loss, as measured by BATMAN,
> is not realistic in the presence of link-layer AQM.  As you know,
> IEEE 802.11 does perform link-layer AQM for unicast packets.
AQM == Active Queueing Management ? Sorry, I did not know this. I know that 
802.11 retransmits (usually up to 7 times) in case of absent ACKs and most 
implementation performs some non-standardized Ack-based rate control. 

All mesh protocols I am aware of use broadcast-based link probing for 
estimating the link quality. I agree that this is not very precise and also 
does not indicate much about the achievable bandwidth. But the alternatives 
are also difficult. Performing unicast measurements introduces lots of 
overhead especially in areas of high node density and relying on estimated 
802.11 transmit rates requires interaction with the MAC layer and still does 
not relieve from the need for casual unicast traffic between all neighboring 
nodes. 
I am currently experiencing with unicast probes, hoping to find an acceptable 
mixture of selective unicast probing and reusing probetraffic for flooding of 
topology information. 

>
> A 10-hop route with each link having 20% loss will be measured by
> BATMAN as a route with 90% loss.  However, with 7-persistent per-hop
> AQM (the IEEE 802.11 default), such a route has an effective loss rate
> of roughly 10^-4, and is therefore quite usable by standard transport-layer
> protocols such as TCP.  (In case this is not clear to you -- this is
> brilliantly exposed in the original ETX paper.)
>
> Hence, my claim that BATMAN converges in exponential time stands, even
> if you restrict yourself to routes usable by TCP.

And we have not denied your claim. But the evolution of BATMAN has continued 
after batmand-0.2 and what is described in the draft - so did the OLSR which 
is used in todays Freifunk networks. For example, the BMX version by default 
sends each OGM twice (or more often if wanted, my first observations was that 
twice seems sufficient and more than twice does not pay off). Taken your 
example of 10 hops with a packet loss of 20% and a resulting Path Error Rate 
of 90 % 
transferred to the scenario of sending OGMs twice: 
(Link-OGM-Error-Rate) LOER=(LPER)²=0.04	=> 
(Path-OGM-Error-Rate) POER = 1-( (1-LOER)^hops ) = 0.34 = 34%

So statistically even after 10 hops 66 of 100 OGMs will be received. 
Even after 20 hops more than 55 of 100 OGMs could be received. 

But the generally cool thing is: The more weak hops exist between two nodes 
(and therefore the less important a good estimation of the path quality 
becomes) the less OGMs are eating up the available bandwidth and processing 
resources.

To sum up: Yes, it is exponential. Its not a bug, its a feature. This is 
wanted as long as sufficient OGMs remain to maintain usable routes.

>
> > All versions of Batman benefit from the fact that they don't produce
> > much protocol overhead (small amount of data that needs to be
> > communicated, OGM flooding is only flooded exclusively via the best
> > routing path to a destination).
>
> Given a network with n nodes and e edges, BATMAN sends O(n^2) packets
> per unit of time, Babel sends O(n^2) messages, and OLSR sends between
> O(e) and O(e.n) depending on the MPR redundancy.
>

I think the packets BATMAN sends per node (without packet loss) should be 
O(n).
Even though, this calculation ignores the packet loss which has been 
emphasized so much above. Without being an expert. If you want to express the 
theoretical effort of large BATMAN networks WITH PACKETLOSS with the 
O-notation, I guess that it rather ends up in something like O(log n) per 
node. Simply because the larger a network becomes, the greater the average 
distance between nodes in terms of hops becomes and therefore the greater the 
average end-to-end OGM loss. ( I think Roman Steiner also mentioned this 
effect in his simulations which he has described here 
http://open-mesh.net/Members/roman/report.pdf )

With kind of provoking example: Adding 500 nodes in the distance of 500 hops 
(with a Link OGM Error Rate of 4% and an originator interval of 1 second) 
causes the nodes at the other side to send 21 OGMs more  per year.

ciao,
axel