[Pkg-utopia-maintainers] Bug#773525: Randomly excludes available connections [when there are too many?]

Thu Jul 16 22:56:32 UTC 2015

On 16/07/15 19:56, Dan Williams wrote:
> There really isn't any solution that I can think of, except serializing
> the requests in the client libraries.  Unfortunately, that's not a great
> way to go about it and it simply complicates the code on the client
> side.

In the medium to long term I think the only way is to have some sort of
queue of requests, or improve NM's D-Bus API so that things can be
batched (for instance getting properties of more than one AP per
round-trip, which would also make it faster).
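
As a rough illustration of the "queue of requests" option, here is a
minimal sketch in Python that keeps the number of outstanding calls
below a fixed cap. It is not NM or libnm code: `FakeBus`,
`get_properties`, the `/ap/N` paths and the cap of 100 are all
assumptions made up for the example, standing in for whatever the real
client library would call.

```python
import asyncio

MAX_IN_FLIGHT = 100  # assumed value, chosen to stay below the 128 limit


async def fetch_all(paths, fetch_one, limit=MAX_IN_FLIGHT):
    """Call fetch_one(path) for every path, with at most `limit` pending."""
    gate = asyncio.Semaphore(limit)

    async def guarded(path):
        async with gate:  # waits here while `limit` calls are already pending
            return await fetch_one(path)

    return await asyncio.gather(*(guarded(p) for p in paths))


class FakeBus:
    """Hypothetical stand-in for a D-Bus connection; records peak load."""

    def __init__(self):
        self.pending = 0
        self.peak = 0

    async def get_properties(self, path):
        self.pending += 1
        self.peak = max(self.peak, self.pending)
        await asyncio.sleep(0)  # yield, as a real round-trip would
        self.pending -= 1
        return {"path": path}


def demo(n_aps=500, limit=MAX_IN_FLIGHT):
    bus = FakeBus()
    paths = [f"/ap/{i}" for i in range(n_aps)]  # made-up object paths
    results = asyncio.run(fetch_all(paths, bus.get_properties, limit))
    return len(results), bus.peak
```

Calling `demo()` requests properties for 500 hypothetical APs while
never having more than 100 calls outstanding. A batched API would
avoid the need for this entirely, which is why I'd prefer it.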

> It's quite easy to run over 128 pending replies.

How many do you need? Is the answer in fact "arbitrarily many"?

With some appropriate benchmarks we might be able to increase the limit
by an order of magnitude or two, but I'm concerned that going back to
the 1K that NetworkManager historically used might be too many.

The problem is that if you have an unbounded number of requests
in-flight, the system dbus-daemon uses an unbounded amount of memory to
track them; and the system dbus-daemon is a shared resource acting on
behalf of various trusted and untrusted processes, so that would be bad.
The reasons it tracks them at all are so that unsolicited "replies" can
be rejected; so that if a client can call a method on a service, the
service is allowed to reply; and so that if a service falls off the bus
without replying to all method calls it should have handled, dbus-daemon
can synthesize error replies immediately, rather than leaving the
clients to time out.
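
To make those three reasons concrete, here is a toy model in Python.
It is a sketch of the bookkeeping only, not dbus-daemon's actual code:
the class and method names are invented, and I'm assuming the limit is
applied per calling connection.

```python
class PendingReplies:
    """Toy model of pending-reply tracking (not dbus-daemon's real code)."""

    def __init__(self, max_per_caller=128):
        self.max_per_caller = max_per_caller
        self.pending = {}  # (caller, serial) -> service expected to reply

    def method_call(self, caller, serial, service):
        # Naive linear count of this caller's outstanding calls.
        in_flight = sum(1 for (c, _s) in self.pending if c == caller)
        if in_flight >= self.max_per_caller:
            return False  # limit reached: the call is refused
        self.pending[(caller, serial)] = service
        return True

    def reply(self, service, caller, reply_serial):
        # Only the service that was called may answer, and only once.
        if self.pending.get((caller, reply_serial)) != service:
            return False  # unsolicited "reply": rejected
        del self.pending[(caller, reply_serial)]
        return True

    def service_disconnected(self, service):
        # Each returned (caller, serial) would get a synthesized error reply.
        orphaned = [key for key, svc in self.pending.items() if svc == service]
        for key in orphaned:
            del self.pending[key]
        return orphaned
```

Every entry in `pending` costs memory until it is answered or its
service disconnects, which is why the table cannot be allowed to grow
without bound. The linear scans are deliberately naive; they are the
sort of thing that gets expensive once the table is large.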

At the moment dbus-daemon also doesn't use clever enough data
structures, resulting in a CPU-consumption denial of service attack:
with a lot of pending replies and a lot of connections allowed, an
attacker can make dbus-daemon use a lot of CPU time. We dropped the
pending reply limit from 8K to 128 because there was a practical
denial-of-service attack with 8K (CVE-2014-3638,
https://bugs.freedesktop.org/show_bug.cgi?id=81053): simple method calls
could be made to take more than 25 seconds (the default timeout).

https://bugs.freedesktop.org/show_bug.cgi?id=83938 has some attempts at
better data structures, but it's significant code churn, so we're
unlikely to land anything like that in a dbus stable-branch. I'd be
happy to review a cleaned up implementation, but that isn't going to
help for distributions that aren't tracking bleeding-edge dbus.

kdbus' hard-coded limit was also 128 pending replies per sender, last
time I looked (although I think it was just coincidence that Alban
suggested the same number after experimenting with denial of service
attacks). It needs to track requested replies for basically the same
reasons as dbus-daemon, and also has to be fairly conservative with its
limits because it's allocating kernel memory. So moving to kdbus is
probably not going to save us from this.

> We do have the 'libnm' library with NM 1.0+ that uses GDBus all the way
> through, so if GDBus somehow manages to avoid this problem then great.
> Otherwise, we'll have the same problem in libnm too...

It sounds as though this is really an issue with the high-level design
of NM's D-Bus API, rather than the specifics of how the client library
is implemented, so I don't think GDBus is going to help.

    S