[Aptitude-devel] Acquire considered non-threadsafe

Daniel Burrows dburrows at debian.org
Sun Nov 15 20:42:38 UTC 2009


  For the last few years, aptitude has been running apt's Acquire queue
in a background thread, with the idea that doing so would provide more
responsiveness in the UI.  I had assumed (yes, you can slap me now) that
this would be OK, since a download queue "shouldn't need" to access
global resources.

  However, today I happened to be reading through the Acquire code to
track down an unrelated aptitude problem, and I noticed that in fact the
Acquire code is completely non-threadsafe: it is never safe to use it in
a different thread from any other libapt function.  (please note: by
"threadsafe" I mean "safe to use in a multi-threaded program", not the
more stringent "safe to access from multiple threads at once", which
isn't necessary for Acquire)

  I haven't performed a full audit, but the Acquire code *at least* uses
the global _error variable:

  (1) If _error->PendingError() is true, the download loop will abort
      and return Failed.  This means that if some other thread inserts
      an error into the global error queue, the Acquire process will
      fail for no obvious reason.

  (2) Various Item subclasses add errors in their status callbacks,
      which could cause race conditions with any other thread that
      inserts errors into the global queue.  At best, this will lead
      to an error item being leaked and dropped from the queue.  Also,
      several Item subclasses test for the existence of errors before
      deciding whether to proceed.

  I'm guessing that these haven't shown up until now as a problem
because you need very specific timing for them to matter, and most
users probably don't do anything that would cause trouble while aptitude
is downloading.  But these are still definitely bugs IMO.

  In particular, since the Pulse() callback could invoke arbitrary
user code (and indeed will normally drive the program's GUI while it's
running), (1) is a problem even for single-threaded apt clients.  Any
action the user takes that generates errors in the global queue has the
potential to interrupt an ongoing download.  I guess in the
single-threaded case, you can flush the error queue on the way out of
Pulse(), but I still don't like it.


  I'm not sure what the best way to deal with this is.  I'd really like
to redesign Acquire to be thread-safe (and not to suck in various other
ways), but there would be a significant chance of breaking existing
users of the class.  For instance, a custom Item object that relied on
inserting errors to ask the download to fail would no longer work
properly.

  One option is to design more sensible behavior, then allow client code
to request it on an opt-in basis.  Let users of the Acquire class enable
a "threadsafe" mode that changes its behavior to be more sensible: throw
out the global error queue for a start, but I'll want to audit all this
code to ensure that it's safe for multithreaded programs.  This might
also involve writing new versions of existing routines to pass error
information back explicitly.  For instance, I notice that we need to
test _error to find out whether pkgRecords::Lookup succeeded.  It should
really return an extra value indicating whether the lookup succeeded, or
just return a pointer that's NULL if the lookup failed.

  There are some other very nice options involving rewriting or
reimplementing the "download queue" logic (i.e., just pkgAcquire), but
getting threadsafety does require changing the behavior of
pkgAcquire::Item in any event.  I'm assuming that we don't want to have
two parallel implementations of this functionality, and that
non-backwards-compatible changes are considered unacceptable, in which
case the opt-in approach is the only solution I can think of.


  It would also be good, IMO, to modify the error class to be threadsafe
no matter what.  A simple mutex could handle that.


  Once I get the next major release of aptitude out, I want to take a
crack at this; I hate knowing that the program contains obscure race
conditions.

  Daniel



More information about the Aptitude-devel mailing list