[Soc-coordination] Report 5 - Pluggable Acquire System for APT

Bogdan Purcareata bogdan.purcareata at gmail.com
Mon Jul 30 19:02:55 UTC 2012


Codename: apt-fetcher
Mentors: Michael Vogt, David Kalnischkies
Project proposal page: [0]
Project design page: [1]

What was planned:
- an enhanced format for the sources.list file(s) and a public parser
- a pluggable backend to handle metadata download
- a generic plugin model definition
- acquire logic and communication security for the backend
- integration with another packaging tool - apt-file
- user interface for the parser
- doxygen documentation, tests
What was done:
- the format for the sources.list file(s) and the public parser
- a pluggable acquire framework
- integration of this framework with the present APT acquire module
- generic plugin model definition, both for Release files, and for the
other index files
- unit tests for the parser and the acquire framework

The main challenge of this project proved to be gaining a complete and
thorough understanding of the APT code. APT was one of Debian's
earliest applications and is a very significant part of the distro.
Its code has gone through a long series of revisions and refactorings.
The picture I had formed of the APT architecture when I wrote the
project application was different from the actual one. There was no
need to build a separate backend for metadata retrieval, since APT
already implements an acquire module for exactly that. The
implemented pluggable framework passes the information from the
sources.list to this acquire module, so the files can be downloaded.
This made the existing apt-get functionality easy to preserve, but
making the right adjustments in the acquire module to support the
integration with the pluggable framework proved challenging.

Here is a summary of weeks 7 through 11 of the coding period:

What I've done:
- integrated the framework with apt-get update
- successfully downloaded the index files - Packages, Sources,
Translations, Contents
- refactored the framework to include the pkgAcquire::Item implementation
- separated the implementation into several individual files
What problems I've run into:
- designing the framework architecture
- thoroughly understanding the acquire module code
What I plan to do by the final term:
- refactor framework code
- improve acquire methods and acquire methods interfaces
- user interface for the parser
- integration with another packaging tool - e.g. apt-file

A month ago, at the 3rd report's milestone, I had implemented the
public parser for the enhanced sources.list file and the pluggable
acquire framework. This framework was implemented up to the point
where it provided IndexTarget objects - locations of index files in
the Debian Archive - to the present acquire module. Tests for these
APT components were also written and they were functional. The next
step was to integrate the apt-fetcher with apt-get update.
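To make the idea of an IndexTarget concrete, here is a minimal C++
sketch of how one parsed sources.list entry could be expanded into the
remote locations of its index files. The SourceEntry and
IndexTargetSketch types and the BuildTargets() helper are hypothetical,
not the framework's actual classes; only the
dists/<suite>/<component>/binary-<arch>/Packages and
dists/<suite>/<component>/source/Sources paths follow the Debian
Archive layout.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified view of one parsed sources.list entry.
struct SourceEntry {
    std::string uri;                      // e.g. "http://ftp.debian.org/debian"
    std::string suite;                    // e.g. "wheezy"
    std::vector<std::string> components;  // e.g. {"main", "contrib"}
    std::vector<std::string> archs;       // e.g. {"amd64"}
    bool wantSources;                     // true for a deb-src entry
};

// Hypothetical stand-in for an IndexTarget: where one index file lives.
struct IndexTargetSketch {
    std::string uri;          // full remote location of the index file
    std::string description;  // human-readable label for progress output
};

// Expand one entry into one Packages target per (component, arch) pair,
// or one Sources target per component for deb-src entries.
std::vector<IndexTargetSketch> BuildTargets(const SourceEntry &e) {
    assert(!e.uri.empty() && !e.suite.empty());
    std::vector<IndexTargetSketch> out;
    const std::string base = e.uri + "/dists/" + e.suite + "/";
    for (const auto &comp : e.components) {
        if (e.wantSources) {
            out.push_back({base + comp + "/source/Sources",
                           e.suite + "/" + comp + " Sources"});
            continue;
        }
        for (const auto &arch : e.archs)
            out.push_back({base + comp + "/binary-" + arch + "/Packages",
                           e.suite + "/" + comp + " " + arch + " Packages"});
    }
    return out;
}
```

For example, a "deb" entry for wheezy with the components main and
contrib on amd64 yields two Packages targets, one per component.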

Over the following two weeks, before the Google Mid-Term, I studied
the APT code. I looked at how the fetcher object - the main class for
metadata and package acquisition - works. Most importantly, I noticed
that the whole download and processing pipeline is decoupled from the
type of the handled files through generic interfaces. In the acquire
module, the abstract class that defines a downloadable index file is
pkgAcquire::Item. This class defines methods that are called when the
fetch of an index file starts and when its download fails or
succeeds. This item class is meant to be subclassed into more specific
classes, corresponding to particular files in the Debian Archive,
with specific behaviors.
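The separation between the generic download loop and the
file-specific behavior can be illustrated with a simplified sketch.
ItemSketch, PackagesItemSketch and RunItem() are hypothetical names,
and the simplified Start()/Done()/Failed() hooks only mirror the role
described above for pkgAcquire::Item; the real class's signatures are
not reproduced here.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified analogue of an acquire item: the generic
// download loop only talks to this interface, never to concrete types.
class ItemSketch {
public:
    enum class State { Idle, Fetching, Done, Error };
    virtual ~ItemSketch() = default;
    virtual void Start() { state = State::Fetching; }       // fetch begins
    virtual void Done(const std::string &localFile) = 0;    // download succeeded
    virtual void Failed(const std::string &reason) {        // download failed
        state = State::Error;
        error = reason;
    }
    State state = State::Idle;
    std::string error;
};

// A file-specific subclass decides what "done" means for its file type.
class PackagesItemSketch : public ItemSketch {
public:
    void Done(const std::string &localFile) override {
        // A real item would decompress, verify and move the file here.
        finalPath = localFile;
        state = State::Done;
    }
    std::string finalPath;
};

// Generic driver: it knows nothing about Packages, Sources or Contents.
// `fetchOk` stands in for the outcome of the actual transfer.
void RunItem(ItemSketch &item, bool fetchOk, const std::string &path) {
    item.Start();
    assert(item.state == ItemSketch::State::Fetching);
    if (fetchOk)
        item.Done(path);
    else
        item.Failed("transfer failed");
}
```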

The present acquire module already implements several subclasses;
they are described in more detail on the design page [1]. My initial
belief was that these classes would be sufficient to support any type
of metadata file. I integrated the Contents plugin with this acquire
module and used the pkgAcqIndex class to handle a Contents index
file. After replacing the sources.list parser with my own and moving
ListUpdate() - the main metadata retrieval algorithm - into the
framework, I was able to reproduce the default apt-get update
functionality, with support for Contents as well. The code correctly
downloaded the files and placed them in the filesystem, used pdiffs
when possible, and decompressed the remote compressed index files
after fetching them.

APT uses pdiffs as a technology for downloading and keeping files up
to date: the Debian Archive holds a series of diff files that are
sufficient to update a local version of an index file, provided such
a previous version exists. One thing I learned is that the acquire
module first tries to fetch the index files using pdiffs. If this
process fails, the whole file is downloaded via http or another
acquire method.
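The pdiff-first strategy with a full-download fallback can be sketched
as follows. This is a simplified model under stated assumptions: file
versions are identified by opaque hash strings, each patch records the
hash of the version it applies to, and PatchSketch, PlanUpdate() and
UpdateIndex() are hypothetical names. The actual pdiff Index format
and APT's implementation are not reproduced.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical model of one entry in a pdiff chain: applying `name`
// to the version whose hash is `from` yields the next version.
struct PatchSketch {
    std::string from;  // hash of the version this patch applies to
    std::string name;  // patch identifier, e.g. a timestamp
};

enum class Strategy { UpToDate, ApplyPatches, FullDownload };

struct Plan {
    Strategy strategy;
    std::vector<std::string> patches;  // patches to apply, in order
};

// Decide how to bring a local index up to date.
// localHash is empty when there is no local copy at all.
Plan PlanUpdate(const std::string &localHash, const std::string &latestHash,
                const std::vector<PatchSketch> &chain) {
    if (localHash.empty())
        return {Strategy::FullDownload, {}};  // nothing to patch
    if (localHash == latestHash)
        return {Strategy::UpToDate, {}};
    for (std::size_t i = 0; i < chain.size(); ++i) {
        if (chain[i].from != localHash)
            continue;
        Plan p{Strategy::ApplyPatches, {}};
        for (std::size_t j = i; j < chain.size(); ++j)
            p.patches.push_back(chain[j].name);  // rest of the chain
        return p;
    }
    return {Strategy::FullDownload, {}};  // local copy unknown or too old
}

// Try the pdiff route first; if any patch fails, fall back to a full
// fetch. applyPatch and fetchFull stand in for the real transfers.
bool UpdateIndex(const Plan &plan,
                 const std::function<bool(const std::string &)> &applyPatch,
                 const std::function<bool()> &fetchFull) {
    if (plan.strategy == Strategy::UpToDate)
        return true;
    if (plan.strategy == Strategy::ApplyPatches) {
        bool ok = true;
        for (const auto &p : plan.patches)
            if (!applyPatch(p)) { ok = false; break; }
        if (ok)
            return true;
    }
    return fetchFull();
}
```

With a chain h1 -> h2 -> h3 -> h4 and a local copy at h2, the plan is
to apply the last two patches; a local copy that is missing or not in
the chain goes straight to a full download.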

After the Google Mid-Term, I noticed that, although the index files
were downloaded correctly, some errors occurred. The pkgAcqIndex
class, which I was also using to download Contents files, was
designed to be used only for Packages, Sources and Translations.
Specifically, after the index files were downloaded and decompressed,
they were parsed to check for the presence of a "Package:" tag. The
Contents file has a different format and doesn't include such a
field - and the new metadata files that will be added to the Debian
Archive probably won't adhere to that format either. Using the generic
pkgAcqFile class, which doesn't include the decompression stage, was
not an option either.
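The kind of format check that trips on Contents files can be
illustrated with a small, hypothetical helper. It assumes the check
simply looks for a line starting with "Package:", as described above:
Packages, Sources and Translation files contain such lines, while a
Contents file maps file paths to section/package names.
LooksLikePackageIndex() is an illustrative name, not the code in
pkgAcqIndex.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical sanity check in the spirit of the one described above:
// accept a decompressed index only if some line starts with "Package:".
bool LooksLikePackageIndex(const std::string &content) {
    std::istringstream in(content);
    std::string line;
    while (std::getline(in, line))
        if (line.compare(0, 8, "Package:") == 0)  // "Package:" is 8 chars
            return true;
    return false;
}
```

A Packages stanza such as "Package: apt" passes, while a Contents line
such as "usr/bin/apt-get    admin/apt" does not, so the download is
treated as an error even though the file itself is fine.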

So the best solution was to make the pkgAcquire::Item subclass part
of the acquireIndexPlugin implementation. This way, the plugin
developer completely defines how the index file should be downloaded:
the initial preparations for the download, what to do when the file
is done downloading, and what to do if this process fails. The
developer can use the different acquire methods of the acquire
module, such as applying pdiffs, decompressing, checking the
signature or copying the files.
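A minimal sketch of this design, assuming a plugin interface whose
only job here is to create its own item type.
IndexPluginSketch, ContentsPluginSketch and ContentsItemSketch are
hypothetical names, and the recorded "steps" merely stand in for the
real decompression and file-moving stages.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical minimal item interface; only the completion hook matters.
class ItemSketch {
public:
    virtual ~ItemSketch() = default;
    virtual bool Done(const std::string &downloaded) = 0;
    std::vector<std::string> steps;  // post-download stages that ran
};

// Hypothetical plugin interface: each plugin owns the item type
// for its kind of index file.
class IndexPluginSketch {
public:
    virtual ~IndexPluginSketch() = default;
    virtual std::string Type() const = 0;
    virtual std::unique_ptr<ItemSketch> MakeItem() const = 0;
};

// Contents item: decompress like an ordinary index, but skip the
// "Package:" format check that Contents files would fail.
class ContentsItemSketch : public ItemSketch {
public:
    bool Done(const std::string &downloaded) override {
        const std::string ext = ".gz";
        if (downloaded.size() >= ext.size() &&
            downloaded.compare(downloaded.size() - ext.size(), ext.size(), ext) == 0)
            steps.push_back("decompress");
        steps.push_back("move-to-lists");
        return true;
    }
};

class ContentsPluginSketch : public IndexPluginSketch {
public:
    std::string Type() const override { return "Contents"; }
    std::unique_ptr<ItemSketch> MakeItem() const override {
        return std::make_unique<ContentsItemSketch>();
    }
};
```

The point of the design is that the generic code never needs to know
the file type: supporting a new metadata file means adding one plugin
and one item class.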

After redesigning the framework to include this feature, the apt-get
update process worked without any errors. For the Contents files, I
defined a pkgAcqContents class that closely resembles the pkgAcqIndex
code, without the parsing step. I also made some modifications in the
acquire module so that it makes use of the acquire framework. The
framework's new responsibility is to build a pkgAcquire::Item for a
specific index type: it chooses the right plugin, builds the item and
adds it to the pkgAcquire fetcher.
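The dispatch step described above (choose the plugin, build the item,
add it to the fetcher) can be sketched with a registry keyed by index
type. FrameworkSketch, FetcherSketch and the factory signature are
hypothetical stand-ins for the acquire framework and the pkgAcquire
fetcher, assuming plugins are looked up by an index type name.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal item: just remembers which index it fetches.
struct ItemSketch {
    explicit ItemSketch(std::string t) : target(std::move(t)) {}
    virtual ~ItemSketch() = default;
    std::string target;
};

// Hypothetical fetcher queue standing in for the pkgAcquire object.
struct FetcherSketch {
    std::vector<std::unique_ptr<ItemSketch>> queue;
    void Add(std::unique_ptr<ItemSketch> item) { queue.push_back(std::move(item)); }
};

// Hypothetical framework: maps an index type to the plugin factory
// that knows how to build an item for it.
class FrameworkSketch {
public:
    using Factory = std::function<std::unique_ptr<ItemSketch>(const std::string &)>;

    void Register(const std::string &type, Factory f) { plugins[type] = std::move(f); }

    // Pick the plugin for `type`, build the item and queue it.
    // Returns false when no plugin handles that index type.
    bool Enqueue(FetcherSketch &fetcher, const std::string &type,
                 const std::string &target) const {
        auto it = plugins.find(type);
        if (it == plugins.end())
            return false;
        fetcher.Add(it->second(target));
        return true;
    }

private:
    std::map<std::string, Factory> plugins;
};
```

Keeping the lookup in one place means the acquire module only asks
for "an item of type X" and stays unaware of individual plugins.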

At this point, I would estimate that the project's goal is mostly
achieved, or at least proven feasible. The remaining steps of the
project are refactoring the new code and improving the existing one.
The remaining development will focus on supporting all the metadata
types in the Debian Archive and thoroughly testing the functionality.
One thing to take into account when refactoring is how easily
developers who are not familiar with the APT code can write new
plugins. If possible, classes and interfaces should be made easier to
comprehend.

My contributions to the APT package can be found in the repo [2]. The
implemented files are listed from [3] to [11].

Bogdan Purcareata

[0] http://wiki.debian.org/SummerOfCode2012/Projects#Pluggable_acquire-system_for_APT
[1] http://wiki.debian.org/BogdanPurcareata/PluggableAptBackend
[2] https://launchpad.net/apt-fetcher
[3] http://bazaar.launchpad.net/~bogdan-purcareata/apt-fetcher/trunk/view/head:/apt-pkg/sourceslist-parser.h
[4] http://bazaar.launchpad.net/~bogdan-purcareata/apt-fetcher/trunk/view/head:/apt-pkg/sourceslist-parser.cc
[5] http://bazaar.launchpad.net/~bogdan-purcareata/apt-fetcher/trunk/view/head:/test/libapt/sourceslist-parser_tester.cc
[6] http://bazaar.launchpad.net/~bogdan-purcareata/apt-fetcher/trunk/view/head:/apt-pkg/acquire-framework.h
[7] http://bazaar.launchpad.net/~bogdan-purcareata/apt-fetcher/trunk/view/head:/apt-pkg/acquire-framework.cc
[8] http://bazaar.launchpad.net/~bogdan-purcareata/apt-fetcher/trunk/view/head:/test/libapt/acquire-framework_tester.cc
[9] http://bazaar.launchpad.net/~bogdan-purcareata/apt-fetcher/trunk/view/head:/apt-pkg/deb/debplugins.h
[10] http://bazaar.launchpad.net/~bogdan-purcareata/apt-fetcher/trunk/view/head:/apt-pkg/deb/debplugins.cc
[11] http://bazaar.launchpad.net/~bogdan-purcareata/apt-fetcher/trunk/files/head:/apt-pkg/plugins/


