[Soc-coordination] port bootstrap build-ordering tool report 2

Johannes Schauer j.schauer at email.de
Sun Jun 17 20:06:24 UTC 2012


Hi,

a copy of this email has been sent to
debian-bootstrap at lists.mister-muffin.de and is also available at
http://blog.mister-muffin.de/2012/06/17/port-bootstrap-build-ordering-tool-report-2

Diary
=====

June 4
------

I added the first version of basenocycles.ml to git. Given an initial set of
cross built packages, it tries to compile as much as possible on the resulting
system in multiple rounds.

June 5
------

During June 3, I discovered and error in my program that would only come up
when using the Debian Sid package lists as the input:

	Fatal error: exception Assert_failure("common/edosSolver.ml", 610, 11)

On this day, June 5, I wrote a [minimal test case][1] for this problem.

The same day, Pietro [figured out][2] that this is a bug in dose which will be
fixed in the next release.

Begin writing code to figure out how important a binary package is for the
further build process.

Try to use Depsolver.edos_install to find out what packages are needed to make
debhelper available.

Restructure basenocycles.ml, exclude source packages that already have been
built, still trouble with already existing binary packages and
Cudf.mem_installed, comment stuff better.

June 6
------

I wrote some crude code (only estimate, not correct, fixed later) that would
give a rough overview of how often a given binary package is directly required
as a build dependency.

Debhelper came out as the most needed package. It is architecture:all, so it
does not have to be built but it has unfulfilled runtime dependencies. To make
those packages available, 13 (actually 11, fixed later) packages have to be
compiled on Ubuntu Natty. But those packages all (except gettext) require
debhelper itself to be built. The first dependency cycle.

This dependency cycle (actually, the 12 cycles) can be broken by either cross
compiling those source packages or by making them build without debhelper. One
goal of the program is to help decide what the easier option is, but this is
not yet implemented.

To play around a bit, I created the possibility to specify a list of packages
that are additionally to the minimal set of cross compiled packages also cross
compiled. I added the 13 packages found above to the list, thus making the
binary packages they build available. This made debhelper available in the
system.

As a result, 1625 out of 3339 source packages can be built with just a minimal
build system (priority:essential packages plus build-essential) and debhelper
available.

The next package that blocks the most other source packages from being built is
cdbs. The next nine packages in that list also require cdbs so it seems to be
the next important package to be available.

Pietro's suggestions make me:

     - do not open BootstrapCommon but ExtLib, Common, Algo, Debian
     - do proper option parsing and logging
     - use Debcudf.ignore_essential = true
     - do Debcudf.init_tables (binlist at srclist)
     - use @ with shorter list first
     - use more List.rev_append instead of @
     - use CudfAdd.who_provides to find if a package is available

June 7
------

[Pietro][3] and [Patrick][4] suggest that for solving the debhelper cycles, one
could provide a minimal debhelper version so that the above list of 12 packages
can be built without debhelper.

I try to figure out how to get a list of packages that are missing to make a
package installable/buildable. This functionality should be provided in dose
but I fail to find it.

June 8
------

Lacking a solution of the problem of June 7, I write [a mail][5] to Pietro.

I start my first graphs in ocaml using the ocamlgraph library.

The graph I generate, starts off at a binary package. For each binary package
it connects new vertices as its runtime dependencies. If a binary package is
not arch:all and also not yet otherwise compiled, its source package is also
added.

The result is a graph in which set of source packages in it will make the
initial package available, if those source packages would be cross compiled.

The graph is extended further than the source packages.

June 9
------

I refine a couple of functions, make univ_get_pkg_by_name return the package
with the highest version number.

I wrote a rather lengthy (1027 words) [email][6] to the list that explains my
status as of this day.

I can create graphviz dot files with ocaml, can create node and edge types and
create the graph by an imperative pattern that I saw a lot in Pietro's code.

Disjunctions are not yet handled correctly (see mail from June 8).

The graphs generated look like the following:
[http://mister-muffin.de/p/8nyc.png](http://mister-muffin.de/p/8nyc.png)

June 11
-------

I write a [test case][7] which shows how CudfAdd.who_provides doesnt find
virtual packages.

Automate the process of finding the packages that, if cross compiled, would
make another package available.

Add more predicates (identifying source packages) and improve input file
reading code.

Move build_compile_rounds which compiles as many source packages as it can in
multiple rounds on a given minimal system a toplevel function and thereby
abstract it more.

Create a rudimentary text based menu to choose different actions to take for an
analysis.

Start writing an extended version of simple_dependency_graph for deeper
analysis.

Use xdot to show graphs from the text menu. Allow saving those graphs to a
file.

June 12
-------

Move functionality from the extended version of simple_dependency_graph over to
the normal version and delete the extended version.

Add the new Closure vertex type.

Create extended_dependency_graph which is supposed to not contain single binary
package vertices but handle a package and its installation set as one vertex.

The output of extended_dependency_graph is optionally reduced to the biggest
(non degenerate) strongly connected component.

User gets the option of choosing the exploration depth.

June 13
-------

Pietro [replies][8] to my mail from June 8 but apparently I failed to express
myself well enough in my last mail, so I [rephrase my question][9].

Pietro [replies][10] to my email from June 11 and explains how the effect I see
is due to "a nuisance of the debian to cudf encoding". As a result I change my
code accordingly.

June 14
-------

Another lengthy (1130 words) [email][11] to the list. I explain what was done
in the past days, what parts work and how they work. I list some rationales on
why I did things the way I did them.

The most important observation is, that after improving my code again and
again, I ended up representing the dependency cycle problem in the same (very
similar) way that Pietro suggested in the beginning. This is probably a good
discovery.

Lots of data of that email is now of only little use as of June 16, I make lots
of improvements in correctness.

As I dont have an answer to my other email to Pietro from June 13, I implement
a very crude way to get an answer to the question of what packages are missing
for a package to be available/compileable. I called it
flatten_vpkgformula_best_effort and it suffers from many faults including
disjunctions and package conflicts.

Patrick spots a problem. As a result, I make sure that at no point, the source
package of an arch:all package can be listed.

June 15
-------

As a reply to my mail from June 13, Pietro creates a new branch in the git and
adds the code I needed to get a proper installation set.

June 16
-------

As a result of Pietro's efforts from June 15, I make [great advancements][12]
on all fronts.

Details of the current status follow in the next section.

Results
=======

A big leap was made on June 16 due to Pietro's great help on making me
understand how Depsolver.listcheck can be used for my purposes. My difficulties
in finding the solution myself are rooted in many parts of the dose framework
being poorly commented but Pietro did already a couple of documentation commits
whenever things were unclear for me.

Using Depsolver.listcheck makes it possible to be more distribution agnostic
and I dont have to handle vpkgs, virtual packages and constraints myself
anymore. The code also doesnt suffer anymore by wrongly analyzed dependencies
and conflicts. The only thing that is not yet taken care of, is that
Depsolver.listcheck only chooses one out of several possible installation set.
A final version should be able to take into account that a different
installation set could provide a better solution.

Overall, in comparison to two weeks ago, I can now properly build, traverse and
analyze graphs, can choose an installation set properly, understand more about
dependencies, closures, dose and ocaml in general.

Finding the importance of binary packages for building
------------------------------------------------------

When calculating how many source packages are depending on the
availability of a binary package I originally flattened the
pkg.Cudf.depends list twice for a rough overview. This is of course
wrong due to disjunctions and conflicts and also doesnt provide a deep
dependency inspection. The new method is to calculate an installation
set that is necessary to compile a source package for every source
package. The resulting list of binary packages is then used to find out
how often a binary package appears in an installation set.

I see three drawbacks though:

- calculating an installation set for each source package in the
  archive is very slow
- if X packages build depend on A then also X packages will build
  depend on the installation set of A, resulting in lots of duplication
- only one installation set is selected though there are many

Removing simple graph
Removing simple graph
---------------------

The simple graph which contained single binary and source packages was
removed. I realized it doesnt really serve any purpose to look at it. As
a result, Bin vertices and InstallDep edges are also not part of the
graph anymore. Since it was responsible for generating the list of
source packages that have to be cross built to make a package available,
I created a new function get_cross_source_packages which uses an
installation to serve the same purpose.

Fix extended_dependency_graph
-----------------------------

extended_dependency_graph now uses installation sets for generating the
list of packages that is needed to compile a source package or install a
binary package. The list of build dependencies does not include packages
that are already installable. The list of runtime dependencies does not
include packages that are otherwise available (cross built,
arch:all...). Instead of checking for list membership all the time, I
created hash tables for the list of installable as well as for the list
of available binary packages.

Future
======

There are two big tasks for the next two weeks:

Task one is to find a way to give hints on which packages to best
consider for having reduced build dependencies. This would then probably
finally make use of Pietro's cycle algorithms.

Task two is to find a way to break cycles and create a build-DAG from a
list of packages that already have staged build dependency information.

Patrick is currently working on patching dpkg with Build-Depends-StageN
dependencies as making perl cross compilable.  If he doesnt need the ability to
decide which packages to modify to have staged build dependencies in the near
future, then task one is probably less urgent and therefor of less importance
right now?

On the other hand, I can easily generate fake reduced build dependencies so
that doing task two right now would not be a problem. Also, having the solution
for task two will make it possible to show the user what effect it would have
to add reduced build dependencies to a package.

For the reasons above (it's not urgent, task one profits from task two being
solved) I will go and implement task two first (if there are no objections from
my mentors).

Another idea, that I discussed with wookey and Patrick yesterday, was that due
to multiarch being used for more and more packages, there should exist a set of
packages that is cross compilable without any change to the package.

We agreed that I make a list of packages that, if cross compiled, would break
dependency cycles and make other packages available. I created such a list of
about 160 packages for Ubuntu Natty that, if cross compiled, made it possible
to have 87% of Natty available (those numbers have to be treated with caution
as I did not yet use the proper way of installation sets when generating that
list, but the order of magnitude should be correct). Wookey can then try to
cross compile those packages.  If some packages of those "crucial" source
packages are found to be cross compilable, then they should be cross compiled
because it means that no work has to be done to break some cycles.  Cross
compiling all packages that are cross compilable out of the box is no solution,
as only natively compiled packages can go into the archive.  This is why the
list of potentially additionally cross compiled source packages has to be kept
as small as possible.

[1]: http://lists.mister-muffin.de/pipermail/debian-bootstrap/2012-June/000146.html
[2]: http://lists.mister-muffin.de/pipermail/debian-bootstrap/2012-June/000147.html
[3]: http://lists.mister-muffin.de/pipermail/debian-bootstrap/2012-June/000150.html
[4]: http://lists.mister-muffin.de/pipermail/debian-bootstrap/2012-June/000155.html
[5]: http://lists.mister-muffin.de/pipermail/debian-bootstrap/2012-June/000159.html
[6]: http://lists.mister-muffin.de/pipermail/debian-bootstrap/2012-June/000160.html
[7]: http://lists.mister-muffin.de/pipermail/debian-bootstrap/2012-June/000165.html
[8]: http://lists.mister-muffin.de/pipermail/debian-bootstrap/2012-June/000169.html
[9]: http://lists.mister-muffin.de/pipermail/debian-bootstrap/2012-June/000172.html
[10]: http://lists.mister-muffin.de/pipermail/debian-bootstrap/2012-June/000167.html
[11]: http://lists.mister-muffin.de/pipermail/debian-bootstrap/2012-June/000173.html
[12]: http://lists.mister-muffin.de/pipermail/debian-bootstrap/2012-June/000191.html




More information about the Soc-coordination mailing list