[Soc-coordination] Debian Teams Activity Metrics - Report I

Sukhbir Singh sukhbir.in at gmail.com
Sat Jun 4 12:06:40 UTC 2011


This is my first report for the Google Summer of Code project 'Debian
Teams Activity Metrics' with Andreas Tille and Scott Howard as the mentors.

We are developing tools to help measure the performance of teams in
Debian (and Debian Pure Blends). The metrics we have decided upon to
gauge performance are listed on the Debian Wiki [0]. In this report,
I will focus only on the first phase of the project (out of a total of

The first phase is to implement a mailing list archive parser that
will return the frequency of the most active contributor to a mailing
list (measured using the 'From' header). During the application period
for GSoC, it was suggested that we use MailListStat [1] instead of
reinventing the wheel [2]. Initially, that seemed a very viable
solution as opposed to doing this from scratch. However, when actual
development began, we noticed that it was inadequate for our purpose:

- it could only parse local mbox archives thus not allowing us to
automate the process.
- it is written in C, which might speed up the parsing, but it
doesn't fit our requirements or the primary language of the project,
Python.
- if we generate the statistics for a mailing list using [1], we run
into all sorts of problems trying to make it match with what we want.

With my insistence on having a customized solution that we could
maintain and the green signal from Andreas, I decided to reinvent the
wheel and write a mailing list archive parser in Python that
completely automates what we set out to do. And this is what we
have been working on for a week (the first commit of the project was a
week ago). First, we set up our project on Alioth [3]; the repository
for the project is at [4]. After some pondering over the
design and how we are going to tackle this problem, we started working
on it and in a period of nine days, we have a mailing list parser that

- fetches the mailing lists from a given URL,
- downloads the mbox archives,
- parses them and gathers the data that *we require*,
- ... is highly customizable!
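
To make the parsing step concrete, here is a minimal sketch of counting
posts per sender from a local mbox file using the 'From' header. It is
not the code in our repository: the function names are made up for this
illustration, it is written for Python 3 rather than 2.6, and it assumes
that addresses may appear in the obfuscated "user at example.com" form
that pipermail archives use.

```python
import mailbox
from collections import Counter
from email.utils import parseaddr


def sender_address(from_header):
    """Return the lower-cased address found in a 'From' header value."""
    raw = str(from_header or '')
    # Assumption: archives may obfuscate 'user@host' as 'user at host'.
    # Only undo that when no real '@' is present in the header.
    if '@' not in raw:
        raw = raw.replace(' at ', '@', 1)
    _, address = parseaddr(raw)
    return address.lower()


def count_senders(mbox_path):
    """Return a Counter mapping sender address -> number of messages."""
    counts = Counter()
    for message in mailbox.mbox(mbox_path):
        address = sender_address(message.get('From', ''))
        if address:  # skip messages without a usable 'From' header
            counts[address] += 1
    return counts
```

With this, count_senders(path).most_common(1) gives the most active
contributor of that archive together with the number of posts.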

This is a pure Python solution that runs on Python 2.6+ using only
the standard library plus one external library, BeautifulSoup, which
is luckily already packaged in Debian. So all you need to do is to
create a config file that specifies which list(s) (multiple lists are
supported) you want to download and it will do everything on its own.

To try it out with the soc-coordination mailing list, clone the repository:


Then run liststat.py and follow the instructions on the screen. A
sample config file for your testing could be:

    url = http://lists.alioth.debian.org/pipermail/
    lists = soc-coordination

(Put this in /etc/teammetrics/listinfo.conf)
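
For illustration, a config file of this shape could be read roughly as
follows. This is a sketch under assumptions, not how liststat.py
necessarily does it: the function name and the placeholder section name
are made up, the code is Python 3, and it assumes that multiple lists
are separated by commas and/or whitespace.

```python
import configparser


def read_list_config(text):
    """Parse 'url' and 'lists' from section-less 'key = value' text."""
    parser = configparser.ConfigParser(interpolation=None)
    # configparser requires a section header; the sample above has none,
    # so a placeholder section name is prepended (an assumption).
    lines = [line.strip() for line in text.splitlines()]
    parser.read_string('[listinfo]\n' + '\n'.join(lines))
    url = parser.get('listinfo', 'url')
    # Assumed: several lists are separated by commas and/or whitespace.
    lists = parser.get('listinfo', 'lists').replace(',', ' ').split()
    return url, lists
```

Calling read_list_config() on the sample above would give back the
pipermail base URL and a one-element list of list names.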

That's all you need to specify. The script will automatically fetch
the archive from the URL, download the mbox archives and parse them.
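
The fetching step can be sketched as follows. Our script uses
BeautifulSoup for this; the sketch below swaps in the standard library
HTML parser so that it is self-contained, and it is Python 3. It
assumes the Mailman archive index page links to the monthly archives as
files ending in .txt or .txt.gz. The class and function names are made
up for this illustration, and no network access is done here: the index
page is passed in as a string.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ArchiveLinkParser(HTMLParser):
    """Collect href values that look like monthly text archives."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        href = dict(attrs).get('href') or ''
        # Assumption: archives are linked as e.g. '2011-June.txt.gz'.
        if href.endswith(('.txt', '.txt.gz')):
            self.links.append(href)


def archive_urls(index_html, index_url):
    """Return absolute archive URLs found in a list's index page."""
    parser = ArchiveLinkParser()
    parser.feed(index_html)
    return [urljoin(index_url, href) for href in parser.links]
```

Each returned URL can then be downloaded and handed to the parser.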

As of now, this completes about 75% of Phase I. What is left:

- pushing this information into the database (very easy),
- implementing a mechanism to remove spam from the lists (hard),
- preventing redundancy by not allowing lists already downloaded to be
downloaded/parsed again (easy),
- replacing the print statements with the logging module (easy).
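
As a rough idea of how the last two items might look (this is not
implemented yet, and the names below are hypothetical), archives that
already exist in the download directory could be skipped, with the
logging module reporting what happens instead of print. The sketch is
Python 3 and decides purely by file name, which is an assumption: the
archive of the current month keeps growing, so a real implementation
would need to treat that one differently.

```python
import logging
import os

# Hypothetical logger name, chosen only for this sketch.
logger = logging.getLogger('liststat')


def pending_archives(urls, download_dir):
    """Return the URLs whose file is not yet present in download_dir."""
    pending = []
    for url in urls:
        filename = url.rsplit('/', 1)[-1]
        if os.path.exists(os.path.join(download_dir, filename)):
            logger.info('Skipping %s: already downloaded', filename)
        else:
            pending.append(url)
    return pending
```

Calling logging.basicConfig(level=logging.INFO) once at startup would
make the skip messages visible on the console.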

The best part about this is that it is not specific to Debian.
This script can be used for generating the statistics of any mailing
list that runs on GNU Mailman. MailListStat is good at what it does,
but for a completely automated approach, our solution works

The discussions for this project are on the public mailing list [5].
Suggestions about the project and otherwise are welcome; you can get
in touch with us on the mailing list or here.

A special thanks to Andreas and Scott for being patient and helping
with this project. It's an absolute pleasure working with you guys.

Please get in touch if you have any questions and thanks for reading,

Sukhbir Singh

[0] - http://wiki.debian.org/SummerOfCode2011/TeamFeatures/SukhbirS
[1] - http://www.marki-online.net/MLS/
[2] - http://socghop.appspot.com/gsoc/proposal/review/google/gsoc2011/ssingh/1
       (comments section, viewable only to mentors and admins)
[3] - https://alioth.debian.org/projects/teammetrics/
[4] - https://alioth.debian.org/scm/browser.php?group_id=100628
[5] - http://lists.alioth.debian.org/pipermail/teammetrics-discuss/
