[Teammetrics-discuss] Phase I: The final parts.

Sukhbir Singh sukhbir.in at gmail.com
Sun Jun 5 21:18:26 UTC 2011


Hi,

> I'm afraid I do not understand the question correctly.  The *names* of
> the mailing lists are in the config file and I do not see a need to keep
> the names as a hash sum (in addition).  IMHO we only need to store hash
> sums of the mbox files to not parse them again.

Sorry, I used the wrong terminology.

What I meant was this: suppose mailing list _X_ is parsed. We generate
a checksum for each mbox archive downloaded from _X_ and store those
hashes in a file. For example, for the _foo-bar_ mailing list with the
_foo_ and _bar_ mbox archives, we store the hashes of _foo_ and _bar_.
We can't store a hash of _X_ itself as a whole because we do not parse
the current month of a list. If we stored a hash of _X_ itself, we
would miss the current month.
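To make the idea concrete, here is a rough sketch of the per-archive
checksum bookkeeping. It is only an illustration, not the actual
teammetrics code: the function names, the JSON store and the choice of
SHA-1 are all assumptions made for this example.

```python
import hashlib
import json
import os


def file_checksum(path, chunk_size=65536):
    """Return the SHA-1 hex digest of a file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_hashes(store_path):
    """Load the stored {archive name: checksum} mapping (hypothetical JSON store)."""
    if not os.path.exists(store_path):
        return {}
    with open(store_path) as handle:
        return json.load(handle)


def archives_to_parse(mbox_paths, store_path):
    """Return (path, checksum) pairs for archives that are new or have changed."""
    known = load_hashes(store_path)
    pending = []
    for path in mbox_paths:
        checksum = file_checksum(path)
        if known.get(os.path.basename(path)) != checksum:
            pending.append((path, checksum))
    return pending


def record_parsed(pending, store_path):
    """Store the checksums of archives that have just been parsed."""
    known = load_hashes(store_path)
    for path, checksum in pending:
        known[os.path.basename(path)] = checksum
    with open(store_path, "w") as handle:
        json.dump(known, handle, indent=2, sort_keys=True)
```

The hashes are keyed by archive (_foo_, _bar_), not by list, so an
archive that was skipped earlier, such as the current month, is simply
picked up on a later run.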

A related question: once we are done with the parsing, should we
remove both the archives and the mbox files? As of now, I am removing
only the archives.

> I'd suggest to simply check whether a download file is gzipped and either
> have them all gzipped before processing or unzip them all to have a
> unique handling of the parsing routine.

You will see how I handle this in the next code release. I have it
planned out.
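For reference, the check suggested above could look roughly like the
sketch below. This is not necessarily what the next release will do;
the function names and the naming of the unpacked file are assumptions
for this example. It detects gzip by the two magic bytes at the start
of the file and unpacks to a plain mbox, so the parser only ever sees
uncompressed input.

```python
import gzip
import shutil

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def is_gzipped(path):
    """Return True if the file starts with the gzip magic bytes."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def ensure_plain_mbox(path):
    """Return the path of an uncompressed copy, unpacking the file if needed."""
    if not is_gzipped(path):
        return path
    # Hypothetical naming: drop a ".gz" suffix, otherwise append ".mbox".
    plain_path = path[:-3] if path.endswith(".gz") else path + ".mbox"
    with gzip.open(path, "rb") as source, open(plain_path, "wb") as target:
        shutil.copyfileobj(source, target)
    return plain_path
```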

> Make sure you have a look into
>
>   svn://svn.debian.org/svn/blends/blends/trunk/team_analysis_tools/archives.sql

I certainly intend to!


