[Teammetrics-discuss] Phase I: The final parts.

Sukhbir Singh sukhbir.in at gmail.com
Tue Jun 7 14:57:30 UTC 2011


Hi,

The work finished sooner than I expected, so here I am!

Please update your repository before we proceed.

Changes:

+ The archives for the current month will no longer be downloaded,
irrespective of whether they are plain text or gzip archives. When the
script runs again in the next month, the pending archive will
automatically be downloaded.

+ Each downloaded mbox archive is hashed with SHA-1, and the checksum is
saved to a file. I will describe this in points so that it is easy for
you to suggest changes:

1. The hashes are stored in a file called
'/etc/teammetrics/lists.hash', alongside the 'listinfo.conf' file. The
location can easily be changed, so feel free to suggest another one.

2. I am storing the hashes in CSV format, using ':' as the delimiter. I
picked this format because the file will store a large number of values
and doesn't need to be edited by the user, so it suits both
requirements. If you think otherwise, let me know.

3. The mbox is downloaded first. Its hash is then calculated and compared
with the stored hash (if one exists), and the new hash is saved. If the
hash matches the stored one, the mbox is not parsed. If it does not
match, the mbox is parsed. This takes care of two things for us:
avoiding redundant work (the same month for a particular list won't be
parsed again) and integrity (if the mbox changes, it will be parsed
again).

4. All this is working and you can try it out.
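The first change above (skipping the current month's archive) can be sketched roughly as follows. This is only an illustration. It assumes Mailman-style archive names such as '2011-June.txt' or '2011-June.txt.gz'. The function name archives_to_fetch is hypothetical and is not taken from the teammetrics code.

```python
import datetime

# English month names as used in Mailman archive file names (assumption);
# hardcoded so the result does not depend on the system locale.
MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


def archives_to_fetch(archive_names, today=None):
    """Return the archive names to download, dropping the current month's.

    Hypothetical helper: assumes names like '2011-June.txt' or
    '2011-June.txt.gz'. Both plain-text and gzip variants of the current
    month are skipped; they will be fetched on next month's run.
    """
    today = today or datetime.date.today()
    current = "%d-%s" % (today.year, MONTHS[today.month - 1])
    # Compare only the part before the first '.', i.e. '2011-June'.
    return [name for name in archive_names
            if name.split(".", 1)[0] != current]
```

Called on 7 June 2011 with ['2011-May.txt.gz', '2011-June.txt.gz', '2011-June.txt'], it keeps only the May archive.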
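Points 1 to 3 above (download, hash with SHA-1, compare, save, then decide whether to parse) might look something like the sketch below. A few details are assumptions and do not come from the post: each row in the hash file is taken to hold 'list:archive:sha1', and the helper names are hypothetical. Only the default path '/etc/teammetrics/lists.hash' and the ':' delimiter come from the points above.

```python
import csv
import hashlib
import os

# Path proposed in point 1; the post notes it can easily be changed.
HASH_FILE = "/etc/teammetrics/lists.hash"


def sha1_of_file(path, chunk_size=65536):
    """Return the hex SHA-1 digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_hashes(hash_file):
    """Read ':'-delimited rows (assumed layout: list:archive:sha1)."""
    hashes = {}
    if os.path.exists(hash_file):
        with open(hash_file, newline="") as f:
            for row in csv.reader(f, delimiter=":"):
                if len(row) == 3:
                    hashes[(row[0], row[1])] = row[2]
    return hashes


def save_hashes(hash_file, hashes):
    """Write all hashes back using ':' as the CSV delimiter."""
    with open(hash_file, "w", newline="") as f:
        writer = csv.writer(f, delimiter=":")
        for (list_name, archive), value in sorted(hashes.items()):
            writer.writerow([list_name, archive, value])


def needs_parsing(list_name, archive, mbox_path, hash_file=HASH_FILE):
    """Hash a downloaded mbox, compare it with the stored hash, save it.

    Returns True if the mbox is new or has changed and should be parsed.
    """
    hashes = load_hashes(hash_file)
    new_hash = sha1_of_file(mbox_path)
    changed = hashes.get((list_name, archive)) != new_hash
    hashes[(list_name, archive)] = new_hash
    save_hashes(hash_file, hashes)
    return changed
```

The sketch follows the order described in point 3 and saves the hash before any parsing happens. One consequence: if parsing fails afterwards, that month will be skipped on the next run unless the hash entry is removed.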

What is left:

- Logging. Yes, I know :( But with almost everything else done, I won't
avoid it any longer!
- Spam filter.

That's it for now. As usual, code cleanup is scheduled for this weekend. Heh!

--
Thanks,
Sukhbir.


