[Teammetrics-discuss] Phase I: The final parts.

Andreas Tille andreas at an3as.eu
Tue Jun 7 16:57:31 UTC 2011


On Tue, Jun 07, 2011 at 08:27:30PM +0530, Sukhbir Singh wrote:
> 
> + The archives for the current month will no longer be downloaded,
> irrespective of whether they are plain text or gzip archives. When the
> script runs again in the next month, the pending archive will
> automatically be downloaded.

OK.
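
The "skip the current month" rule could be sketched roughly as below. This is only an illustration: the function name `is_complete_month` and its (year, month) arguments are assumptions, not names taken from liststat.py.

```python
import datetime


def is_complete_month(year, month, today=None):
    """Return True if the archive for (year, month) is from a past month.

    Archives for the current month are still growing, so they are
    skipped and picked up on a later run.
    """
    today = today or datetime.date.today()
    # Tuple comparison orders by year first, then by month.
    return (year, month) < (today.year, today.month)
```

A caller would download an archive only when this returns True, so the pending month is fetched automatically once the next month has started.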
 
> + The mbox archives that are downloaded are hashed: their checksum
> is calculated using SHA-1 and then saved in a file. I will
> describe this in points so that it is easy for you to make suggestions:

OK.
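
For illustration, a SHA-1 checksum of a downloaded mbox can be computed with the standard `hashlib` module. The helper name `sha1_of_file` and the chunk size are assumptions made for this sketch. The file is read in chunks so that large archives do not have to fit in memory.

```python
import hashlib


def sha1_of_file(path, chunk_size=65536):
    """Return the hexadecimal SHA-1 digest of the file at path."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```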
 
> 1. The hashes are stored in a file called
> '/etc/teammetrics/lists.hash', alongside the 'listinfo.conf' file. The
> location can easily be changed, so suggest whatever you prefer.

I'd rather put this in /var/cache/teammetrics.
After all, this is not a configuration file for admins to edit but data
which can easily be recreated by running the program again.  Well, at
least once you fix the problem that occurs when the file does not exist
yet ;-):

$ sudo python liststat.py 
File not found lists.hash
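
One way to avoid the error above on a first run is to create the cache directory and an empty hash file when they are missing. This is a minimal sketch, not the actual fix in liststat.py: the helper name `ensure_hash_file` is hypothetical, and the default path simply combines the suggested /var/cache/teammetrics directory with the existing file name.

```python
import os


def ensure_hash_file(path="/var/cache/teammetrics/lists.hash"):
    """Create the parent directory and an empty hash file if missing.

    An existing file is left untouched.
    """
    os.makedirs(os.path.dirname(path), exist_ok=True)
    if not os.path.exists(path):
        # Opening in append mode creates the file without truncating it.
        open(path, "a").close()
    return path
```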
 
> 2. I am storing the hashes in a CSV format using : as the delimiter. I
> picked this format since this file will store a large number of values
> and doesn't need to be edited by the user. So this serves both
> purposes. If you think otherwise, let me know.

That's perfectly fine.  It is not a file which is intended for human
editing / reading - so anything is OK.
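
A colon-delimited file like this can be read and written with the standard `csv` module. The column layout used here (list name, month, SHA-1 digest) is an assumption for the sake of the example, as the message does not spell out the fields. The function names are likewise hypothetical.

```python
import csv


def save_hashes(path, hashes):
    """Write {(list_name, month): sha1_hex} as colon-delimited rows."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter=":")
        for (list_name, month), digest in sorted(hashes.items()):
            writer.writerow([list_name, month, digest])


def load_hashes(path):
    """Read the rows back into a {(list_name, month): sha1_hex} dict."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter=":")
        # Skip empty rows so a blank line does not break loading.
        return {(row[0], row[1]): row[2] for row in reader if row}
```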
 
> 3. As expected, the mbox will be downloaded first, its hash
> calculated and compared with the stored hash (if any), and then the
> new hash is saved. If the hash matches the one in the file, the mbox
> is not parsed. If it does not match, it will be parsed. This takes
> care of two things for us: redundancy -- the same month for a
> particular list won't be parsed again, and integrity -- if the mbox
> changes, it will be parsed again.

This is the way I wanted it to be.
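
The decision described in point 3 boils down to a lookup and a comparison. The sketch below assumes the stored hashes are held in a dict keyed by a (list name, month) tuple. Both the key shape and the name `needs_parsing` are hypothetical.

```python
def needs_parsing(stored, key, digest):
    """Return True if the mbox identified by key is new or has changed.

    The stored mapping is updated in place, so the caller can write it
    back to the hash file after the run.
    """
    if stored.get(key) == digest:
        # Same content as last time: skip parsing.
        return False
    stored[key] = digest
    return True
```

An unchanged archive is skipped, while a new or modified one is parsed and its digest recorded.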
 
> 4. All this is working and you can try it out.

Modulo the missing file in 1. :-)
 
> What is left:
> 
> - Logging. Yes I know :( But with almost everything else done, I won't
> avoid it anymore!

OK

> - Spam filter.

I would officially call it spam "handling".  We are not actually
inventing yet another spam filter.  We are just trying to keep some
cruft out of our DB.

> That's it for now. Code cleaning as usual is scheduled for this weekend. Heh!

Kind regards

      Andreas. 

-- 
http://fam-tille.de


