[Teammetrics-discuss] How does filling up the database work?

Sukhbir Singh sukhbir.in at gmail.com
Tue Aug 9 13:56:45 UTC 2011


> Hmmm, this algorithm does not really need a MD5sum - just the name of
> the parsed mbox would be sufficient, right?  You are just not
> downloading a mbox which is in lists.hash.  I assumed that you would
> download all mboxes in *any* case and only parse it when the md5sum

This is the finest example of procrastination :D I was supposed to change
the code so that it does not calculate the SHA-1 and just saves the list
name. This will be done.
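
A minimal sketch of the name-only check discussed above, in Python. The file format (one mbox name per line) and the function names load_parsed, mboxes_to_download and mark_parsed are assumptions made for illustration; the thread does not show the real layout of lists.hash.

```python
from pathlib import Path


def load_parsed(hash_file):
    """Return the set of mbox names already recorded in hash_file.

    Assumes one name per line; a missing file means nothing was parsed yet.
    """
    path = Path(hash_file)
    if not path.exists():
        return set()
    return {line.strip() for line in path.read_text().splitlines() if line.strip()}


def mboxes_to_download(available, parsed):
    """Keep only the mbox names that are not recorded yet, preserving order."""
    return [name for name in available if name not in parsed]


def mark_parsed(hash_file, name):
    """Append an mbox name to hash_file so it is skipped on the next run."""
    with open(hash_file, "a") as handle:
        handle.write(name + "\n")
```

With this shape no checksum is needed: an mbox is skipped purely because its name is already recorded.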

> BTW, steps 3. and 4. should be exchanged.  If 4. might fail for some
> reason you should not set the "not for download" flag in lists.hash.

lists.hash saves the entire mbox, not individual messages, so the only
time this can fail is when the entire mbox is corrupted. I don't think
this can happen at all, so... Plus there is no way to handle this
later and the complexity is not worth it.

> So we should remember to run the script on 2nd of every month to be
> safe.

Should I add a check for this? If the script is run on the 1st day of the month, quit.
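
The proposed check could look roughly like this in Python. The function name should_skip_run is hypothetical, and the sketch assumes the date of the local system clock is the one that matters.

```python
import datetime


def should_skip_run(today=None):
    """Return True when the script is being run on the 1st day of a month.

    `today` can be passed in for testing; by default the current local
    date is used.
    """
    if today is None:
        today = datetime.date.today()
    return today.day == 1


# Possible use at the top of the script:
#     if should_skip_run():
#         raise SystemExit("Refusing to run on the 1st of the month.")
```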

> Question: If (for whatever reason) I would re-read all mboxes (for
> instance after getting the information about massive SPAM removal) I
> need to delete the corresponding entries in lists.hash, right?
> According to your algorithm it also requires to clean up the database

Yes, that's right. We have to delete everything in /var/cache/teammetrics

> from the entries of this project.  To make sure that this will not be
> forgotten we should set a primary key (project,message_id) to prevent
> adding a message twice.

... or we can write a shell script that does it easily for us without
having to bother with the primary key :)
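
For comparison, a small sketch of what the quoted (project, message_id) primary key would do. It uses Python's sqlite3 module purely as a stand-in, since the thread does not name the database backend here; the table name messages and the helper names are assumptions.

```python
import sqlite3


def create_table(conn):
    """Create a table whose composite primary key rejects duplicate messages."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS messages (
            project    TEXT NOT NULL,
            message_id TEXT NOT NULL,
            PRIMARY KEY (project, message_id)
        )
        """
    )


def add_message(conn, project, message_id):
    """Insert one message; return False if it was already stored."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO messages (project, message_id) VALUES (?, ?)",
                (project, message_id),
            )
        return True
    except sqlite3.IntegrityError:
        # The (project, message_id) pair already exists.
        return False
```

With such a key, re-reading an mbox without cleaning up first would be rejected row by row instead of silently adding every message twice.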


