[Teammetrics-discuss] Phase I: The final parts.

Sukhbir Singh sukhbir.in at gmail.com
Sun Jun 5 19:54:27 UTC 2011


Hi,

This week, hopefully within three to four days at most, I will be
implementing the final features of the list parser.

1. Recording which lists have been downloaded, so that they are not
downloaded again. Suggestions are welcome on this. Is using a conf
file that maps list names to SHA1 checksums OK?
2. Logging support. I know this has been delayed somewhat, but I am
not proceeding without this.
3. For fetching the list of the current month to be parsed:

Earlier I thought that for the current month, Mailman stores the list
in plain text instead of a gzip archive. That's not correct. What in
fact happens is that if the list is very small (perhaps 1 KB?), Mailman
stores it in plain text, and as the size grows, it puts the messages
into a gzip archive. So we will not parse the current month's list at
all. When the script runs the next time, or the next month, the
previous month's list can be parsed. This is better than parsing an
incomplete month, as discussed with Andreas before.

4. Spam filter. I have not looked carefully into this yet, beyond a
glance at Andreas' code, but I will be implementing it as the final step.
5. And then lastly, pushing the information into a database.
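
For point 1, a minimal sketch of the conf-file idea could look like the
following. The section name, the helper names, and the choice to hash
the raw archive bytes are all assumptions made for illustration, not
existing code. Note that comparing checksums requires the data to be in
hand, so only the name lookup can actually avoid a download.

```python
import configparser
import hashlib

SECTION = 'downloaded'  # hypothetical section name


def sha1_of(data):
    """Return the hex SHA1 digest of a bytes object."""
    return hashlib.sha1(data).hexdigest()


def load_state(path):
    """Load the conf file; a missing file yields an empty state."""
    parser = configparser.ConfigParser()
    parser.read(path)  # read() silently skips files that do not exist
    if not parser.has_section(SECTION):
        parser.add_section(SECTION)
    return parser


def is_downloaded(parser, list_name):
    """True if the list name has been recorded before."""
    return parser.has_option(SECTION, list_name)


def is_unchanged(parser, list_name, data):
    """True if the recorded checksum matches the given archive bytes."""
    return parser.get(SECTION, list_name, fallback=None) == sha1_of(data)


def mark_downloaded(parser, path, list_name, data):
    """Record the checksum for a list and write the conf file to disk."""
    parser.set(SECTION, list_name, sha1_of(data))
    with open(path, 'w') as handle:
        parser.write(handle)
```

configparser lowercases option names by default, which is harmless for
lowercase list names but would need attention if names differ only by
case or contain '=' or ':'.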

That's the action plan for this week and I hope I can finish stuff
quickly. Suggestions as always are welcome.
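
For point 3, skipping the current month could be sketched roughly as
below. It assumes the archive files follow the usual pipermail naming
of the form '2011-June.txt.gz' (or '.txt'); the function names are
hypothetical and the month names are hard-coded in English so that the
result does not depend on the locale.

```python
import datetime

MONTHS = ['January', 'February', 'March', 'April', 'May', 'June',
          'July', 'August', 'September', 'October', 'November', 'December']


def archive_month(filename):
    """Return (year, month) for a name such as '2011-June.txt.gz'."""
    stem = filename.split('.', 1)[0]          # '2011-June'
    year, month_name = stem.split('-', 1)
    return int(year), MONTHS.index(month_name) + 1


def complete_archives(filenames, today=None):
    """Keep only archives for months strictly before the current one."""
    today = today or datetime.date.today()
    current = (today.year, today.month)
    return [name for name in filenames if archive_month(name) < current]
```

Run on 5 June 2011, this would keep '2011-May.txt.gz' and drop
'2011-June.txt', which then gets picked up on a run in July or later.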

--
Thanks,
Sukhbir.


