[Teammetrics-discuss] Phase I: Updates

Andreas Tille andreas at an3as.eu
Fri Jun 3 06:42:51 UTC 2011


Hi Sukhbir,

sorry for my slow response rate.  I'm currently on vacation at my
son's place and am not checking my mail regularly.  This will last
until next Tuesday.

On Thu, Jun 02, 2011 at 10:56:47AM +0530, Sukhbir Singh wrote:
> > So I would rather move the mboxes to /var/cache/teammetrics/liststats
> > and the configuration file should rather reside in /etc/teammetrics
> 
> Done. I have selected /var/cache/teammetrics and /etc/teammetrics.

Fine.
 
> > I would prefer a better readable config file.
> 
> Done. I have implemented support for ConfigParser, which is compliant
> to RFC 822 and the Debian standard.
> 
> So now you can create the config file in the format of:
> 
> [list-name]
> url = <base-url>
> lists = <either a single list> or <list 1>
>     <list 2>
>     <list n>
> 
> Note that you are not restricted to a single section (list-name). You
> can have as many sections as you want, I have implemented support for
> handling multiple sections and multiple lists.

Sounds good.
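
For readers of the archive, here is a minimal sketch of how a file in
this format could be read.  It uses Python 3's configparser module (the
ConfigParser module of that time was the Python 2 predecessor); the
section names, the example.org URL and the helper read_lists() are
assumptions made for illustration, not code from liststat.py.

```python
import configparser

# Sample config text; the section names and the second URL are hypothetical.
SAMPLE = """\
[alioth]
url = http://lists.alioth.debian.org/pipermail/
lists = cdd-commits
        teammetrics-discuss

[example]
url = http://lists.example.org/pipermail/
lists = single-list
"""


def read_lists(text):
    """Return {section: (url, [list names])} parsed from config text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    result = {}
    for section in parser.sections():
        url = parser.get(section, "url")
        # A multi-line value arrives as one string with embedded newlines,
        # so split() on whitespace yields one entry per list name.
        names = parser.get(section, "lists").split()
        result[section] = (url, names)
    return result


CONFIG = read_lists(SAMPLE)
```

Splitting on whitespace treats a single list and several lists the same
way, so both forms of the "lists" key need no special casing.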
 
> You have to manually create the file for now but I plan to fix that
> soon after we get other stuff sorted. PS: Suggestions for a config
> file name are welcome.

For me the name listinfo is fine - adding a .conf would probably not
hurt, but I do not mind much about names.
 
> I have handled the exception but an important thing comes up.
> 
> Let's take the example of our own mailing list, teammetrics-discuss
> [1]. If you notice, the archive for the month of June (which is the
> current month) is in plain text and not a gzip archive. Of course I
> can download the archive for the active month also but do we really
> want to? Shouldn't we download it when the month completes? So my
> question is -- do we measure performance for the active month the
> script is run? Note that we will be implementing a system where the
> same list is not parsed again, so we can probably parse the active
> month later.

Currently I do not evaluate the current month but only "complete"
months.  So if you ignore this month you reproduce exactly what I'm
doing.  That is fine for me.  For lists that have existed for some
time the final result will not be influenced heavily by the latest
month - so I do not think you need to put much effort into this in the
beginning.
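
The rule "only complete months" can be sketched as follows.  The
function names and the (year, month) tuple representation are
assumptions for illustration; how liststat.py tracks months is not
shown in this thread.

```python
import datetime


def is_complete_month(year, month, today):
    """True if (year, month) ended before the month containing `today`."""
    return (year, month) < (today.year, today.month)


def complete_months(start_year, start_month, today):
    """List (year, month) pairs from the start, excluding the current month."""
    months = []
    year, month = start_year, start_month
    while is_complete_month(year, month, today):
        months.append((year, month))
        month += 1
        if month > 12:
            # Roll over from December to January of the next year.
            year, month = year + 1, 1
    return months


# Hypothetical usage: archives to fetch when run on the date of this mail.
TO_FETCH = complete_months(2011, 4, datetime.date(2011, 6, 3))
```

Run on a later date, the same call would simply include June once that
month has ended, which fits the plan of parsing the active month later.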
 
> > Logfile would be probably a good idea.
> 
> Not done. I have plans to finish this by this weekend at most, given
> everything else is acceptable to you :-) (I just have to replace the
> print statements)

OK.
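
As a rough sketch of what replacing the print statements could look
like, the standard logging module can be used.  The logger name, the
message format and the helper make_logger() are assumptions; an
in-memory buffer stands in for the log file so the example is
self-contained.

```python
import io
import logging


def make_logger(name, handler):
    """Return a logger that writes INFO and above to the given handler."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
    logger.addHandler(handler)
    # Keep messages out of the root logger so they are not printed twice.
    logger.propagate = False
    return logger


# For a real log file, logging.FileHandler(path) would replace the buffer.
buffer = io.StringIO()
log = make_logger("liststat-sketch", logging.StreamHandler(buffer))
log.info("fetching %s", "teammetrics-discuss")
```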
 
> A few other questions:
> 
> - when the user runs the script for the first time, the directory and
> the config files may not be there. How do you want to handle this?
> Should I create the directory and a sample config file or ... ?
> - as is expected, the script runs with root privileges. I have put in
> a check for that (see `is_root()` in `liststat.py`). Is this
> acceptable?

I noticed this when trying.  I have not completely made up my mind
about this.  If I were to build a teammetrics package I would add a
postinst script which creates a group teammetrics, creates these two
directories and enables write permissions for the members of the group
teammetrics.  While building a package for your code might on one hand
be a bit overdesigned because it will not really be installed on a lot
of machines, it might be helpful for such things anyway.  (I personally
create private packages for my own use in such cases as well.)  This
might be an idea - but creating this group and the permissions manually
might be possible as well.
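
If the directories are made group-writable as described, the script
could test for write access instead of testing for root.  This is only
a sketch under that assumption; unwritable() is a hypothetical helper
and not a replacement that was agreed on for is_root().

```python
import os
import tempfile


def unwritable(paths):
    """Return the paths that are missing or not writable by the current user."""
    return [p for p in paths
            if not (os.path.isdir(p) and os.access(p, os.W_OK))]


# Demonstration with a scratch directory instead of the real locations
# (/var/cache/teammetrics and /etc/teammetrics).
scratch = tempfile.mkdtemp()
missing = os.path.join(scratch, "does-not-exist")
PROBLEMS = unwritable([scratch, missing])
```

A non-empty result could then be reported to the user, so that members
of the teammetrics group can run the script without root privileges.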
 
> Your testing actually helped us make this a lot better! So please feel
> free to test it as rigorously as you like.

I did not detect another bug.  However, I have seen some results which
are a sign of pure spam, which we need to handle sooner or later.  With
the config file

[list-name]
url = http://lists.alioth.debian.org/pipermail/
lists = cdd-commits
        blends-commit
        debian-med-commit
        teammetrics-discuss


I get results like:

=?UTF-8?B?TXBzIFJ1c3lhICYgVMO8cmtpIEN1bWh1cml5ZXRsZXIgSMSxemzEsSBQYWtldCBUYcWfxLFtYQ==?= - 1
accounts.note - 1
amandajohnson10004 - 2

or things like this.  It would be great if we could find some way to
kick out such stuff.  I had some means implemented in my script to sort
out those spammers.  This is probably not the first thing to do but I
would like you to keep this in mind.
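
One possible starting point, sketched here purely as an assumption (it
is not the method used in my script): decode RFC 2047 encoded names
such as the "=?UTF-8?B?...?=" entry above so they become readable, and
drop senders below a minimum number of postings.  The threshold of 3
and both helper names are arbitrary choices for illustration.

```python
from email.header import decode_header


def decode_name(raw):
    """Decode an RFC 2047 encoded-word such as '=?UTF-8?B?...?=' to text."""
    parts = []
    for value, charset in decode_header(raw):
        if isinstance(value, bytes):
            # Encoded parts come back as bytes plus their declared charset.
            value = value.decode(charset or "ascii", errors="replace")
        parts.append(value)
    return "".join(parts)


def drop_rare_posters(counts, min_posts=3):
    """Keep only senders with at least `min_posts` postings (a heuristic)."""
    return {name: n for name, n in counts.items() if n >= min_posts}
```

Such a threshold would also remove legitimate one-time posters, so it
is a crude filter at best and would need tuning against real list data.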
 
> I think that should cover up everything for now and I hope it
> addresses all the concerns you had.

Thanks for your work on this and please excuse my slower responsiveness
for the next couple of days. 

Kind regards

    Andreas.

-- 
http://fam-tille.de


