[Teammetrics-discuss] Updates.

Andreas Tille andreas at an3as.eu
Thu Jun 30 20:51:08 UTC 2011


On Fri, Jul 01, 2011 at 12:38:14AM +0530, Sukhbir Singh wrote:
>      name      | frequency | rawlen | quotelen | blanklen | siglen
> ---------------+-----------+--------+----------+----------+--------
>  Sukhbir Singh |        77 |  58673 |      998 |     1248 |   1248
>  Andreas Tille |        46 |  66462 |      946 |     1590 |    854
>  Scott Howard  |         4 |   4318 |       48 |       91 |     91

Nice.
 
> As you can see, 'siglen == blanklen' for Scott because he doesn't have a
> signature (it's just `~Scott`), while Andreas and I do have one. That
> explains the difference in the `siglen` column and perhaps why it is
> important. I think these metrics are fairly conclusive for a mailing
> list. You can observe the rest. Here is the summary once again:
> 
>     rawlen -- total number of characters in the message body.
>     blanklen -- total number of lines in the body, excluding blank lines.

Nitpicking:  The name "blanklen" implies that we are counting the number
of blanks and not non-blanks.

>     quotelen -- total number of lines, excluding blank lines AND lines
>                 starting with >.
>     siglen -- total number of lines, excluding blank lines, lines
>               starting with >, AND everything from the '-- ' signature
>               delimiter onwards.

Same here:  The naming of the columns is suboptimal.
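To make the definitions above concrete, here is a minimal Python sketch of the four per-message metrics as described in the quoted summary. The function name `message_metrics` and the more descriptive keys (`nonblank_lines`, `unquoted_lines`, `body_lines`) are hypothetical, chosen to address the naming issue; comments map them back to the table columns. It assumes the signature starts at the first line that is exactly `-- `.

```python
def message_metrics(body: str) -> dict:
    """Compute the per-message metrics for one message body.

    Key names are hypothetical; comments give the original column names.
    """
    lines = body.splitlines()

    # Signature: everything from the first line that is exactly "-- " onwards.
    try:
        sig_start = lines.index("-- ")
    except ValueError:
        sig_start = len(lines)  # no signature delimiter found

    def is_blank(line: str) -> bool:
        return not line.strip()

    def is_quoted(line: str) -> bool:
        return line.startswith(">")

    nonblank = [l for l in lines if not is_blank(l)]
    unquoted = [l for l in nonblank if not is_quoted(l)]
    # Only lines before the signature delimiter, minus blank and quoted lines.
    before_sig = [l for l in lines[:sig_start]
                  if not is_blank(l) and not is_quoted(l)]

    return {
        "rawlen": len(body),              # characters in the body
        "nonblank_lines": len(nonblank),  # column "blanklen"
        "unquoted_lines": len(unquoted),  # column "quotelen"
        "body_lines": len(before_sig),    # column "siglen"
    }
```

Note that with these definitions the last metric can never exceed the one for unquoted lines, since it applies one more exclusion on top of it.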

> For lists.debian.org, I investigated using the NNTP interface.
> That works perfectly. We get exactly what we want, it's fast, and it
> doesn't strain the Gmane server (40,000 Subject/From fields in ~10
> seconds). There is only one drawback: the obfuscation of mail
> addresses. Out of the six lists I checked, only one had obfuscated
> email addresses, though I didn't note which one it was (sorry).

IMHO we could live with just a few obfuscated lists, at least for
the moment.
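Once the From fields have been fetched over NNTP, the `frequency` column is just a count per sender. A minimal sketch, assuming the input is a plain list of raw From header strings (the fetching step is left out; note that Python's `nntplib` was removed in Python 3.13). The helper names are hypothetical, and the parser is a simplification that only handles the common `Name <address>` form and bare addresses.

```python
import re
from collections import Counter

# Simplification: matches "Name <addr>" or "\"Name\" <addr>" only.
FROM_RE = re.compile(r'^\s*"?(.*?)"?\s*<([^>]*)>\s*$')


def sender_key(header: str) -> str:
    """Return the display name of a From header, falling back to the address."""
    m = FROM_RE.match(header)
    if m:
        name, addr = m.group(1).strip(), m.group(2).strip()
        return name or addr
    return header.strip()  # bare address, e.g. "user@example.org"


def post_frequency(from_headers: list[str]) -> Counter:
    """Count messages per sender, like the 'frequency' column in the table."""
    return Counter(sender_key(h) for h in from_headers)
```

Keying on the display name rather than the address means a sender still gets counted even when their address is obfuscated, at the cost of merging people who share a name.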
 
> So what I suggest now is that we go with NNTP access only. I think
> that obfuscation is a rarity and we should go ahead with this. For
> starters, you can point me to some mailing lists that you would want
> to parse first so I can check them for obfuscation. Then at DebConf, we
> can discuss how to parse these lists or request mbox archives.

You might like to check my Perl code for all the lists I was observing.
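Checking a candidate list for obfuscation could be as simple as sampling its From headers and measuring how many addresses look unreadable. A minimal sketch: the "readable address" test (`local@label.label...`) is a hypothetical heuristic, not a description of Gmane's actual obfuscation format, and the list names used with it are placeholders.

```python
import re

# Pull the address out of "Name <addr>"; otherwise use the whole header.
ANGLE_ADDR = re.compile(r"<([^>]*)>")
# Hypothetical heuristic: a readable address looks like local@label.label...
PLAIN_ADDR = re.compile(r"[^@\s]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def looks_obfuscated(header: str) -> bool:
    """True if the From header's address does not look like local@domain.tld."""
    m = ANGLE_ADDR.search(header)
    addr = (m.group(1) if m else header).strip()
    return PLAIN_ADDR.fullmatch(addr) is None


def obfuscated_share(from_headers: list[str]) -> float:
    """Fraction of sampled From headers that look obfuscated (0.0 if empty)."""
    if not from_headers:
        return 0.0
    return sum(looks_obfuscated(h) for h in from_headers) / len(from_headers)


def lists_to_check(samples: dict[str, list[str]],
                   threshold: float = 0.0) -> list[str]:
    """Sorted names of lists whose obfuscated share exceeds threshold."""
    return sorted(name for name, headers in samples.items()
                  if obfuscated_share(headers) > threshold)
```

Lists flagged this way would be the candidates for requesting mbox archives instead of relying on NNTP.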
 
> I will be investigating the CGI thing tomorrow.

Great.  BTW, I'll be offline tomorrow - well for *me* it is tomorrow -
for you it is "today". :-)

Kind regards

     Andreas. 

-- 
http://fam-tille.de


