[Teammetrics-discuss] Next phase: Handling spam

Fri Jun 10 07:04:24 UTC 2011

On Fri, Jun 10, 2011 at 02:24:40AM +0530, Sukhbir Singh wrote:
> Isn't there any way we can find out the email addresses from Alioth
> for multiple email addresses attached to the same name or vice versa?

I have not found a better way for the mailing lists, and I have no idea
whether there is some way to parse the list of subscribers of a list.
We can assume that our most active people are subscribed to the list in
question (even though you can post to lists.debian.org without being
subscribed at all).  In addition, considering only postings of
subscribed people might be just another way to avoid spam, because
spammers will not be subscribed.  But I have no idea how to handle this.
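
To illustrate the subscriber idea, here is a minimal sketch.  It assumes
we somehow had the subscriber addresses available as a plain set, which
is exactly the unsolved part.  The function name and the sample
addresses are hypothetical and not part of any existing script:

```python
from email.utils import parseaddr


def posted_by_subscriber(from_header, subscribers):
    """Return True if the address in a From: header is a known subscriber.

    `subscribers` is assumed to be a set of lower-cased addresses; how to
    obtain it from the list server is not solved here.
    """
    _name, address = parseaddr(from_header)
    return address.lower() in subscribers


# Hypothetical sample data, for illustration only.
subscribers = {"alice@example.org", "bob@example.org"}
headers = [
    "Alice Example <Alice@example.org>",
    "Cheap Pills <spam@example.net>",
]
# Keep only the postings whose sender is subscribed.
kept = [h for h in headers if posted_by_subscriber(h, subscribers)]
```

Comparing lower-cased addresses is a simplification.  It would still
miss people who post from an address other than the subscribed one.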

It becomes a bit better when we are talking about UDD data (about
uploaded packages).  The first advantage is that I wrote this in Python
;-)

I guess you have a checkout of

   svn://svn.debian.org/svn/blends/blends/trunk/team_analysis_tools

The script maintain_names_prefered.py creates a table

   carnivore_names_prefered

in UDD which makes sure we have identical names for uploaders.  If you
have a look into the UDD table carnivore_names, you will find up to 6
(in words: six) names for one and the same person.  (BTW, for easier
investigations I'd recommend installing a copy of UDD on your side, and
I should also grant you permission on the copy at blends.debian.net.
Just send me your preferred login name and an ssh key so that I can
enable a login for you there.)  And this table is trying to do some
magic with GPG key IDs (which we have for uploaders, in contrast to
random mailing list posters).  My (currently not official) UDD attempt
tries to do some magic with a positive list of names, usage statistics
etc. to finally find a unique name string which is more or less the
preferred way of spelling.
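
As a rough illustration of the "positive list plus usage statistics"
idea, here is a minimal sketch.  It is not the logic of
maintain_names_prefered.py; the function name, the tie-breaking rule and
the sample names are assumptions made only for this example:

```python
from collections import Counter


def preferred_name(names_used, positive_list=frozenset()):
    """Pick one spelling from all names seen for one person.

    A spelling on the manually maintained positive list wins; otherwise
    the most frequently used spelling is taken.  Ties are broken by
    first appearance.  Returns None if no names were given.
    """
    if not names_used:
        return None
    counts = Counter(names_used)
    # Prefer spellings that somebody has approved by hand.
    approved = [n for n in counts if n in positive_list]
    candidates = approved or list(counts)
    return max(candidates, key=lambda n: counts[n])


# Hypothetical spellings for one and the same person.
seen = ["J. Doe", "Jane Doe", "Jane Doe", "Doe, Jane"]
by_usage = preferred_name(seen)
by_list = preferred_name(seen, positive_list={"J. Doe"})
```

In this sketch the names are assumed to be grouped per person already,
for instance by GPG key ID.  The positive list is where the manual work
remains.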

I admit, all this is no fun.  It is stupid manual work.  Any more clever
approach is welcome.  As an exercise I'd suggest creating your local
UDD as described here

   http://wiki.debian.org/UltimateDebianDatabase/CreateLocalReplica

and then call maintain_names_prefered.py.  (I think you need to run
   psql udd < create_names_prefered.sql
first, and before you can run upload_history.py you need to
follow the advice in the comment at the end of this SQL script.)

You see, the great coding is only one part of the project.  The thrill
is to make the manual work (and I think some of it will be left) a
bit easier than merging lookup tables into the code (as I did in a quick
hack).

Kind regards

        Andreas.

-- 
http://fam-tille.de