[Teammetrics-discuss] Spam filters and encoding handlers in place

Andreas Tille andreas at an3as.eu
Thu Jun 23 06:22:58 UTC 2011


On Thu, Jun 23, 2011 at 01:58:55AM +0530, Sukhbir Singh wrote:
> Changes:
> 
> + A working spam filter is in place. This is handled by spamfilter.py.

Not tested, but the code looks reasonable.
 
> + I have tried to handle all the encoding errors, but a few (very few,
> I guess) still remain. Oddly, all the encoding errors so far are in
> the Subject field *only* and not in the Name field. I will find out
> what is causing the problem soon.

... as I told you, encoding stuff is time-consuming and leaves you
alone with strange riddles. :-(
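One common source of Subject-only failures is RFC 2047 encoded-words (`=?charset?q?...?=`) that carry a missing or bogus charset label. Whether that is the cause here is an assumption, not something verified against the archive. The following is a minimal sketch of a tolerant decoder built on the standard library's `email.header.decode_header`; the function name `decode_subject` is hypothetical and not taken from spamfilter.py.

```python
from email.header import decode_header


def decode_subject(raw):
    """Decode an RFC 2047 header value (e.g. a Subject) into a str.

    Hypothetical helper: unknown charsets fall back to UTF-8 and
    undecodable bytes are replaced instead of raising an exception.
    """
    parts = []
    for chunk, charset in decode_header(raw):
        if isinstance(chunk, bytes):
            try:
                # Use the declared charset; unencoded runs have charset None.
                parts.append(chunk.decode(charset or "utf-8", errors="replace"))
            except LookupError:
                # The declared charset is unknown to Python: fall back to UTF-8.
                parts.append(chunk.decode("utf-8", errors="replace"))
        else:
            # Header without any encoded-words: decode_header returns a str.
            parts.append(chunk)
    return "".join(parts)
```

For example, `decode_subject("=?utf-8?q?Steffen_M=C3=B6ller?=")` yields `"Steffen Möller"`, and a header with an unknown charset degrades to replacement characters rather than aborting the import.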
 
> + There is a new table called listspam that saves the reason why each
> message was considered spam, which will help us see how well our
> filter is working (as requested by Andreas).

I've seen it in the code - good.
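The listspam idea can be sketched as follows: every filter rule returns a human-readable reason, the first reason that fires is stored with the message, and counting stored reasons shows which rules do the work. Everything here is hypothetical. The two toy rules, the column names (`project`, `msgid`, `reason`) and the use of an in-memory SQLite database standing in for the project's PostgreSQL database are assumptions, not what spamfilter.py actually does.

```python
import sqlite3


# Hypothetical rules: each returns a reason string when it fires, else None.
def check_empty_subject(msg):
    return "empty subject" if not msg.get("subject", "").strip() else None


def check_blacklisted_word(msg, words=("viagra", "casino")):
    body = msg.get("body", "").lower()
    for word in words:
        if word in body:
            return "blacklisted word: " + word
    return None


RULES = (check_empty_subject, check_blacklisted_word)


def spam_reason(msg):
    """Return the reason of the first rule that fires, or None for ham."""
    for rule in RULES:
        reason = rule(msg)
        if reason:
            return reason
    return None


def archive(conn, project, msg):
    """Record spam with its reason in listspam; return True if it was spam."""
    reason = spam_reason(msg)
    if reason is None:
        return False
    conn.execute(
        "INSERT INTO listspam (project, msgid, reason) VALUES (?, ?, ?)",
        (project, msg["msgid"], reason),
    )
    return True


def reason_counts(conn):
    """How often each reason fired: a quick check of the filter's behaviour."""
    return conn.execute(
        "SELECT reason, COUNT(*) FROM listspam GROUP BY reason ORDER BY reason"
    ).fetchall()
```

Keeping the reason as free text makes it trivial to spot a rule that fires far too often; a stricter design could store a rule identifier instead.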
 
> So, overall, pretty slick!

Yup.
 
> SELECT name, COUNT(name) FROM listarchives WHERE
> project='debian-med-commit' GROUP BY name ORDER BY count DESC LIMIT
> 10;
>                 name                | count
> ------------------------------------+-------
>  Charles Plessy                     |  1352
>  Andreas Tille                      |  1261
>  tille at alioth.debian.org         |   755
>  hanska-guest at alioth.debian.org  |   509
>  Mathieu Malaterre                  |   498
>  plessy at alioth.debian.org        |   389
>  Steffen Möller                     |   346
>  smoe-guest at alioth.debian.org    |   344
>  charles-guest at alioth.debian.org |   342
>  olivier sallou                     |   169
> (10 rows)
> 
> So let me know your thoughts on this.

tille at alioth.debian.org == Andreas Tille
hanska-guest at alioth.debian.org == David Paleino
plessy at alioth.debian.org && charles-guest at alioth.debian.org
    == Charles Plessy
smoe-guest at alioth.debian.org == Steffen Möller


--> You can find these name replacements in my Perl code. As I said, we
should implement this translation either in a config file (which might
become largish) or, as I now prefer, in a database table.
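A database-table version of the name translation might look like the following minimal sketch. It assumes a hypothetical `aliases(alias, canonical)` table filled with the mappings listed above and the `listarchives(project, name)` columns used in the query earlier in this thread. A `LEFT JOIN` plus `COALESCE` keeps names that have no alias entry unchanged, so the top-posters query simply groups by the canonical person. SQLite stands in for PostgreSQL here; the SQL itself should run on both.

```python
import sqlite3

# Hypothetical alias table contents, taken from the mappings in this thread
# (addresses written as they appear in the archive).
ALIASES = [
    ("tille at alioth.debian.org", "Andreas Tille"),
    ("hanska-guest at alioth.debian.org", "David Paleino"),
    ("plessy at alioth.debian.org", "Charles Plessy"),
    ("charles-guest at alioth.debian.org", "Charles Plessy"),
    ("smoe-guest at alioth.debian.org", "Steffen Möller"),
]

# Names without an alias entry fall through unchanged via COALESCE.
TOP_POSTERS = """
    SELECT COALESCE(a.canonical, l.name) AS person, COUNT(*) AS n
    FROM listarchives AS l
    LEFT JOIN aliases AS a ON a.alias = l.name
    WHERE l.project = ?
    GROUP BY COALESCE(a.canonical, l.name)
    ORDER BY n DESC, person
    LIMIT ?
"""


def setup_aliases(conn):
    """Create and fill the hypothetical aliases table."""
    conn.execute(
        "CREATE TABLE aliases (alias TEXT PRIMARY KEY, canonical TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO aliases VALUES (?, ?)", ALIASES)


def top_posters(conn, project, limit=10):
    """Return (person, count) rows with aliases merged into one person."""
    return conn.execute(TOP_POSTERS, (project, limit)).fetchall()
```

Adding a new alias then means inserting one row rather than editing code or a config file.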


> The next phases in order:
> 
> + deb package.
> + encoding errors.

Fine.
 
Thanks for your continued work

      Andreas.

-- 
http://fam-tille.de


