[Teammetrics-discuss] Spam filters and encoding handlers in place
andreas at an3as.eu
Thu Jun 23 06:22:58 UTC 2011
On Thu, Jun 23, 2011 at 01:58:55AM +0530, Sukhbir Singh wrote:
> + A working spam filter is in place. This is handled by spamfilter.py.
Not tested but code looks reasonable.
> + I have tried to handle all the encoding errors, but a few (very few,
> I guess) still remain. Weirdly, all the remaining encoding errors are
> in the Subject field *only* and not in the Name field. I will find out
> what is causing the problem soon.
... as I told you, encoding issues are time-consuming and leave you alone
with strange riddles. :-(
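One frequent source of Subject-only errors is RFC 2047 encoded words (e.g. `=?iso-8859-1?q?M=F6ller?=`) whose charset label is missing, wrong or unknown to Python; whether that is the cause here is not established in this thread. Below is a minimal sketch of a tolerant decoder built on the standard `email.header.decode_header`, assuming the archiver sees the raw header as a string. `decode_subject` is a hypothetical helper name, not part of the existing code.

```python
from email.errors import HeaderParseError
from email.header import decode_header


def decode_subject(raw_subject):
    """Decode a possibly RFC 2047-encoded Subject into str without raising.

    Hypothetical helper: bytes that cannot be decoded are replaced with
    U+FFFD instead of aborting the import of the whole message.
    """
    if raw_subject is None:
        return ""
    try:
        fragments = decode_header(raw_subject)
    except HeaderParseError:
        # Malformed encoded word (e.g. broken base64): keep the raw text.
        return raw_subject
    parts = []
    for fragment, charset in fragments:
        if isinstance(fragment, str):
            # Header contained no encoded words at all.
            parts.append(fragment)
            continue
        try:
            # Missing charset: assume UTF-8 (an assumption, not a standard).
            parts.append(fragment.decode(charset or "utf-8", errors="replace"))
        except LookupError:
            # Charset label unknown to Python, e.g. a misspelled name.
            parts.append(fragment.decode("utf-8", errors="replace"))
    return "".join(parts)
```

Because it degrades to replacement characters instead of raising, a broken Subject costs a few characters rather than the whole message.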
> + There is a new table called listspam which stores the reason why a
> message was classified as spam; this will help us assess how well
> our filter is working (as requested by Andreas).
I've seen it in the code - good.
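A sketch of how such a table could be used to audit the filter: count how often each rejection reason fires per list. The column names, the example reason strings and the use of sqlite3 in place of the project's database are all assumptions for illustration; the actual listspam schema is not shown in this thread.

```python
import sqlite3

# Hypothetical schema: the real listspam columns are not shown in the thread.
SCHEMA = """
CREATE TABLE listspam (
    message_id TEXT PRIMARY KEY,
    project    TEXT NOT NULL,
    reason     TEXT NOT NULL
)
"""


def record_spam(conn, message_id, project, reason):
    """Store why a message was rejected so the filter can be audited later."""
    conn.execute(
        "INSERT INTO listspam (message_id, project, reason) VALUES (?, ?, ?)",
        (message_id, project, reason),
    )


def spam_reasons(conn, project):
    """Return (reason, count) pairs for one list, most frequent first."""
    return conn.execute(
        "SELECT reason, COUNT(*) AS n FROM listspam WHERE project = ? "
        "GROUP BY reason ORDER BY n DESC, reason",
        (project,),
    ).fetchall()
```

A reason that dominates the counts, or one that never fires, is a hint about which rule to look at first.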
> So, overall, pretty slick!
> SELECT name, COUNT(name) FROM listarchives WHERE
> project='debian-med-commit' GROUP BY name ORDER BY count DESC LIMIT
> name                               | count
> -----------------------------------+------
> Charles Plessy                     |  1352
> Andreas Tille                      |  1261
> tille at alioth.debian.org         |   755
> hanska-guest at alioth.debian.org  |   509
> Mathieu Malaterre                  |   498
> plessy at alioth.debian.org        |   389
> Steffen Möller                     |   346
> smoe-guest at alioth.debian.org    |   344
> charles-guest at alioth.debian.org |   342
> olivier sallou                     |   169
> (10 rows)
> So let me know your thoughts on this.
tille at alioth.debian.org == Andreas Tille
hanska-guest at alioth.debian.org == David Paleino
plessy at alioth.debian.org && charles-guest at alioth.debian.org
== Charles Plessy
smoe-guest at alioth.debian.org == Steffen Möller
--> You can find these name replacements in my Perl code. As I said, we
should implement this translation either in a config file (which might
become rather large) or, as I now prefer, in a database table.
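A minimal sketch of the database-table variant, assuming a hypothetical `namealias` table (alias to canonical name) and using sqlite3 only so it runs stand-alone. The query folds aliases into canonical names with a LEFT JOIN and COALESCE before grouping, so names without an alias pass through unchanged. The alias rows are the mappings listed above, with the archive's " at " obfuscation written as "@" (an assumption about how the addresses are stored).

```python
import sqlite3

# Hypothetical tables; listarchives is reduced to the columns the query uses.
SETUP = """
CREATE TABLE listarchives (project TEXT, name TEXT);
CREATE TABLE namealias (alias TEXT PRIMARY KEY, canonical TEXT NOT NULL);
"""

# Alias mappings taken from this mail.
ALIASES = [
    ("tille@alioth.debian.org", "Andreas Tille"),
    ("hanska-guest@alioth.debian.org", "David Paleino"),
    ("plessy@alioth.debian.org", "Charles Plessy"),
    ("charles-guest@alioth.debian.org", "Charles Plessy"),
    ("smoe-guest@alioth.debian.org", "Steffen Möller"),
]

# Map each sender to its canonical name (if any) before counting.
TOP_POSTERS = """
SELECT COALESCE(a.canonical, l.name) AS person, COUNT(*) AS n
FROM listarchives AS l
LEFT JOIN namealias AS a ON a.alias = l.name
WHERE l.project = ?
GROUP BY person
ORDER BY n DESC, person
LIMIT ?
"""


def top_posters(conn, project, limit=10):
    """Return (person, message count) pairs with aliases merged."""
    return conn.execute(TOP_POSTERS, (project, limit)).fetchall()
```

Adding a new alias is then a single INSERT rather than a code change, which is the main advantage over a config file.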
> The next phases in order:
> + deb package.
> + encoding errors.
Thanks for your continued work