[Teammetrics-discuss] Next phase: Handling spam

Sukhbir Singh sukhbir.in at gmail.com
Thu Jun 9 20:06:12 UTC 2011


Hi!

The query:

    INSERT INTO listarchive (project, yearmonth, author, subject, url, ts)
    VALUES (?, ?, ?, ?, ?, '$today')

is the most important one for us, so let's go through its columns.

* project - the name of the mailing list.
* yearmonth - Ok.
* author -

We are going to insert names here, right? So by parsing the 'From'
line of an mbox archive, we will get something like this (an example):

    tille at debian dot org (Andreas Tille)

For the guest account problem you mentioned, the same person shows up as:

    tille-guest at debian dot org (Andreas Tille)

So I was thinking we split the address on '-' and then insert the
name? That way, if both of the above addresses are in the mbox, they
are treated as the same user, Andreas. Is this the approach you talked
about?

* subject - Do we need to save this in the DB? If yes, why?
* url - Ok.
* ts - Ok.
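
To make the column list concrete, here is a minimal sketch of the insert
in Python. It uses the standard sqlite3 module only as a stand-in,
because nothing above says which database we run on. The TEXT column
types, the helper names open_archive_db and insert_message, and binding
ts as a sixth parameter (instead of interpolating '$today' into the
query string) are all assumptions made for illustration.

```python
import sqlite3
from datetime import date


def open_archive_db(path=":memory:"):
    """Open a database and create the listarchive table if it is missing.

    Assumption: all columns are TEXT; the real schema may differ.
    """
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS listarchive (
               project   TEXT,
               yearmonth TEXT,
               author    TEXT,
               subject   TEXT,
               url       TEXT,
               ts        TEXT
           )"""
    )
    return conn


def insert_message(conn, project, yearmonth, author, subject, url, ts=None):
    """Insert one archived message; ts defaults to today's date (ISO format)."""
    if ts is None:
        ts = date.today().isoformat()
    # Every value is bound as a parameter, so nothing is pasted into the SQL.
    conn.execute(
        "INSERT INTO listarchive (project, yearmonth, author, subject, url, ts) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (project, yearmonth, author, subject, url, ts),
    )
    conn.commit()
```

Binding ts like the other values keeps the query string constant, which
also makes it easier to backfill old archives with a date other than
today.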

So the author issue needs to be sorted out. I remember you mentioning
something about multiple IDs, which is why I brought this up.
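
As a rough sketch of the split idea for the author column, assuming the
'From' value always has the obfuscated "local at domain (Name)" shape
shown in the two examples above. The regular expression and the helper
names split_from, canonical_login and author_key are made up for
illustration and are not existing code.

```python
import re

# Matches values such as "tille at debian dot org (Andreas Tille)".
# Assumption: the address is obfuscated with " at " and the real name
# follows in parentheses, as in the examples from the archive.
FROM_RE = re.compile(
    r"^\s*(?P<local>\S+)\s+at\s+(?P<domain>.+?)\s*\((?P<name>[^)]*)\)\s*$"
)


def split_from(from_value):
    """Return (local_part, real_name), or None if the value does not match."""
    match = FROM_RE.match(from_value)
    if match is None:
        return None
    return match.group("local"), match.group("name").strip()


def canonical_login(local_part):
    """Split on '-' and keep the first piece: 'tille-guest' -> 'tille'."""
    return local_part.split("-")[0].lower()


def author_key(from_value):
    """Return (login, real_name) so that guest and regular accounts agree."""
    parts = split_from(from_value)
    if parts is None:
        return None
    local_part, name = parts
    return canonical_login(local_part), name
```

With this, both example addresses give the same key, so they would be
counted as one author. One caveat: splitting on every '-' also shortens
logins that contain a hyphen for other reasons, so removing only a
trailing "-guest" might be the safer variant.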


