[Teammetrics-discuss] [andreas at an3as.eu: No Message-ID found]

Andreas Tille andreas at an3as.eu
Thu Aug 18 13:33:42 UTC 2011


You can probably respond via the list, and I can then reply, which seems
to work.  Sometimes this mailing list behaves strangely; I have not
experienced this on other lists ...

On Thu, Aug 18, 2011 at 06:47:35PM +0530, Sukhbir Singh wrote:
> > Traceback (most recent call last):
> >  File "./liststat.py", line 525, in <module>
> >    main(conf_info, total_lists)
> >  File "./liststat.py", line 453, in main
> >    parse_and_save(mbox_files)
> >  File "./liststat.py", line 262, in parse_and_save
> >    (msg_id, project, name, email_addr, subject, reason)
> > psycopg2.IntegrityError: ERROR:  duplicate key value violates unique constraint "pk_spam_messageid"
> > DETAIL:  Key (message_id)=() already exists.
> 
> Oh! I never anticipated that we will come across a duplicate message ID.

Well, for the *validly* duplicated Message-IDs I hoped to provide a
sensible explanation.  For the SPAM-caused missing Message-IDs it is
clear that you get duplicates if you set them to ''.
 
> > record of this message.  So what about the following algorithm
> >
> >   md5hash('date' + 'subject') + '@teammetrics-spam.debian.org'
> 
> Sounds good :) So we do this for messages that have no message ID set,
> right?

Yes, if no Message-ID is found, then set it to

     md5hash('date' + 'subject') + '@teammetrics-spam.debian.org'

*and*

     set SPAM reason flag to "No Message ID".
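
A minimal sketch of the rule above, not the actual liststat.py code.  It
assumes Python 3 and UTF-8 encoding of the header strings; the function
names synthetic_message_id and resolve_message_id are hypothetical and
chosen only for illustration:

```python
import hashlib

# Domain suffix taken from the proposal above.
SPAM_DOMAIN = "teammetrics-spam.debian.org"


def synthetic_message_id(date, subject):
    """Build a stand-in Message-ID from the Date and Subject headers."""
    # Assumption: the headers are text and are hashed as UTF-8 bytes.
    digest = hashlib.md5((date + subject).encode("utf-8")).hexdigest()
    return digest + "@" + SPAM_DOMAIN


def resolve_message_id(msg_id, date, subject):
    """Return (message_id, spam_reason).

    spam_reason is None when the message carries its own Message-ID,
    and "No Message ID" when a stand-in had to be generated.
    """
    if msg_id and msg_id.strip():
        return msg_id.strip(), None
    # Missing or empty header: generate the ID and flag the message.
    return synthetic_message_id(date or "", subject or ""), "No Message ID"
```

Two messages that lack a Message-ID and share the same date and subject
would still get the same generated ID, so the insert into the spam table
may need to tolerate that case.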

Could you implement this right now so that I can continue with my
tests?

> And it can help in better detection of spam also later.

However, it is a bit hard to remove the SPAM messages according to
the Message-ID. :-)

I have no idea if similar things will happen in lists.debian.org mboxes
- but I'd also vote to drop these messages - at least I will suggest this
to listmaster in my next ping...

Kind regards

     Andreas.

-- 
http://fam-tille.de


