[Teammetrics-discuss] No Message-ID found

Andreas Tille andreas at an3as.eu
Thu Aug 18 12:33:34 UTC 2011


Hi Sukhbir,

git pull

I handled the valid case of duplicated Message-IDs (the explanation of
why they are valid is in the commit log and a code comment).  This again
proves my point that primary keys are a good idea - we should not count
a single message twice, right? :-)

However, the problems that primary keys uncover always force you to
handle the corresponding cases.  Currently you are handling missing
Message-IDs like this:

            if msg_id_raw is None:
                logging.error('No Message-ID found')
                msg_id = ''

This works only once, because the second empty value violates the
primary key on message_id and raises

Traceback (most recent call last):
  File "./liststat.py", line 525, in <module>
    main(conf_info, total_lists)
  File "./liststat.py", line 453, in main
    parse_and_save(mbox_files)
  File "./liststat.py", line 262, in parse_and_save
    (msg_id, project, name, email_addr, subject, reason)
psycopg2.IntegrityError: FEHLER:  doppelter Schlüsselwert verletzt Unique-Constraint »pk_spam_messageid«
DETAIL:  Schlüssel »(message_id)=()« existiert bereits.

(Sorry for the German locale - it just says "duplicate key value violates
unique constraint »pk_spam_messageid«; key »(message_id)=()« already exists".)
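
The failure mode can be reproduced in isolation.  The following is only
a minimal sketch: it uses an in-memory SQLite database as a stand-in for
the PostgreSQL one, and the table layout and the save_spam helper are
assumptions made for illustration, not the real liststat.py code.

```python
import sqlite3

# In-memory SQLite stands in for the PostgreSQL database here; the table
# and column layout are illustrative assumptions only.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE spam (message_id TEXT PRIMARY KEY, reason TEXT)')


def save_spam(msg_id, reason):
    """Insert one record; return False if the primary key is violated."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute('INSERT INTO spam VALUES (?, ?)', (msg_id, reason))
        return True
    except sqlite3.IntegrityError:
        return False


# Two messages without a Message-ID, both stored with msg_id = ''.
first = save_spam('', 'No Message-ID found')
second = save_spam('', 'No Message-ID found')
```

The first insert succeeds and the second one is rejected, because the
empty string is a perfectly ordinary key value that may occur only once.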

So what to do?  The fact that this is the second case where a message
without an ID turned out to be SPAM makes me think that a missing
Message-ID could well be added to the "reason"s for SPAM.  Moreover, we
need to "invent" a valid, unique Message-ID that lets us keep a record
of this message.  So what about the following algorithm:

   md5hash('date' + 'subject') + '@teammetrics-spam.debian.org'

This should work as a unique identifier and would make sure we do not
violate the primary key constraint.
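
A minimal sketch of the proposed algorithm in Python, assuming the Date
and Subject headers are available as strings (or None when missing).
The function name synthetic_message_id is hypothetical and not part of
liststat.py; only the domain is taken from the proposal.

```python
import hashlib

# Domain taken from the proposal; everything else here is illustrative.
SPAM_DOMAIN = 'teammetrics-spam.debian.org'


def synthetic_message_id(date, subject):
    """Build a stand-in Message-ID from the Date and Subject headers.

    Hypothetical helper for messages that carry no Message-ID at all.
    """
    # Treat missing headers as empty strings so hashing never fails.
    raw = (date or '') + (subject or '')
    digest = hashlib.md5(raw.encode('utf-8', 'replace')).hexdigest()
    return digest + '@' + SPAM_DOMAIN
```

In the branch where msg_id_raw is None, the result of this function
could replace the empty string.  Note that two messages with identical
Date and Subject would still map to the same ID, so the insert may need
to tolerate that case as well.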

Kind regards

        Andreas.

-- 
http://fam-tille.de


