[Teammetrics-discuss] Updates and related to mbox creation.

Sukhbir Singh sukhbir.in at gmail.com
Tue Dec 13 14:31:16 UTC 2011


Hi Andreas,

1. The test run of the web archive parser has finished on blends.d.n. It
covered 55 teams and, based on a casual glance at some popular teams I
'knew', the results look good. You can check in your own way!

2. Mbox creation:

Because we are fetching the messages from the web archive, we don't
know which encoding they were originally sent in. When creating an mbox,
we need to specify the encoding for non-ASCII characters in the From,
Subject and Body fields. As we don't know the encoding, here is
what we can do:

i. Ignore the mbox convention and save everything in utf-8
*without* specifying the encoding, so we just write the file as
utf-8. (This breaks the conventions for mbox/email headers as defined
in the RFCs.)
ii. Assume utf-8 for all messages and declare that as the encoding.
(safest?)

If I am missing something, please let me know. IMHO, ii is the only
workable way I know of, and it makes no difference if we store
everything in utf-8.
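
To make option ii concrete, here is a minimal sketch, assuming Python's
standard `email` and `mailbox` modules. The names `build_message`,
`write_mbox` and `decoded_subject`, and the sample values, are hypothetical
and only for illustration; this is not what the parser does right now.

```python
import mailbox
import os
import tempfile
from email.header import Header, decode_header, make_header
from email.mime.text import MIMEText


def build_message(sender_name, sender_addr, subject, body, msg_id):
    """Build a message that declares utf-8 for From, Subject and body.

    Assumption: all arguments are already decoded Unicode strings.
    """
    # MIMEText sets "Content-Type: text/plain; charset=utf-8" and picks
    # a transfer encoding for the body.
    msg = MIMEText(body, "plain", "utf-8")
    # Header() produces RFC 2047 encoded-words tagged with the charset.
    # Only the display name is encoded; the address itself stays ASCII.
    encoded_name = Header(sender_name, "utf-8").encode()
    msg["From"] = "%s <%s>" % (encoded_name, sender_addr)
    msg["Subject"] = Header(subject, "utf-8")
    msg["Message-ID"] = "<%s>" % msg_id
    return msg


def write_mbox(path, messages):
    """Append the given messages to the mbox file at path."""
    box = mailbox.mbox(path)
    try:
        for msg in messages:
            box.add(msg)
    finally:
        # close() flushes pending changes to disk.
        box.close()


def decoded_subject(msg):
    """Return the Subject of a stored message as a Unicode string."""
    return str(make_header(decode_header(msg["Subject"])))


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "sample.mbox")
    sample = build_message("Jörg Müller", "joerg@example.org",
                           "Grüße", "Ein Körper\n", "1@example.org")
    write_mbox(path, [sample])
    for stored in mailbox.mbox(path):
        print(decoded_subject(stored))
```

Any mail reader that follows the RFCs should then be able to decode the
headers and body, because the charset is stated explicitly instead of
being left for the reader to guess.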

Sample: http://lists.debian.org/debian-edu/2011/06/msg00007.html (Holger!)
Message ID: 201106030209.08178.holger at layer-acht.org in 'debian-edu.201106'

-- 
Sukhbir


