[Teammetrics-discuss] A basic (and broken!) mbox filter.

Andreas Tille andreas at an3as.eu
Thu Aug 11 12:34:09 UTC 2011


On Thu, Aug 11, 2011 at 04:14:00PM +0530, Sukhbir Singh wrote:
> Try the second version of the mbox filter (mboxfilter.py)

Try the third version of the mbox filter, or just `git pull`.
As I said, I would like some flexibility in adding / removing certain
fields, which the code I pushed now enables.
 
> Works the same way as the previous script but the mbox created has a
> file name with '_converted' appended to it as you requested.

Probably you forgot to push this change - the script still modifies the
mbox it has just read.  When I solve tasks like this, I usually do the
following:

  input  = argv[1]
  output = input + '_converted'
  ifp    = open(input, 'r')
  ofp    = open(output, 'w')

  for message in all_messages
     read single message
     write single message

This saves you from holding potentially large lists in memory for no
benefit and makes the algorithm less complex (only one main loop instead
of two).  I did not change this - but you might consider it for general
coding style.
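
A minimal sketch of the single-loop idea, assuming Python's standard
mailbox module is acceptable for reading and writing the mbox.  The
function name convert_mbox and the keep_headers parameter are
hypothetical and only stand in for the "adding / removing certain
fields" flexibility mentioned above; this is not the code in
mboxfilter.py.

```python
import mailbox


def convert_mbox(input_path, keep_headers=None):
    """Copy input_path to input_path + '_converted', one message at a time.

    keep_headers (hypothetical knob): a set of lower-case header names to
    keep; None means every header is copied unchanged.
    """
    output_path = input_path + '_converted'
    src = mailbox.mbox(input_path, create=False)  # only read, never rewritten
    dst = mailbox.mbox(output_path)               # created if it does not exist
    try:
        # Single main loop: each message is read, filtered and written
        # before the next one is loaded.
        for message in src:
            if keep_headers is not None:
                for name in set(message.keys()):
                    if name.lower() not in keep_headers:
                        del message[name]
            dst.add(message)
        dst.flush()
    finally:
        dst.close()
        src.close()
    return output_path
```

Calling convert_mbox('lists.mbox', keep_headers={'from', 'date', 'subject'})
would write lists.mbox_converted next to the input (the file name here is
only an example).  Note that an existing output file is appended to, so
remove it first when re-running.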

> This uses nntpstat.py to create the mbox. Notice line 50. This is the
> only problem with this approach. Some messages have a weird formatting
> that I can't handle.

I do not really know what you mean by this.  Could you provide the
example mbox you are using for your tests?
 
> Let me know after trying this your thoughts about this approach so I
> can try removing the only problem this has...

I wonder why you have chosen to reformat From into

   <username> at <doma.in> (Real Name)

Any reason not to stick to the original format?  I would by all means
avoid changing the format given in the original mbox.
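
To illustrate that the name and address can be extracted for statistics
without rewriting the header, here is a small sketch using
email.utils.parseaddr from the standard library.  The helper name
split_from and the sample address are made up for illustration.

```python
from email.utils import parseaddr


def split_from(header_value):
    """Return (real_name, address); header_value itself is left untouched."""
    return parseaddr(header_value)


# Hypothetical header value in the usual "Real Name <address>" form.
original = 'Real Name <username@doma.in>'
name, address = split_from(original)
# 'original' can still be written to the output mbox verbatim.
```

The parsed values are only used for analysis; the string written back
to the mbox stays exactly as it was read.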

Kind regards

        Andreas.

-- 
http://fam-tille.de


