[Teammetrics-discuss] Error in liststat.py

Andreas Tille andreas at an3as.eu
Fri Aug 5 21:24:33 UTC 2011


On Fri, Aug 05, 2011 at 09:41:37PM +0530, Sukhbir Singh wrote:
> Hi,
> 
> So I just ran liststat.py with the lists in the repository
> /etc/teammetrics/listinfo.conf and I have not been able to reproduce
> the error.

Did you run it on blends.debian.net?  The behaviour there may differ
from your system, for example because:

   - blends.d.n is running stable - you might run something else
   - it might need a certain locale to parse a string, which might
     by chance work on your system
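
To compare the two environments, something like the following could be
run on both blends.d.n and your own machine.  This is only a sketch: the
name environment_summary is made up for illustration and is not part of
liststat.py, and the settings listed are just the ones I would guess to
matter for header decoding.

```python
import locale
import platform
import sys


def environment_summary():
    """Collect settings that can influence how mail headers are decoded."""
    return {
        'python': platform.python_version(),
        'preferred_encoding': locale.getpreferredencoding(False),
        'default_encoding': sys.getdefaultencoding(),
        'filesystem_encoding': sys.getfilesystemencoding(),
    }


if __name__ == '__main__':
    # Print one "key: value" line per setting so two hosts can be diffed.
    for key, value in sorted(environment_summary().items()):
        print('%s: %s' % (key, value))
```

If the output differs between the hosts, that would be a first hint where
to look.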

I got the same error again:

$ ./liststat.py
Traceback (most recent call last):
  File "./liststat.py", line 502, in <module>
    main(conf_info, total_lists)
  File "./liststat.py", line 454, in main
    parse_and_save(mbox_files, mbox_hashes)
  File "./liststat.py", line 254, in parse_and_save
    decoded_subject = email.header.decode_header(raw_subject)
  File "/usr/lib/python2.6/email/header.py", line 93, in decode_header
    dec = email.quoprimime.header_decode(encoded)
  File "/usr/lib/python2.6/email/quoprimime.py", line 336, in header_decode
    return re.sub(r'=\w{2}', _unquote_match, s)
  File "/usr/lib/python2.6/re.py", line 151, in sub
    return _compile(pattern, 0).sub(repl, string, count)
  File "/usr/lib/python2.6/email/quoprimime.py", line 324, in _unquote_match
    return unquote(s)
  File "/usr/lib/python2.6/email/quoprimime.py", line 106, in unquote
    return chr(int(s[1:3], 16))
ValueError: invalid literal for int() with base 16: 'we'
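
The traceback shows that decode_header() itself fails: the subject
contains '=we', which the quoted-printable decoder tries to read as a
hex pair.  One possible guard is sketched below.  safe_decode_subject is
a hypothetical helper, not something that exists in liststat.py, and the
sketch is written for Python 3 while the traceback above comes from
Python 2.6, so it would need adapting.

```python
import email.errors
import email.header


def safe_decode_subject(raw_subject):
    """Decode an RFC 2047 subject; fall back to the raw text if malformed."""
    try:
        parts = email.header.decode_header(raw_subject)
    except (ValueError, email.errors.HeaderParseError):
        # e.g. '=we' inside a quoted-printable word is not a valid hex escape
        return raw_subject
    decoded = []
    for text, charset in parts:
        if isinstance(text, bytes):
            try:
                text = text.decode(charset or 'ascii', 'replace')
            except LookupError:
                # unknown charset name given in the header
                text = text.decode('ascii', 'replace')
        decoded.append(text)
    return ''.join(decoded)
```

With such a wrapper a single broken subject would be kept in its raw
form instead of aborting the whole import run.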


The logfile now says:

2011-08-05 20:49:29,303 INFO: Parsed: pkg-samba-maint-2006-March                            
2011-08-05 20:49:29,303 INFO: Parsing: pkg-java-maintainers-2008-June                       
2011-08-05 20:49:29,863 WARNING: Spam detected: Name is in upper case                       
2011-08-05 20:49:30,007 WARNING: 'utf8' codec can't decode byte 0xe6 in position 23: invalid continuation byte - 伴娘小禮服180元                                                        
2011-08-05 20:49:30,086 WARNING: Spam detected: Subject is in upper case                    
2011-08-05 20:49:30,309 WARNING: Spam detected: Subject is in upper case                    
2011-08-05 20:49:30,332 WARNING: Spam detected: Subject is in upper case                    
2011-08-05 20:49:30,339 WARNING: Spam detected: Subject is in upper case                    
2011-08-05 20:49:30,449 WARNING: Spam detected: Name is in upper case                       
2011-08-05 20:49:30,474 INFO: Parsed: pkg-java-maintainers-2008-June                        
2011-08-05 20:49:30,474 INFO: Parsing: pkg-java-maintainers-2011-May                        
2011-08-05 20:49:30,569 WARNING: Spam detected: Subject is an empty field                   
2011-08-05 20:49:30,969 ERROR: No Message-ID found                                          
2011-08-05 20:49:31,038 WARNING: Spam detected: Ignored keyword in Name: loan               
2011-08-05 20:49:31,084 ERROR: No Message-ID found                                          
2011-08-05 20:49:31,262 ERROR: No Message-ID found                                          
2011-08-05 20:49:32,002 WARNING: Spam detected: Subject is in upper case                    
2011-08-05 20:49:32,015 WARNING: Spam detected: Subject is in upper case                 

> I checked the listinfo.conf in blends.debian.net with the
> one in the repository and they are the same.
> 
> My count for the total message reveals:
> 
>  count
> -------
>  80276
> (1 row)

Nothing is imported at all on blends.d.n. :-(

> Even though I have added a check mechanism in the form of a log:
> before parsing a mailing list it prints the name so that it can
> (possibly?) help us diagnose the problem.

You might want to look at /var/log/teammetrics/liststat.log.0.

I was impertinent enough to become you and did the following:

   umask 0002   (in .bashrc)

to make sure that files touched by you stay group writable.  I GRANTed
you all rights in the teammetrics database so you should be able to commit.

Then I restarted liststat.py as YOU.

When doing so I realised that *all* mboxes from alioth are fetched from
scratch as *.gz.  The import process just leaves the uncompressed
mboxes.  I wonder whether we could use "gzip.open('file.gz')" as
described in [1] to work transparently on the gzip files.
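
A minimal sketch of what reading a compressed mbox directly could look
like.  iter_gzip_mbox is a hypothetical name, and the splitting on lines
starting with "From " is a simplification of the mbox format; I have not
checked how liststat.py reads its mboxes, so take it as an illustration
of gzip.open() only.

```python
import email
import gzip


def iter_gzip_mbox(path):
    """Yield email messages from a gzip-compressed mbox file."""
    lines = []
    with gzip.open(path, 'rb') as fh:
        for line in fh:
            if line.startswith(b'From '):
                # A separator line starts a new message: emit the previous one.
                if lines:
                    yield email.message_from_bytes(b''.join(lines))
                    lines = []
                continue
            lines.append(line)
    # Emit the last message, which has no separator after it.
    if lines:
        yield email.message_from_bytes(b''.join(lines))
```

Used like this, the *.gz file never has to be unpacked on disk:

```python
for message in iter_gzip_mbox('file.gz'):
    print(message['Subject'])
```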

I'm also starting to wonder whether we always need to download the
mboxes from past years.  Well, some messages might get removed from old
archives by SPAM removal efforts - however, I guess re-downloading just
creates a lot of traffic with no visible advantage for us.  So you might
consider simply keeping those mboxes (compressed) which are "older than
6 months".  The import routine probably does not need a change because
the MD5sum simply remains unchanged for those files - we just save the
download.
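
Roughly what I have in mind, as a sketch only.  The function names and
the have_local_copy flag are invented for illustration; the only thing
taken from the real setup is the archive naming as it shows up in the
log (e.g. pkg-java-maintainers-2008-June).

```python
import datetime

MONTHS = ['January', 'February', 'March', 'April', 'May', 'June', 'July',
          'August', 'September', 'October', 'November', 'December']


def archive_month(name):
    """Return the first day of the month encoded in '<list>-<year>-<Month>'."""
    _list_name, year, month = name.rsplit('-', 2)
    return datetime.date(int(year), MONTHS.index(month) + 1, 1)


def needs_download(name, have_local_copy, today, keep_after_months=6):
    """Skip the download for kept archives older than keep_after_months."""
    first = archive_month(name)
    age_in_months = (today.year - first.year) * 12 + (today.month - first.month)
    return not (have_local_copy and age_in_months > keep_after_months)
```

Recent archives and archives without a local copy would still be
fetched; only old ones we already keep compressed would be skipped.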

> So I suggest you probably try running it again perhaps? Because I am
> sure that the last time (two days back) when I did a test run of
> liststat.py, there was no error then also. So maybe the error you got
> must have been temporary or something else. I am just guessing!

I'm afraid this guess is wrong.  When starting as sukhbir the log is
exactly the same (compare liststat.log and liststat.log.0).  The output
is identical as well.  Please just run your test on blends.debian.net.

Kind regards

      Andreas.

[1] http://docs.python.org/library/gzip.html 

-- 
http://fam-tille.de


