[Python-modules-team] Bug#611923: python-xdg: wrong behaviour breaking apps on non-supported locale

Yann Dirson ydirson at free.fr
Sat Feb 5 11:53:12 UTC 2011


On Fri, Feb 04, 2011 at 11:42:51AM +0100, Jakub Wilk wrote:
> Disclaimer: I'm not maintainer of this package.
> 
> * Yann Dirson <ydirson at free.fr>, 2011-02-03, 20:56:
> >In [1]: import xdg.DesktopEntry
> >
> >In [3]: e=xdg.DesktopEntry.DesktopEntry()
> >
> >In [4]: e.parse('plugins/Games/Chess.desktop')
> >
> >In [5]: e.getName()
> >Out[5]: u'\xc9checs'
> >
> >In [6]: print "%s" % e.getName()
> >------> print("%s" % e.getName())
> >Échecs
> >
> >
> >Now, if I use LC_ALL=fr_FR or fr_FR.ISO-8859-1 (which should be
> >equivalent), the final step instead throws:
> >
> >UnicodeEncodeError: 'ascii' codec can't encode character u'\xc9' in position 0: ordinal not in range(128)
> 
> I assume that, as the subject suggests, it fails only if there is no
> fr_FR.ISO-8859-1 locale available. Am I correct? (If this is the
> case, perhaps something like fr_FR.ISO-8859-42 would be a better
> test-case, as it's less likely to exist.)

Right.

> >By contrast, other programs in the same condition fall back to the C
> >locale, which results in no error.  I guess the xdg module should do
> >something similar.
> 
> It is true that xdg uses a slightly different language lookup algorithm
> than GNU gettext does. I can see it is flawed in a few ways and I
> can see your point. However, I don't think your particular use case
> is of much significance, for the following reasons:
> 
> 1. If you are using non-existent locales, you shoot yourself in the
> foot. :)

Yes, but users may inadvertently request a locale with an implicit
encoding without realizing it is not the encoding they want (which is
exactly what happened to me, and I had not realized at first what the
problem was).
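
The fallback to the C locale that other programs perform in this
situation can be sketched as follows. This is only an illustration
using the standard "locale" module, not what the xdg module does;
the helper name set_locale_or_fallback is hypothetical.

```python
import locale


def set_locale_or_fallback(name=""):
    """Try to activate locale `name`; fall back to "C" if unsupported.

    An empty `name` means "use whatever the environment requests".
    Hypothetical helper, for illustration only.
    """
    try:
        return locale.setlocale(locale.LC_ALL, name)
    except locale.Error:
        # The requested locale is not available on this system.
        return locale.setlocale(locale.LC_ALL, "C")


if __name__ == "__main__":
    # The locale name below is the unlikely-to-exist one suggested above.
    print(set_locale_or_fallback("fr_FR.ISO-8859-42"))
```

Whether an unknown name is actually rejected depends on the C library,
so the fallback branch may not be taken on every system.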


> 2. getName() returned a Unicode string in French, which was kind
> of what you asked for.

Well, what I was asking for primarily was a string that would match
the locale, so it would fit into the GUI - and in that respect I got
something different than I was expecting.

> 3. If you want your application to be robust, you should not print
> Unicode strings blindly. Encoding of sys.stdout can be ASCII even if
> proper UTF-8 locale is set:
> 
> $ locale charmap
> UTF-8
> 
> $ python -c 'print u"\xde"'
> Þ
> 
> $ python -c 'print u"\xde"' | cat
> Traceback (most recent call last):
>   File "<string>", line 1, in <module>
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xde' in position 0: ordinal not in range(128)

Right, the problem of spitting debugging output using "print" is a
different one, and I had not realized that python was disregarding the
locale when the output is redirected.  I'll have to dig into this, do
you have any pointers?
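
A minimal sketch of the "do not print Unicode strings blindly" advice
from point 3, assuming the stream's encoding may be unset (as it is
for sys.stdout in Python 2 when output goes to a pipe). The helper
names pick_encoding and encode_for are hypothetical.

```python
import locale
import sys


def pick_encoding(stream):
    """Return a usable encoding name for `stream`, never None."""
    enc = getattr(stream, "encoding", None)
    if not enc:
        # No encoding attached to the stream: ask the locale instead.
        enc = locale.getpreferredencoding()
    return enc or "ascii"


def encode_for(stream, text):
    """Encode `text` for `stream`, replacing unencodable characters."""
    return text.encode(pick_encoding(stream), "replace")


if __name__ == "__main__":
    data = encode_for(sys.stdout, u"\xc9checs")
    # Write raw bytes; Python 3 exposes them through sys.stdout.buffer.
    out = getattr(sys.stdout, "buffer", sys.stdout)
    out.write(data + b"\n")
```

With the "replace" error handler, a character the target encoding
cannot represent is emitted as "?" instead of raising
UnicodeEncodeError.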

Best regards,
-- 
Yann




