Bug#257709: gnome-terminal: failing to paste accented characters

Sjoerd Simons sjoerd@spring.luon.net (Sjoerd Simons), 257709@bugs.debian.org
Mon, 18 Apr 2005 11:50:40 +0200


On Sun, Apr 17, 2005 at 10:27:33PM -0400, J. Bruce Fields wrote:
> On Mon, Apr 18, 2005 at 12:39:57AM +0100, Stewart Jeacocke wrote:
> > On Sun, 2005-04-17 at 19:26 -0400, J. Bruce Fields wrote:
> > > > The problem is that you are not using a UTF-8 (Unicode) system locale.
> > > > Run
> > > >
> > > > # dpkg-reconfigure locales
> > > >
> > > > and select a Unicode locale (eg en_GB.UTF-8) as the default system
> > > > locale. Log out of GNOME and log back in.
> > >
> > > Yeah, OK, thanks, that seems to explain the symptoms.  As a practical
> > > problem, it seems that most of the email and newsgroups I see are using
> > > iso-8859-1, so that's the only thing that seems to work as a default
> > > encoding for my terminal.

At least e-mail (and probably newsgroups too) indicates in the headers which
encoding it is using. So a mail reader should convert from the mail's
encoding to the terminal's locale if possible (which mutt does fine, for
example, with this mail when in a UTF-8 terminal). That's not a problem of
the terminal.
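
A minimal sketch of that conversion, in Python with only the standard
library. The function name body_for_terminal and the sample message RAW are
made up for illustration, and it assumes a single-part text message; it is
not how mutt itself is implemented.

```python
from email import message_from_bytes


def body_for_terminal(raw_message: bytes, terminal_encoding: str = "utf-8") -> bytes:
    """Re-encode a single-part text mail body for the terminal's encoding."""
    msg = message_from_bytes(raw_message)
    # Charset declared in the Content-Type header; us-ascii is the MIME default.
    charset = msg.get_content_charset() or "us-ascii"
    # decode=True undoes the transfer encoding (quoted-printable or base64).
    payload = msg.get_payload(decode=True)
    if payload is None:
        # Multipart messages are outside the scope of this sketch.
        raise ValueError("expected a single-part message")
    text = payload.decode(charset, errors="replace")
    return text.encode(terminal_encoding, errors="replace")


# Hypothetical sample: an iso-8859-1 mail sent as quoted-printable.
RAW = (
    b"Content-Type: text/plain; charset=iso-8859-1\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"e with an acute accent: =E9\r\n"
)

converted = body_for_terminal(RAW)
```

Here the single iso-8859-1 byte 0xE9 from the mail comes out as the two
UTF-8 bytes 0xC3 0xA9, which is what a UTF-8 terminal expects.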

> > I'm pretty sure that iso-8859-1 encoding is a subset of the Unicode
> > encoding. So even when the locale is set to a Unicode encoding
> > iso-8859-1 (extended ASCII) documents should still work fine (they seem
> > to here).
> >
> > If they really don't then would you attach an example file that contains
> > characters that fail to render with a Unicode locale?
>
> None of these:
>
> e with an acute accent: "é"
> e with a grave accent: "è"
> c with a cedilla: "ç"
>
> show up if I choose UTF-8 in gnome-terminal.
>
> iso-8859-1 may be a subset of unicode in the sense that all the
> characters it encodes are also in unicode, but I don't believe that the
> iso-8859-1 encoding is a subset of UTF-8.  I'm far from an expert on
> this, though....

ASCII is a subset of UTF-8; iso-8859-1 is not. Characters like e with an
acute accent have the 8th bit set in iso-8859-1, which for UTF-8 means that
the byte is one of multiple bytes encoding one character.
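
To make the byte-level difference concrete, here is a small Python sketch
(the helper is_valid_utf8 is a name chosen just for this illustration):

```python
ACUTE_E = "\u00e9"  # e with an acute accent

# One byte in iso-8859-1, with the 8th bit set.
latin1_bytes = ACUTE_E.encode("iso-8859-1")  # b'\xe9'
# Two bytes in UTF-8.
utf8_bytes = ACUTE_E.encode("utf-8")  # b'\xc3\xa9'


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Pure ASCII bytes are valid UTF-8 as they stand.
ascii_ok = is_valid_utf8(b"plain ASCII text")
# A lone 0xE9 announces a multi-byte sequence that never arrives,
# so the iso-8859-1 bytes are not valid UTF-8.
latin1_ok = is_valid_utf8(latin1_bytes)
```

With these inputs ascii_ok is True and latin1_ok is False, which is why raw
iso-8859-1 text does not display correctly in a UTF-8 terminal.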

  Sjoerd
-- 
"Protozoa are small, and bacteria are small, but viruses are smaller
than the both put together."