Bug#496266: UTF-8 string characters not properly recognized

Adam Majer adamm at zombino.com
Tue Sep 2 18:19:11 UTC 2008


Christian Perrier wrote:
>> On Saturday 23 August 2008 at 19:59 -0500, Adam Majer wrote:
>>> Package: gedit
>>> Version: 2.22.3-1
>>> Severity: normal
>>>
>>> The following UTF-8 string is not correctly handled in gedit,
>>>
>>> const char *unicode_insert = "?Э";
>>>
>>> The " and the ? characters are viewed as one character, making the
>>> entire thing next to impossible to copy/paste/edit.
>> Looks like an issue in pango, since it is not specific to gedit.
>>
>> Such things seem to happen a lot when using Tibetan characters, so this
>> may or may not be intentional. I’d prefer to have the input of someone
>> who uses them. Is there anyone on debian-i18n who’s more knowledgeable
>> about Tibetan glyphs?
> 
> 
> Adding Pema Geyleg and Tenzin Dendup, our fellow Dzongkha translation
> coordinators, who certainly know Tibetan-family scripts (Dzongkha is
> one of these) and could perhaps point you to people with the needed
> knowledge.


I'm sorry, but aren't we missing the entire point here? This is not
about bad handling of some Tibetan characters. It is about bad handling
of 3-byte UTF-8 characters.

http://en.wikipedia.org/wiki/UTF-8

So the following characters should all have the same problem:

"ऄक

"ঈউঊ

"ਜਗਏ

"ଜଁଂ

"ஔ

"ంఁః

"ಂಖ

"ഈഃ

etc.
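
As a quick check of the byte-length claim, here is a minimal Python sketch
that prints the UTF-8 encoded length of each code point in one sample line.
The helper name utf8_lengths is hypothetical, and the sample assumes the
Kannada line is U+0C82 followed by U+0C96.

```python
def utf8_lengths(text):
    """Return (code point label, UTF-8 byte count) for each code point."""
    return [(f"U+{ord(ch):04X}", len(ch.encode("utf-8"))) for ch in text]


# Assumed content of the Kannada sample line: an ASCII double quote,
# then U+0C82 and U+0C96. Escapes are used so the example does not
# depend on how a mail client or editor renders the characters.
sample = '"\u0c82\u0c96'

for label, nbytes in utf8_lengths(sample):
    print(label, nbytes)
# U+0022 1
# U+0C82 3
# U+0C96 3
```

Any code point in the range U+0800 to U+FFFF encodes to three bytes, which
covers every script listed above.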


I've put an ASCII " in front of each group of characters. In emacs,
I'm able to select the " in front of these characters and copy it. vim
under a UTF-8 gnome terminal also allows the " to be selected. In the
second-to-last line above (using icedove), I can't select the " on its
own, but I can select the " and ಂ together and then remove the second
character.

Maybe I'm just misunderstanding UTF-8; I'm not sure. But I at least
expected to be able to select one UTF-8 character at a time, even if
that makes no sense linguistically.
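
To make "one UTF-8 character at a time" concrete, here is a rough Python
sketch of stepping through a UTF-8 byte string one code point at a time. The
function names next_boundary and split_code_points are hypothetical, and this
is not a claim about how gedit, pango or icedove move the cursor. It assumes
the input is valid UTF-8.

```python
def next_boundary(data: bytes, pos: int) -> int:
    """Return the byte offset where the code point after `pos` starts."""
    pos += 1
    # UTF-8 continuation bytes always match the bit pattern 10xxxxxx,
    # so skip them until the next lead byte (or the end of the data).
    while pos < len(data) and (data[pos] & 0xC0) == 0x80:
        pos += 1
    return pos


def split_code_points(data: bytes):
    """Split valid UTF-8 bytes into one bytes object per code point."""
    pieces = []
    pos = 0
    while pos < len(data):
        end = next_boundary(data, pos)
        pieces.append(data[pos:end])
        pos = end
    return pieces


# Assumed sample: ASCII double quote, then U+0C82 and U+0C96.
encoded = '"\u0c82\u0c96'.encode("utf-8")
for piece in split_code_points(encoded):
    print(len(piece), piece.decode("utf-8").encode("unicode_escape"))
```

Stepped this way, the " is always its own unit of one byte. A tool that
moves by grapheme cluster rather than by code point can group characters
differently; this sketch only covers code point stepping.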

- Adam







More information about the pkg-gnome-maintainers mailing list