Bug#366992: [debiandoc-sgml-pkgs] Bug#366992: debiandoc-sgml: [INTL:uk] Ukrainian language support

Eugeniy Meshcheryakov eugen at univ.kiev.ua
Wed May 17 22:37:06 UTC 2006


On 17 May 2006 at 23:54 +0200, Jens Seidel wrote:
> > > The only problem I could imagine is that SGML will fail to complain, or complain wrongly,
> > > about invalid characters. I have to check this.
> > > 
> > > > -DESCSET  128 32 UNUSED
> > > > +DESCSET  128 32 32
> > > > @@ -23,10 +23,7 @@
> > > >  SHUNCHAR CONTROLS   0   1   2   3   4   5   6   7   8   9
> > > >                     10  11  12  13  14  15  16  17  18  19
> > > >                     20  21  22  23  24  25  26  27  28  29
> > > > -                   30  31                     127 128 129
> > > > -                  130 131 132 133 134 135 136 137 138 139
> > > > -                  140 141 142 143 144 145 146 147 148 149
> > > > -                  150 151 152 153 154 155 156 157 158 159
> > > > +                   30  31                     127 
> > > 
> > > A stupid question from my side, but could you please explain this?
> > > That's Ardo's code and I'm not familiar with it.
> > This part of the patch fixes the problem that the SGML processor complains about bad
> > characters in UTF-8 text (at least in text written in Ukrainian).
> 
> Yep.
> 
> > The second part tells the SGML processor not to ignore characters in the range
> > 128-159.
> > 
> > So the effect of those two parts is that the SGML processor handles characters with
> > codes 128-159 as ordinary (allowed) characters.
> 
> OK. But 0-31 and 127 are still rejected, right?
> I assume these numbers refer not to UTF-8 characters but to single
> bytes. That would make two-byte UTF-8 characters whose second byte
> falls in this range invalid!? Can you confirm this?
> 
Characters with codes 0x00..0x7f (0..127) are the same as in ASCII; these
byte values never occur inside the multi-byte sequences that encode other
characters. So if they are not needed now, they are not needed for UTF-8
support either.
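Both byte-level claims in this thread can be checked with a short Python sketch. It is illustrative only and not part of debiandoc-sgml; the function names are hypothetical, and it assumes (as Jens does above) that the SHUNCHAR numbers are applied to individual bytes. In UTF-8, every byte of a multi-byte sequence is at least 0x80 (lead bytes 0xC2..0xF4, continuation bytes 0x80..0xBF), so bytes 0..127 only ever stand for themselves. Continuation bytes, however, do fall in 128..159, which is why Ukrainian text tripped over the old SHUNCHAR list.

```python
# Illustrative sketch (hypothetical helper names, not debiandoc-sgml code).
# Assumes SHUNCHAR numbers are interpreted as single byte values.

# Bytes 128..159 that the old SHUNCHAR CONTROLS list rejected.
SHUNNED_HIGH = range(128, 160)


def ascii_bytes_only_in_ascii_chars() -> bool:
    """Return True if no code point >= 0x80 encodes to a byte below 0x80."""
    for cp in range(0x80, 0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates cannot be encoded in UTF-8
        # Lead and continuation bytes of multi-byte sequences are all >= 0x80.
        if min(chr(cp).encode("utf-8")) < 0x80:
            return False
    return True


def shunned_bytes(text: str) -> list[tuple[str, int]]:
    """Return (character, byte) pairs whose UTF-8 bytes fall in 128..159."""
    hits = []
    for ch in text:
        for b in ch.encode("utf-8"):
            if b in SHUNNED_HIGH:
                hits.append((ch, b))
    return hits


if __name__ == "__main__":
    print(ascii_bytes_only_in_ascii_chars())  # True
    # "привіт" (Ukrainian "hello"): р = D1 80, і = D1 96, т = D1 82
    print(shunned_bytes("привіт"))  # [('р', 128), ('і', 150), ('т', 130)]
```

The Ukrainian-specific letters і, ї, є and ґ all encode with a second byte in this range (0x96, 0x97, 0x94 and 0x91), so nearly any Ukrainian document would be rejected while 128..159 stayed in SHUNCHAR.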

> On the other hand, these characters are currently not supported at all.
> 
> Any reason not to remove 0-31 and 127 as well (except that they would then
> also be accepted as the first byte, which is bad)?

-- 
Eugeniy Meshcheryakov


More information about the Debiandoc-sgml-pkgs mailing list