[sane-standard] sane standard proposals (5) "character encoding"

Johannes Berg johannes@sipsolutions.net
Mon, 11 Oct 2004 02:41:34 +0200


5. character encoding

Currently, the TODO list notes as undecided:

UTF-8 format
All texts and translations: UTF-8 format (this is used in KDE and
gtk+-2.x) may be UTF-8 should be forced as SANE_Char format. Currently
ISO-8859-1 is used as encoding.


while the standard specifies:
Type SANE_String represents a text string as a sequence of C char
values. The end of the sequence is indicated by a '\0' (NUL) character.

The latter is inconsistent with the definition (it should read 'as a
sequence of SANE_Char values', I think). Using UCS2 or UCS4 instead would
have the disadvantage that a single "\0" can no longer serve as the
terminator, since \0 bytes may occur in a valid UCS2 stream (and will,
whenever you write ASCII).
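
A minimal sketch of the terminator problem, assuming big-endian UCS2.
The helper names ascii_to_ucs2be() and length_seen_by_strlen() are
hypothetical and not part of SANE; they only illustrate what code that
relies on a single '\0' terminator would see:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: encode an ASCII string as big-endian UCS2 and
 * append a two-byte terminator. Returns the number of payload bytes
 * written (the terminator is not counted). */
size_t ascii_to_ucs2be(const char *ascii, unsigned char *out, size_t out_cap)
{
    size_t n = strlen(ascii);
    assert(out_cap >= 2 * n + 2);   /* room for payload plus terminator */
    for (size_t i = 0; i < n; i++) {
        out[2 * i] = 0x00;          /* high byte is zero for every ASCII character */
        out[2 * i + 1] = (unsigned char)ascii[i];
    }
    out[2 * n] = 0x00;
    out[2 * n + 1] = 0x00;
    return 2 * n;
}

/* Length as seen by code that treats the buffer as a '\0'-terminated
 * sequence of C chars, which is how SANE_String is currently defined. */
size_t length_seen_by_strlen(const unsigned char *buf)
{
    return strlen((const char *)buf);
}
```

Encoding "scan" this way produces 8 payload bytes, but
length_seen_by_strlen() stops at the very first byte, so existing code
would treat the string as empty.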

Using UCS2 or UCS4 would also bloat the transferred strings
unnecessarily, since all texts would be ASCII anyway (they're English,
and translation is only done in the frontend).
But combining this with my proposal (4) [to put translation into the
network protocol] would mean that translated strings need to be
transferred, so it is not possible to stay with ASCII or ISO-8859-1.

Therefore, I propose that UTF-8 be used, as otherwise there are two
options:
  1) use UCS4, which is a waste of bandwidth
  2) put character set handling into the protocol

I think neither of these options is desirable: the former hurts the
common case too much (inflating a 10-character string from 10 to 40
bytes), and the latter adds too much complexity. UCS2 is not an
alternative either, since it does not suffice for an application that
wants to support all possible character sets.
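
To make the size comparison concrete, here is a rough sketch of a UTF-8
encoder next to the fixed UCS4 cost. The function names utf8_encode(),
utf8_size() and ucs4_size() are made up for illustration and are not
SANE API; the byte layout follows the standard UTF-8 definition:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-8 into out (4 bytes of space).
 * Returns the number of bytes written, or 0 for invalid input
 * (surrogates and values above U+10FFFF). */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                      /* ASCII: one byte, unchanged */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* two bytes */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)     /* surrogates are not characters */
        return 0;
    if (cp < 0x10000) {                   /* three bytes */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* four bytes, beyond what UCS2 can hold */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Total UTF-8 size in bytes of n code points; invalid ones add nothing. */
size_t utf8_size(const uint32_t *cps, size_t n)
{
    unsigned char tmp[4];
    size_t total = 0;
    assert(cps != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        total += utf8_encode(cps[i], tmp);
    return total;
}

/* UCS4 always spends four bytes per code point. */
size_t ucs4_size(size_t n)
{
    return 4 * n;
}
```

For a 10-character ASCII string such as "resolution", utf8_size() gives
10 and ucs4_size() gives 40, matching the figures above. Every byte of a
multi-byte UTF-8 sequence has its high bit set, so a '\0' byte appears
only for U+0000 itself and the single-NUL terminator keeps working.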

johannes
