[fdo] UTF-16 support ?
Scott James Remnant
scott at netsplit.com
Tue Jun 15 10:33:30 PDT 2004
On Mon, 2004-06-14 at 12:34 -1000, Christin LIVINE wrote:
> I've read (http://www.unicode.org/notes/tn12/) that Windows 2000..., Mac OS
> X, Qt/KDE fully support UTF-16.
>
> I've just read (http://bugzilla.mozilla.org/show_bug.cgi?id=42893) that
> Mozilla 1.7 will include an option for UTF-16 webpages.
>
> But I don't know about situations of Linux, Gnome or other applications.
>
> Why not start pushing adoption of UTF-16 ?
>
UTF-8 was deliberately designed to be backwards-compatible with ASCII
("hello" becomes "hello") and therefore with the C programming language
("\0", the C string terminator, can only ever represent the NUL byte).
UTF-16 isn't; its code units are two bytes wide, so even plain ASCII
changes when encoded ("hello" becomes "\0h\0e\0l\0l\0o" in big-endian
order), and it is no longer compatible with C ("\0" can appear anywhere
in the byte stream, so basic functions like strlen() cease to work).
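
To make the strlen() point concrete, here is a minimal sketch. The array
names and the c_length() wrapper are made up for illustration; the byte
values are simply "hello" written out by hand in each encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* "hello" in UTF-8: byte-for-byte identical to ASCII. */
static const char hello_utf8[] = "hello";

/* "hello" as raw UTF-16 bytes, big-endian, with a two-byte terminator. */
static const char hello_utf16be[] = {
    0x00, 'h', 0x00, 'e', 0x00, 'l', 0x00, 'l', 0x00, 'o', 0x00, 0x00
};

/* The same text as raw UTF-16 bytes in little-endian order. */
static const char hello_utf16le[] = {
    'h', 0x00, 'e', 0x00, 'l', 0x00, 'l', 0x00, 'o', 0x00, 0x00, 0x00
};

/* Length as any byte-oriented C function sees it:
 * strlen() stops at the first NUL byte it meets. */
static size_t c_length(const char *bytes)
{
    assert(bytes != NULL);
    return strlen(bytes);
}
```

c_length(hello_utf8) gives 5, but the big-endian UTF-16 buffer looks
empty (its first byte is NUL) and the little-endian one looks one byte
long.
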
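UTF-8 can encode all of the UCS-4 code points, although the higher
ranges cost more: everything in the BMP above U+07FF takes 3 bytes
where UTF-16 takes 2, and the original definition (RFC 2279) needed up
to 6 bytes to cover the full 31-bit UCS-4 range. RFC 3629 caps it at
4 bytes by stopping at U+10FFFF.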
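
For a feel of the per-character cost, here is a rough encoder sketch.
It assumes the RFC 3629 definition of UTF-8 (code points up to U+10FFFF,
at most four bytes per character; the older RFC 2279 allowed up to six).
The function name utf8_encode() is hypothetical, not from any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one code point as UTF-8 into out[].
 * Returns the number of bytes written (1 to 4), or 0 if the value is a
 * surrogate or lies above U+10FFFF and so cannot be encoded. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);
    if (cp < 0x80) {                        /* ASCII: unchanged, 1 byte */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                       /* 2 bytes */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                     /* rest of the BMP: 3 bytes */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                       /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                   /* later planes: 4 bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

So 'h' stays one byte, U+00E9 takes two, U+20AC (the euro sign) takes
three, and anything outside the BMP takes four.
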
However it makes the ideal interchange format, as older software just
sees gibberish in the top 128 byte values (0x80 to 0xFF) and, provided
it is 8-bit clean and reproduces those bytes unchanged, will still work.
It also makes the ideal UNIX character encoding because we can continue
to use ASCII and C without needing new functions (again, provided we
only need to interpret the ASCII characters and don't mangle the rest).
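
As an illustration of why existing ASCII-oriented C code keeps working,
here is a small sketch. Both function names are invented for the
example. It relies on one property of UTF-8: every byte of a multi-byte
sequence has its high bit set, so it can never be mistaken for '/',
a letter, or the terminator.

```c
#include <assert.h>
#include <string.h>

/* Uppercase the ASCII letters in place. Bytes >= 0x80, which is where
 * UTF-8 keeps all its multi-byte sequences, are never touched. */
void ascii_upper(char *s)
{
    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (*s >= 'a' && *s <= 'z')
            *s = (char)(*s - 'a' + 'A');
    }
}

/* Return the final component of a path using plain strrchr().
 * No UTF-8 awareness is needed: '/' cannot occur inside a
 * multi-byte sequence. */
const char *last_component(const char *path)
{
    assert(path != NULL);
    const char *slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
}
```

Given the UTF-8 bytes for "héllo", ascii_upper() leaves the two bytes of
the é alone and uppercases the rest. Note that strlen() on that string
reports 6, since it counts bytes and not characters.
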
UTF-16 can be cheaper in memory and processing for text that is mostly
outside ASCII (2 bytes rather than 3 for most of the BMP), but it means
you lose your backwards compatibility. It's a better internal
representation format, but you then need to duplicate functions like
strlen() to take the encoding into account.
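
A sketch of what "duplicating strlen()" might look like for UTF-16 held
in memory as an array of 16-bit code units. The names are hypothetical,
and the code point counter assumes well-formed input (every low
surrogate is preceded by a high one).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* strlen() counterpart: number of 16-bit code units before the
 * terminating 0x0000 unit. */
size_t utf16_strlen(const uint16_t *s)
{
    assert(s != NULL);
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}

/* Number of code points, counting a surrogate pair as one.
 * Assumes well-formed UTF-16. */
size_t utf16_count_code_points(const uint16_t *s)
{
    assert(s != NULL);
    size_t count = 0;
    for (size_t i = 0; s[i] != 0; i++) {
        /* A low surrogate is the second half of a pair: skip it. */
        if (s[i] >= 0xDC00 && s[i] <= 0xDFFF)
            continue;
        count++;
    }
    return count;
}
```

For "hi" followed by U+1D11E (stored as the pair 0xD834 0xDD1E) the
first function reports 4 code units and the second 3 code points, so
even the "simple" two-byte view needs care outside the BMP.
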
UNIX wisdom is to use UTF-8 for filenames, UI, network traffic and all
external interfaces, then to pick whatever encoding is appropriate
*internally*.
Scott
--
Have you ever, ever felt like this?
Had strange things happen? Are you going round the twist?