[christin.livine at urbanisme.gov.pf: [fdo] UTF-16 support ?]

Mike Hearn mike at navi.cx
Tue Jun 15 15:42:07 EEST 2004


On Tue, 15 Jun 2004 13:20:40 +1000, Daniel Stone wrote:
> Hi,
> 
> First, I'm not a specialist of UTF.
> 
> I've read (http://www.unicode.org/notes/tn12/) that Windows 2000..., Mac OS
> X, QT/KDE fully support UTF-16.

Well, that article seems a bit vague about what "support" means in this
context. I can't speak for Mac OS X because I don't know much about its
text services, but Windows uses UTF-16 because when its Unicode support
was designed, UTF-8 did not exist. Unicode support in the Windows API, at
any rate, is an appalling hack which massively increases code size and
complexity. The rest of Microsoft's software uses UTF-16 because Windows
does (and, as far as I know, the Windows API doesn't provide much support
for UTF-8).

As for Qt: http://handhelds.org/~zecke/apidocs/qt/unicode.html 
 
There are a few mistakes on this page, but it's rather old, so that's
forgivable. For starters, Unicode isn't a 16-bit character set: its code
points run up to U+10FFFF, and there are multiple possible encodings.
UTF-16 is only one of them. But it sounds like when Unicode was first
added to Qt, UTF-8 either did not exist yet or wasn't widespread. I
suspect the decision to use UTF-16 was also encouraged by the fact that
it would make (a) coding the Windows version of Qt easier and (b) mixing
Qt and Win32 APIs easier for Windows programmers.
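To make the "multiple encodings" point concrete, here is a small Python
sketch (written for this post, not taken from the Qt page) showing that a
single code point has a different byte sequence in each Unicode encoding
form. The two characters are arbitrary examples; U+1D11E lies above
U+FFFF, so UTF-16 needs two 16-bit units for it.

```python
# One Unicode code point, three different byte encodings.
# The sample characters are arbitrary, chosen only for illustration.
def encodings(ch: str) -> dict:
    """Return the bytes of `ch` in three Unicode encoding forms, as hex."""
    return {
        "utf-8": ch.encode("utf-8").hex(" "),
        "utf-16-be": ch.encode("utf-16-be").hex(" "),
        "utf-32-be": ch.encode("utf-32-be").hex(" "),
    }

# U+00E9 LATIN SMALL LETTER E WITH ACUTE, U+1D11E MUSICAL SYMBOL G CLEF
for ch in ["\u00e9", "\U0001D11E"]:
    print(f"U+{ord(ch):04X}", encodings(ch))
```

For U+1D11E this gives four bytes in UTF-8 (f0 9d 84 9e), a surrogate
pair in UTF-16 (d8 34 dd 1e) and a plain 32-bit value in UTF-32
(00 01 d1 1e): same character, three encodings.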

> I've just read (http://bugzilla.mozilla.org/show_bug.cgi?id=42893) that
> Mozilla 1.7 will include an option for UTF-16 webpages.

Actually, that's the Composer component of Mozilla. I don't know what
encoding Mozilla uses internally. And as you will note from the comments,
the only reason UTF-16 support is being added to Composer is for
compatibility with Windows apps like Notepad, which sometimes generate
UTF-16 files.

> But I don't know about situations of Linux, Gnome or other applications.

As far as I'm aware (and I may be talking smack here), the situation with
Linux is currently in flux: some distros like Red Hat/Fedora have gone
completely UTF-8 by default, while others are still using the old
codepage system.

The future direction for system encoding is UTF-8. Individual apps like
OpenOffice or Qt-based apps may be using UTF-16 internally, but for things
like saving files and filenames to disk, transfer via the clipboard,
email, etc., UTF-8 is the standard. Outside of application internals, you
should only ever see UTF-8 in the future.
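One concrete reason UTF-8 suits filenames and other byte-oriented
interfaces: it leaves ASCII bytes (including '/') unchanged and never
produces a zero byte except for U+0000 itself, whereas UTF-16 puts a zero
byte into nearly every ASCII character. A minimal Python sketch; the path
is made up purely for illustration.

```python
def c_string_safe(encoded: bytes) -> bool:
    """True if the bytes contain no NUL, so NUL-terminated C APIs see all of it."""
    return b"\x00" not in encoded

# Hypothetical filename, used only for illustration.
path = "/home/mike/r\u00e9sum\u00e9.txt"

utf8 = path.encode("utf-8")
utf16 = path.encode("utf-16-le")

print(c_string_safe(utf8), c_string_safe(utf16))  # True False

# '/' is the single byte 0x2F in UTF-8, so byte-level path splitting still works.
print(utf8.split(b"/"))
```

The UTF-16 version starts with the bytes 2f 00, so a C function like
strlen() would stop after the first byte.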

> Why not start pushing adoption of UTF-16 ?

Because UTF-16 is a pain in the ass, a holdover from the days before UTF-8
was invented. It didn't even work well before the Unicode character set
spilled over the 16-bit limit, as you had the whole byte order mark (BOM)
fiasco. And when more than ~65k code points were introduced, you got
surrogate pairs, which made correctly decoding UTF-16 arguably just as
complicated as decoding UTF-8. Of course, surrogate pairs are so rare
that there are still some programs out there which treat them as an
edge case and get them wrong.
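For the curious, here is roughly what a hand-written UTF-16 decoder has
to do: pick a byte order from the BOM, then glue surrogate pairs back
into code points above U+FFFF. This is a Python illustration written for
this post, not code from any real project. It assumes big-endian input
when no BOM is present (the Unicode standard's default for BOM-less
UTF-16); in real code you would use a library decoder instead.

```python
def decode_utf16(data: bytes) -> str:
    """Illustrative UTF-16 decoder: BOM detection plus surrogate pairing."""
    # Step 1: byte order. A leading BOM (U+FEFF) tells us the endianness.
    if data[:2] == b"\xff\xfe":
        order, data = "little", data[2:]
    elif data[:2] == b"\xfe\xff":
        order, data = "big", data[2:]
    else:
        order = "big"  # no BOM: assume big-endian
    if len(data) % 2:
        raise ValueError("odd number of bytes")
    units = [int.from_bytes(data[i:i + 2], order) for i in range(0, len(data), 2)]

    # Step 2: combine surrogate pairs into code points above U+FFFF.
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:  # high (leading) surrogate
            if i + 1 >= len(units) or not 0xDC00 <= units[i + 1] <= 0xDFFF:
                raise ValueError(f"unpaired high surrogate at unit {i}")
            low = units[i + 1]
            out.append(chr(0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00)))
            i += 2
        elif 0xDC00 <= u <= 0xDFFF:  # low surrogate with no high one before it
            raise ValueError(f"unpaired low surrogate at unit {i}")
        else:  # ordinary BMP character: one unit, one code point
            out.append(chr(u))
            i += 1
    return "".join(out)
```

This is also where the "edge case" bugs come from: a program that counts
16-bit units as characters thinks "héllo 𝄞" is 8 characters long, when it
is really 7 code points.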

The future of Linux is UTF-8, though apps can use whatever representation
they like internally. Qt supports UTF-8 in its API, and GTK2-based apps
work entirely in UTF-8. It's really the only thing that makes switching
the entire OS to Unicode even feasible.

thanks -mike





More information about the xdg mailing list