[fdo] UTF-16 support ?

Scott James Remnant scott at netsplit.com
Tue Jun 15 10:44:37 PDT 2004


On Tue, 2004-06-15 at 13:37 -0400, Behdad Esfahbod wrote:

> On Tue, 15 Jun 2004, Scott James Remnant wrote:
> 
> > UTF-8 can encode all of the UCS-4 code points; however, there is
> > significant overhead in the later planes, turning a 4-byte UCS-4 sequence
> > into a 5- or 6-byte character sequence.
> 
> This is not true.  UTF-8 encodes all valid Unicode characters in
> at most 4 octets.
> 
No, it is true, but what you said is true also.

"all of the UCS-4 code points" is a greater set than "all valid Unicode
characters".
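Both statements can be checked quickly. Python's built-in "utf-8" codec implements modern UTF-8 (RFC 3629), which stops at U+10FFFF, the highest valid Unicode code point. The sketch below assumes that codec. The helper name `utf8_len` is hypothetical. It shows that the largest valid character takes 4 octets and that anything beyond U+10FFFF is refused outright:

```python
def utf8_len(cp: int) -> int:
    """Hypothetical helper: byte length of code point cp under Python's
    standard (RFC 3629) UTF-8 codec."""
    return len(chr(cp).encode("utf-8"))

# The highest valid Unicode code point, U+10FFFF, fits in 4 octets.
print(utf8_len(0x10FFFF))

# chr() rejects values above U+10FFFF, so the larger UCS-4 range
# cannot be encoded with the modern codec at all.
try:
    chr(0x110000)
except ValueError as exc:
    print("rejected:", exc)
```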

Stolen from utf-8(7):

       0x00010000 - 0x001FFFFF:
           11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

       0x00200000 - 0x03FFFFFF:
           111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx

       0x04000000 - 0x7FFFFFFF:
           1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
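The table above can be turned into a small encoder for the original 31-bit scheme (RFC 2279 style). This is a minimal sketch, not a production codec. The function name `encode_ucs4_utf8` is hypothetical, and it performs no surrogate checks. For values up to U+10FFFF it produces the same bytes as modern UTF-8. Above that, it emits the 5- and 6-byte forms listed in the table:

```python
def encode_ucs4_utf8(cp: int) -> bytes:
    """Hypothetical encoder for 31-bit UCS-4 values using the original
    (pre-RFC 3629) UTF-8 layout, including the 5- and 6-byte forms."""
    if not 0 <= cp <= 0x7FFFFFFF:
        raise ValueError("UCS-4 value out of range")
    if cp < 0x80:
        return bytes([cp])  # 0xxxxxxx
    # (exclusive upper bound, lead-byte marker, number of 10xxxxxx bytes)
    for limit, lead, n_cont in (
        (0x800, 0xC0, 1),        # 110xxxxx
        (0x10000, 0xE0, 2),      # 1110xxxx
        (0x200000, 0xF0, 3),     # 11110xxx
        (0x4000000, 0xF8, 4),    # 111110xx
        (0x80000000, 0xFC, 5),   # 1111110x
    ):
        if cp < limit:
            break
    out = []
    for _ in range(n_cont):
        out.append(0x80 | (cp & 0x3F))  # continuation byte, low 6 bits first
        cp >>= 6
    out.append(lead | cp)  # remaining high bits go into the lead byte
    return bytes(reversed(out))

print([len(encode_ucs4_utf8(cp))
       for cp in (0x41, 0x10FFFF, 0x1FFFFF, 0x3FFFFFF, 0x7FFFFFFF)])
```

The printed lengths are 1, 4, 4, 5 and 6 bytes. Everything up to U+10FFFF stays within 4 octets. Only code points in the wider UCS-4 range need the 5- and 6-byte forms.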

Scott
-- 
Have you ever, ever felt like this?
Had strange things happen?  Are you going round the twist?