[fdo] UTF-16 support ?
Scott James Remnant
scott at netsplit.com
Tue Jun 15 10:44:37 PDT 2004
On Tue, 2004-06-15 at 13:37 -0400, Behdad Esfahbod wrote:
> On Tue, 15 Jun 2004, Scott James Remnant wrote:
>
> > UTF-8 can encode all of the UCS-4 code points; however, there is
> > significant overhead in the later planes, turning a 4-byte UCS-4 sequence
> > into a 5- or 6-byte character sequence.
>
> This is not true. UTF-8 encodes all valid Unicode characters in
> at most 4 octets.
>
No, it is true, but what you said is true also.
"all of the UCS-4 code points" is a greater set than "all valid Unicode
characters".
Stolen from utf-8(7):
0x00010000 - 0x001FFFFF:
11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
0x00200000 - 0x03FFFFFF:
111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
0x04000000 - 0x7FFFFFFF:
1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
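
As a rough illustration of the table above, here is a minimal Python sketch of the original 31-bit UTF-8 scheme. The function names utf8_length and encode_ucs4 are made up for this example and do not come from any library. It assumes the pre-RFC 3629 definition, so it will happily emit 5- and 6-byte sequences, and it does not reject surrogates or anything above U+10FFFF the way a modern encoder would.

```python
def utf8_length(code_point: int) -> int:
    """Bytes needed for a code point under the original 31-bit UTF-8 scheme."""
    if code_point < 0 or code_point > 0x7FFFFFFF:
        raise ValueError("outside the 31-bit UCS-4 range")
    # Upper bound of each range, for 1- through 6-byte sequences.
    limits = [0x7F, 0x7FF, 0xFFFF, 0x1FFFFF, 0x3FFFFFF, 0x7FFFFFFF]
    for length, limit in enumerate(limits, start=1):
        if code_point <= limit:
            return length
    raise AssertionError("unreachable")


def encode_ucs4(code_point: int) -> bytes:
    """Encode one UCS-4 code point, allowing the legacy 5- and 6-byte forms."""
    length = utf8_length(code_point)
    if length == 1:
        return bytes([code_point])
    # Lead byte: 'length' one-bits, a zero bit, then the top payload bits.
    lead_prefix = (0xFF << (8 - length)) & 0xFF
    out = []
    for _ in range(length - 1):
        # Continuation bytes carry 6 payload bits each: 10xxxxxx.
        out.append(0x80 | (code_point & 0x3F))
        code_point >>= 6
    out.append(lead_prefix | code_point)
    return bytes(reversed(out))


if __name__ == "__main__":
    for cp in (0x10FFFF, 0x200000, 0x7FFFFFFF):
        print(hex(cp), utf8_length(cp), encode_ucs4(cp).hex())
```

Everything up to 0x10FFFF, the last valid Unicode character, fits in 4 bytes, which is Behdad's point. Only the UCS-4 values beyond 0x1FFFFF need the 5- and 6-byte forms.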
Scott
--
Have you ever, ever felt like this?
Had strange things happen? Are you going round the twist?