[fdo] UTF-16 support ?
Behdad Esfahbod
behdad at cs.toronto.edu
Tue Jun 15 10:47:08 PDT 2004
On Tue, 15 Jun 2004, Scott James Remnant wrote:
> On Tue, 2004-06-15 at 13:37 -0400, Behdad Esfahbod wrote:
>
> > On Tue, 15 Jun 2004, Scott James Remnant wrote:
> >
> > > UTF-8 can encode all of the UCS-4 code points, however there is
> > > significant overhead in the later planes turning a 4-byte UCS-4 sequence
> > > into 6 or 7 byte character sequences.
> >
> > This is not true. UTF-8 encodes all valid Unicode characters in
> > at most 4 octets.
> >
> No, it is true, but what you said is true also.
>
> "all of the UCS-4 code points" is a greater set than "all valid Unicode
> characters".
Sure, we are both right. By "this is not true" I meant that the claim of
"significant overhead" for UTF-8 is not true, for Unicode usage of
course :-).
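A quick way to check the 4-octet bound is to ask a standard UTF-8 encoder how many octets it produces at each boundary of the valid Unicode range (U+0000..U+10FFFF). The sketch below uses Python's built-in codec. The helper name `utf8_length` is only illustrative. Surrogate code points (U+D800..U+DFFF) are skipped because they are not valid characters and the codec rejects them.

```python
def utf8_length(cp: int) -> int:
    """Number of octets Python's UTF-8 codec produces for code point cp."""
    return len(chr(cp).encode("utf-8"))

# Boundaries where the UTF-8 sequence length changes, up to U+10FFFF.
for cp in [0x7F, 0x80, 0x7FF, 0x800, 0xFFFF, 0x10000, 0x10FFFF]:
    print(f"U+{cp:04X}: {utf8_length(cp)} octets")

# Longest encoding over every valid character (surrogates excluded).
longest = max(utf8_length(cp) for cp in range(0x110000)
              if not 0xD800 <= cp <= 0xDFFF)
print("longest:", longest)
```

Python's `chr` refuses anything above 0x10FFFF, so this only covers the Unicode range, which is exactly the range the 4-octet claim is about.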
behdad
> Stolen from utf-8(7):
>
> 0x00010000 - 0x001FFFFF:
> 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
>
> 0x00200000 - 0x03FFFFFF:
> 111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
>
> 0x04000000 - 0x7FFFFFFF:
> 1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
>
> Scott
>
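The table above describes the original UTF-8 scheme for 31-bit UCS-4 values (as in RFC 2279), where sequences can grow to 5 or 6 octets. Modern UTF-8 (RFC 3629) stops at U+10FFFF and so never needs more than 4. The following is a minimal sketch of that extended scheme, written only to show where the longer forms come from. `encode_ucs4_utf8` is a hypothetical helper, not part of any library.

```python
def encode_ucs4_utf8(cp: int) -> bytes:
    """Encode a 31-bit UCS-4 value with the original (RFC 2279-style) UTF-8
    scheme, which allows up to 6 octets. Hypothetical helper for illustration."""
    if not 0 <= cp <= 0x7FFFFFFF:
        raise ValueError("UCS-4 values are limited to 31 bits")
    if cp < 0x80:
        return bytes([cp])  # 0xxxxxxx
    # (exclusive upper bound, sequence length, lead-octet marker bits)
    for limit, n, marker in [(0x800, 2, 0xC0), (0x10000, 3, 0xE0),
                             (0x200000, 4, 0xF0), (0x4000000, 5, 0xF8),
                             (0x80000000, 6, 0xFC)]:
        if cp < limit:
            break
    out = []
    # Continuation octets (10xxxxxx), least significant 6 bits first.
    for _ in range(n - 1):
        out.append(0x80 | (cp & 0x3F))
        cp >>= 6
    # Whatever high bits remain go into the lead octet after its marker.
    out.append(marker | cp)
    return bytes(reversed(out))

for cp in [0x10FFFF, 0x1FFFFF, 0x200000, 0x3FFFFFF, 0x7FFFFFFF]:
    print(f"0x{cp:08X}: {len(encode_ucs4_utf8(cp))} octets")
```

For the last valid Unicode code point, U+10FFFF, this yields the 4 octets F4 8F BF BF, the same as a modern encoder; the 5- and 6-octet forms only appear for values above U+1FFFFF, which Unicode does not assign.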
--behdad
behdad.org