[fdo] UTF-16 support ?

Behdad Esfahbod behdad at cs.toronto.edu
Tue Jun 15 10:47:08 PDT 2004


On Tue, 15 Jun 2004, Scott James Remnant wrote:

> On Tue, 2004-06-15 at 13:37 -0400, Behdad Esfahbod wrote:
>
> > On Tue, 15 Jun 2004, Scott James Remnant wrote:
> >
> > > UTF-8 can encode all of the UCS-4 code points; however, there is
> > > significant overhead in the later planes, turning a 4-byte UCS-4 sequence
> > > into 5- or 6-byte character sequences.
> >
> > This is not true.  UTF-8 encodes all valid Unicode characters in
> > at most 4 octets.
> >
> No, it is true, but what you said is true also.
>
> "all of the UCS-4 code points" is a greater set than "all valid Unicode
> characters".


Sure, we are both right.  By "This is not true" I meant that the
"significant overhead" of UTF-8 does not arise for Unicode usage, of
course :-).

behdad


> Stolen from utf-8(7):
>
>        0x00010000 - 0x001FFFFF:
>            11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
>
>        0x00200000 - 0x03FFFFFF:
>            111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
>
>        0x04000000 - 0x7FFFFFFF:
>            1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
>
> Scott
>
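
To make the two claims concrete, here is a rough sketch (not taken from any
library) that maps a code point to its sequence length under the original
31-bit scheme quoted from utf-8(7). The function name utf8_length and the
constant MAX_UNICODE are made up for illustration, and the 1-, 2- and 3-octet
ranges are filled in from the same scheme since the quote only shows the last
three rows.

```python
def utf8_length(code_point: int) -> int:
    """Octets needed for a code point under the original 31-bit UTF-8 scheme."""
    if code_point < 0 or code_point > 0x7FFFFFFF:
        raise ValueError("outside the 31-bit UCS-4 range")
    # (upper bound of range, sequence length); the last three rows match
    # the quoted utf-8(7) table, the first three are assumed from the
    # same scheme.
    limits = [
        (0x7F, 1),
        (0x7FF, 2),
        (0xFFFF, 3),
        (0x1FFFFF, 4),
        (0x3FFFFFF, 5),
        (0x7FFFFFFF, 6),
    ]
    for upper, length in limits:
        if code_point <= upper:
            return length


MAX_UNICODE = 0x10FFFF  # highest valid Unicode code point

print(utf8_length(MAX_UNICODE))   # 4
print(utf8_length(0x7FFFFFFF))    # 6
```

So anything that is a valid Unicode character stays within 4 octets, and the
5- and 6-octet forms only show up for UCS-4 values above 0x1FFFFF.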

--behdad
  behdad.org


