Unicode validation range
Thiago Macieira
thiago at kde.org
Fri Feb 19 14:38:54 PST 2010
On Friday 19 February 2010, at 22:36:52, Colin Walters wrote:
> On Fri, Feb 19, 2010 at 6:39 PM, Thiago Macieira <thiago at kde.org> wrote:
> > On Friday 19 February 2010, at 18:35:14, Colin Walters wrote:
> >> On Sat, Feb 6, 2010 at 5:28 PM, Thiago Macieira <thiago at kde.org> wrote:
> >> > I'm trying to understand why we reject the FDD0-FDEF range.
> >>
> >> I can't find offhand much information about this range - it looks like
> >> it's in Arabic Presentation Forms-A block?
> >
> > This range is a "non-character" range. Unicode reserves those 32
> > codepoints, plus the last two codepoints in each plane (U+nFFFE and
> > U+nFFFF), as non-characters. It means they will never be assigned to
> > anything.
>
> Thanks, I've added this to a patch for the code below which merges in
> the fix from glib.
>
> > I'm willing to agree that QString conversion to/from UTF-8 should have
> > caught those (it already blocks UTF-16 surrogate codepoints when encoded
> > in UTF-8, as well as U+FFFE and U+FFFF).
>
> Ok, can we add this as an action item?
>
> > However, one can also argue that those two processes communicating over
> > D-Bus constitute "one application" and should be allowed to use those
> > codepoints.
>
> Well, yes, except that it's easily possible for others to be watching
> even method calls (think dbus-monitor or more complex tools). Part of
> the value of DBus is that it performs validation of the communication,
> so it's robust, which is particularly crucial for the system bus. And
> it is for the session bus too, in the presence of SELinux etc.
>
> People sometimes have wanted special features when talking just
> "inside" an app or service, which I understand. Often if you want
> this an easy solution is to simply step outside the DBus type system
> entirely; if it's convenient for you to send say a JSON blob, you can
> do so.
>
> Now...I think fundamentally what we need to preserve here is that
> major consumers of DBus agree on the validation. It's not immediately
> obvious to me that g_utf8_validate and _dbus_string_validate_utf8
> agree for instance. However the last change to g_utf8_validate looks
> like it was in 2004
> (https://bugzilla.gnome.org/show_bug.cgi?id=159131) which would seem
> to make it more likely they're in sync; the last change looks like it
> was unrolling some cases.
>
> So with the attached patch and the change to QString, would that
> close this issue for now?
Agreed. I will fix QString too.
Question to other bindings: how does your language handle the non-character
Unicode codepoints?
You may want to verify that your UTF-8 encoding routines do not produce the
codes for those non-characters.
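
A minimal sketch of such a check, in Python, for anyone who wants to test
their binding. The helper names is_noncharacter and is_strict_utf8 are made
up for illustration and are not taken from any D-Bus binding, glib or Qt. It
assumes the policy discussed in this thread: reject U+FDD0..U+FDEF and the
last two codepoints of every plane, on top of ordinary UTF-8 well-formedness.

```python
def is_noncharacter(cp: int) -> bool:
    """Return True if cp is one of the 66 Unicode non-characters."""
    # The 32 contiguous non-characters U+FDD0..U+FDEF.
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # The last two codepoints of each of the 17 planes: U+nFFFE and U+nFFFF.
    return 0 <= cp <= 0x10FFFF and (cp & 0xFFFE) == 0xFFFE


def is_strict_utf8(data: bytes) -> bool:
    """Hypothetical validator: well-formed UTF-8 with no non-characters."""
    try:
        # Python's strict decoder already rejects overlong forms,
        # encoded UTF-16 surrogates and values above U+10FFFF.
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return not any(is_noncharacter(ord(ch)) for ch in text)


if __name__ == "__main__":
    # Python's own encoder happily produces the bytes for a non-character,
    # so the extra check is what catches it.
    print(is_strict_utf8("\ufdd0".encode("utf-8")))
    print(is_strict_utf8("ordinary text".encode("utf-8")))
```

Whether a sender should refuse to emit these codepoints or only a receiver
should refuse to accept them is a separate decision; the sketch only shows
the range test itself.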
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Senior Product Manager - Nokia, Qt Development Frameworks
PGP/GPG: 0x6EF45358; fingerprint:
E067 918B B660 DBD1 105C 966C 33F5 F005 6EF4 5358