Unicode validation range

Colin Walters walters at verbum.org
Fri Feb 19 13:36:52 PST 2010


On Fri, Feb 19, 2010 at 6:39 PM, Thiago Macieira <thiago at kde.org> wrote:
> On Friday 19 February 2010, at 18:35:14, Colin Walters wrote:
>> On Sat, Feb 6, 2010 at 5:28 PM, Thiago Macieira <thiago at kde.org> wrote:
>> > I'm trying to understand why we reject the FDD0-FDEF range.
>>
>> I can't find offhand much information about this range - it looks like
>> it's in Arabic Presentation Forms-A block?
>
> This range is a "non-character" range. Unicode reserves those 32 codepoints,
> plus the last two codepoints in each plane, as non-characters. It means they
> will never be assigned to anything.

Thanks, I've added this to a patch for the code below which merges in
the fix from glib.
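
For reference, the rule described above can be sketched roughly as
follows.  This is only an illustration and not the code from the
attached patch; the function name is_interchange_valid is made up for
this example.  It rejects out-of-range values, UTF-16 surrogates, the
32 codepoints FDD0-FDEF, and the last two codepoints of every plane.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper for illustration; not the code in the patch. */
static bool
is_interchange_valid (uint32_t c)
{
  if (c >= 0x110000)
    return false;               /* beyond the last Unicode plane */
  if ((c & 0xFFFFF800) == 0xD800)
    return false;               /* UTF-16 surrogates D800-DFFF */
  if (c >= 0xFDD0 && c <= 0xFDEF)
    return false;               /* the 32 reserved non-characters */
  if ((c & 0xFFFE) == 0xFFFE)
    return false;               /* xxFFFE and xxFFFF in every plane */
  return true;
}
```

The neighbours FDCF and FDF0 are ordinary codepoints and pass this
check; only the 32 values in between are refused.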

> I'm willing to agree that QString conversion to/from UTF-8 should have caught
> those (it already blocks UTF-16 surrogate codepoints when encoded in UTF-8, as
> well as U+FFFE and U+FFFF).

Ok, can we add this as an action item?

> However, one can also argue that those two processes communicating over D-Bus
> constitute "one application" and should be allowed to use those codepoints.

Well, yes, except that it's easily possible for others to be watching
even method calls (think dbus-monitor or more complex tools).  Part of
the value of DBus is that it performs validation of the communication,
so it's robust, which is particularly crucial for the system bus.  And
it is for the session bus too in the presence of SELinux etc.

People sometimes have wanted special features when talking just
"inside" an app or service, which I understand.  Often if you want
this an easy solution is to simply step outside the DBus type system
entirely; if it's convenient for you to send say a JSON blob, you can
do so.

Now...I think fundamentally what we need to preserve here is that
major consumers of DBus agree on the validation.  It's not immediately
obvious to me that g_utf8_validate and _dbus_string_validate_utf8
agree for instance.  However the last change to g_utf8_validate looks
like it was in 2004
(https://bugzilla.gnome.org/show_bug.cgi?id=159131) which would seem
to make it more likely they're in sync; the last change looks like it
was unrolling some cases.
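
One rough way to check whether two validators agree is to run both
over every codepoint and count the differences.  The sketch below does
that for two stand-in predicates, lenient_valid and strict_valid, which
are hypothetical and written only for this example; they are not
g_utf8_validate or _dbus_string_validate_utf8, which work on byte
strings rather than on single codepoints.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

typedef bool (*codepoint_pred) (uint32_t);

/* Stand-in: accepts anything in range that is not a surrogate. */
static bool
lenient_valid (uint32_t c)
{
  return c < 0x110000 && (c & 0xFFFFF800) != 0xD800;
}

/* Stand-in: additionally rejects the Unicode non-characters. */
static bool
strict_valid (uint32_t c)
{
  return lenient_valid (c)
         && !(c >= 0xFDD0 && c <= 0xFDEF)
         && (c & 0xFFFE) != 0xFFFE;
}

/* Count codepoints on which a and b disagree; if first is non-NULL
 * and there is a disagreement, store the lowest such codepoint. */
static uint32_t
count_disagreements (codepoint_pred a, codepoint_pred b, uint32_t *first)
{
  uint32_t n = 0;

  for (uint32_t c = 0; c <= 0x110000; c++)
    {
      if (a (c) != b (c))
        {
          if (n == 0 && first != NULL)
            *first = c;
          n++;
        }
    }
  return n;
}
```

With these two stand-ins the predicates differ on exactly the
non-characters, starting at FDD0.  A comparison of the real functions
would have to encode each codepoint as UTF-8 first and should also
cover malformed byte sequences.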

So with the attached patch and the change to QString, would that
close this issue for now?
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-dbus-string-Sync-up-UNICODE_VALID-with-glib-add-docu.patch
Type: text/x-patch
Size: 1733 bytes
Desc: not available
Url : http://lists.freedesktop.org/archives/dbus/attachments/20100219/a773f0b0/attachment.bin 

