Unicode validation range

Thiago Macieira thiago at kde.org
Fri Feb 19 14:38:54 PST 2010


On Friday 19 February 2010, at 22.36.52, Colin Walters wrote:
> On Fri, Feb 19, 2010 at 6:39 PM, Thiago Macieira <thiago at kde.org> wrote:
> > On Friday 19 February 2010, at 18.35.14, Colin Walters wrote:
> >> On Sat, Feb 6, 2010 at 5:28 PM, Thiago Macieira <thiago at kde.org> wrote:
> >> > I'm trying to understand why we reject the FDD0-FDEF range.
> >> 
> >> I can't find offhand much information about this range - it looks like
> >> it's in Arabic Presentation Forms-A block?
> > 
> > This range is a "non-character" range. Unicode reserves those 32
> > codepoints, plus the last two codepoints in each plane, as
> > non-characters. It means they will never be assigned to anything.
> 
> Thanks, I've added this to a patch for the code below which merges in
> the fix from glib.
> 
> > I'm willing to agree that QString conversion to/from UTF-8 should have
> > caught those (it already blocks UTF-16 surrogate codepoints when encoded
> > in UTF-8, as well as U+FFFE and U+FFFF).
> 
> Ok, can we add this as an action item?
> 
> > However, one can also argue that those two processes communicating over
> > D-Bus constitute "one application" and should be allowed to use those
> > codepoints.
> 
> Well, yes, except that it's easily possible for others to be watching
> even method calls (think dbus-monitor or more complex tools).  Part of
> the value of DBus is that it performs validation of the communication,
> so it's robust, which is particularly crucial for the system bus, and
> for the session bus too in the presence of SELinux etc.
> 
> People sometimes have wanted special features when talking just
> "inside" an app or service, which I understand.  Often if you want
> this an easy solution is to simply step outside the DBus type system
> entirely; if it's convenient for you to send say a JSON blob, you can
> do so.
> 
> Now...I think fundamentally what we need to preserve here is that
> major consumers of DBus agree on the validation.  It's not immediately
> obvious to me that g_utf8_validate and _dbus_string_validate_utf8
> agree for instance.  However the last change to g_utf8_validate looks
> like it was in 2004
> (https://bugzilla.gnome.org/show_bug.cgi?id=159131) which would seem
> to make it more likely they're in sync; the last change looks like it
> was unrolling some cases.
> 
> So with the attached patch and the change to QString, would that
> close this issue for now?

Agreed. I will fix QString too.

Question to other bindings: how does your language handle the non-character 
Unicode codepoints?

You may want to verify that your UTF-8 encoding routines do not produce the 
codes for those non-characters.
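
As a rough illustration of such a check, here is a minimal sketch in Python.
The names is_noncharacter and first_noncharacter are made up for this example
and are not part of any D-Bus binding. It assumes the set described above:
U+FDD0..U+FDEF plus the last two codepoints of each of the 17 planes.

```python
def is_noncharacter(cp: int) -> bool:
    """Return True if cp is one of the 66 Unicode non-characters."""
    # The contiguous block of 32 non-characters: U+FDD0..U+FDEF.
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # The last two codepoints of every plane: U+nFFFE and U+nFFFF.
    return 0 <= cp <= 0x10FFFF and (cp & 0xFFFE) == 0xFFFE


def first_noncharacter(text: str) -> int:
    """Return the index of the first non-character in text, or -1 if none."""
    for i, ch in enumerate(text):
        if is_noncharacter(ord(ch)):
            return i
    return -1


# Python's own UTF-8 codec rejects lone surrogates but encodes
# non-characters without complaint, so a binding that wants to match the
# stricter validation has to check before sending.
payload = "ab\ufdd0"
encoded = payload.encode("utf-8")       # succeeds
position = first_noncharacter(payload)  # index of U+FDD0
```

A binding could run a check like this before marshalling a string, so the
error shows up at the sender instead of as a validation failure on the bus.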

-- 
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
  Senior Product Manager - Nokia, Qt Development Frameworks
      PGP/GPG: 0x6EF45358; fingerprint:
      E067 918B B660 DBD1 105C  966C 33F5 F005 6EF4 5358