Unicode validation range

Fri Feb 19 09:35:14 PST 2010

On Sat, Feb 6, 2010 at 5:28 PM, Thiago Macieira <thiago at kde.org> wrote:

> I'm trying to understand why we reject the FDD0-FDEF range.

I can't find much information about this range offhand. It looks like
it's in the Arabic Presentation Forms-A block?

> This has been
> causing problems in some applications leading to even remote-crashable (app
> receives UTF-8 string from network, app sends such string via D-Bus, D-Bus
> disconnects unexpectedly, crash).

Can you give a little more background on the concrete case?  Is this
a case where e.g. Qt is more liberal about what it accepts into UTF-16
than dbus is?  What's the application in question?

> I'm proposing we either:
>
> 1) remove the unnecessary checks and allow those characters in
>
> or
> 2) update the list, to include FFFE, 1FFFE, 1FFFF, 2FFFE, 2FFFF, etc.
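
For reference, the list in option 2 would presumably grow to the full set
of 66 code points that Unicode designates as noncharacters: U+FDD0..U+FDEF
plus the last two code points of each of the 17 planes. A minimal sketch
in C, assuming a hypothetical helper name (this is not existing libdbus
code):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not libdbus API: returns true if cp is one of
 * the 66 code points that Unicode designates as noncharacters. */
static bool
is_unicode_noncharacter (uint32_t cp)
{
  /* Contiguous range U+FDD0..U+FDEF (32 code points). */
  if (cp >= 0xFDD0 && cp <= 0xFDEF)
    return true;

  /* Last two code points of each plane: U+nFFFE and U+nFFFF for
   * n = 0x0..0x10 (34 code points). */
  return cp <= 0x10FFFF && (cp & 0xFFFE) == 0xFFFE;
}
```

Masking with 0xFFFE keeps the test to one comparison per plane instead of
enumerating FFFE, FFFF, 1FFFE, 1FFFF and so on by hand.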

Another major change we could make here is to have libdbus (or
possibly just the bus) synthesize an error on "invalid" UTF-8 rather
than disconnecting.  This would be a nontrivial change, but one that I
think developers at least would like.
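
To illustrate the "error rather than disconnect" idea, here is a rough
sketch in C of a validator that reports where a string first goes wrong,
so that a caller could build an error reply from it. The type and
function names are hypothetical and none of this is libdbus API. It
rejects only malformed sequences, overlong forms, surrogates and values
above U+10FFFF; whether noncharacters should also be rejected is the open
question in this thread, so that check is deliberately left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type; nothing here is libdbus API. */
typedef struct
{
  bool   ok;      /* true if the whole buffer validated */
  size_t offset;  /* byte offset of the first bad sequence when !ok */
} utf8_check_result;

static utf8_check_result
check_utf8 (const unsigned char *s, size_t len)
{
  size_t i = 0;

  while (i < len)
    {
      unsigned char b = s[i];
      uint32_t cp, min;
      size_t n, k;

      /* Work out the sequence length and the smallest code point that
       * legitimately needs that many bytes. */
      if (b < 0x80)
        { cp = b; n = 1; min = 0; }
      else if ((b & 0xE0) == 0xC0)
        { cp = b & 0x1F; n = 2; min = 0x80; }
      else if ((b & 0xF0) == 0xE0)
        { cp = b & 0x0F; n = 3; min = 0x800; }
      else if ((b & 0xF8) == 0xF0)
        { cp = b & 0x07; n = 4; min = 0x10000; }
      else
        return (utf8_check_result) { false, i };

      /* Truncated sequence at the end of the buffer. */
      if (n > len - i)
        return (utf8_check_result) { false, i };

      for (k = 1; k < n; k++)
        {
          if ((s[i + k] & 0xC0) != 0x80)
            return (utf8_check_result) { false, i };
          cp = (cp << 6) | (s[i + k] & 0x3F);
        }

      /* Reject overlong forms, surrogates and out-of-range values. */
      if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return (utf8_check_result) { false, i };

      i += n;
    }

  return (utf8_check_result) { true, len };
}
```

A sender-side wrapper could call something like this before marshalling
and turn a failed check into an error for the caller, instead of letting
the bus drop the connection.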