DBus API problems & UTF-8

Mon Jun 12 05:47:24 PDT 2006

On Mon, Jun 12, 2006 at 02:36:45PM +0300, Kimmo Hämäläinen wrote:
> On Mon, 2006-06-12 at 13:09, ext Ross Burton wrote:
> > On Mon, 2006-06-12 at 12:40 +0300, Kimmo Hämäläinen wrote:
> > > I mean DBUS_TYPE_STRING. I think the specification should just say that
> > > it's a NUL-terminated sequence of bytes. That way we don't have to care
> > > about encodings, and we don't have to verify UTF-8 in the server (there
> > > is already enough unnecessary O(n) stuff happening in the code...).
> > > 
> > > My point is that the DBus specification does not seem to have any reason
> > > for specifying UTF-8 as the encoding -- NUL-terminated byte (save zero
> > > byte) array would allow for more efficient communication when some other
> > > encoding is used between applications, and the validity check for the
> > > string data would be left entirely to the applications (where it belongs
> > > -- DBus is just a message bus, it should not inspect the content).
> > 
> > So if my GTK+ application wants to communicate with a Qt application, I
> > need to know that the Qt applications are talking in UTF-16 and convert
> > my strings?
> > 
> > If a Qt application wants to send a message to the message bus itself it
> > has to change the strings to UTF-8 (assuming the bus itself remains
> > speaking UTF-8).
> 
> You can still have contract between the applications to use UTF-8
> encoding in that case. I'm not sure if DBus should police that. If we
> want policing, we could have a new type DBUS_TYPE_STRING_UTF8.

DBus has never aimed to be the absolute fastest IPC protocol / implementation.
There have been conscious tradeoffs between performance and security, reliability
& interoperability. For some applications, requiring UTF-8 may be a little bit
slower than plain C strings, but the benefits of interoperability this brings
to applications & their developers outweigh any performance hit by an order of
magnitude.

If one were to allow arbitrary byte-strings, you'd just be pushing the problem
of character encoding out of the library & into *every* single application using
DBus. While this may give a tiny performance benefit in some cases, the burden
it imposes on application developers makes it completely non-viable as a solution.
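
To make the interoperability problem concrete, here is a rough sketch in
Python (not DBus code; the helper name is_valid_utf8 is made up for
illustration). It shows the same name encoded two ways, the single O(n)
validity check a bus can run, and what a receiver gets if it guesses the
wrong encoding for an untyped byte string:

```python
# Hypothetical illustration only; none of this is part of DBus or its bindings.

def is_valid_utf8(data: bytes) -> bool:
    """Return True if data is well-formed UTF-8 (one linear pass over the bytes)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

name = "H\u00e4m\u00e4l\u00e4inen"

# The same string as two different senders might put it on the wire.
utf8_bytes = name.encode("utf-8")
latin1_bytes = name.encode("latin-1")

# With a mandated encoding, the bus can reject the Latin-1 payload up front.
utf8_ok = is_valid_utf8(utf8_bytes)
latin1_ok = is_valid_utf8(latin1_bytes)

# Without one, a receiver that guesses "latin-1" for UTF-8 bytes gets no
# error at all, just silently wrong text.
misread = utf8_bytes.decode("latin-1")
```

The last line is the case that worries me: nothing fails, the receiver
just ends up with mojibake, and every application would need its own way
of finding out what the other side meant.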

> > Making the validation optional would speed performance, but I don't
> > fancy having to introspect every client my application talks to just to
> > find out what encoding they are using, hoping I have access to a codec
> > for it, and then transforming all of my strings.
> 
> Yes, I'm sure Gtk application developers like it, but sometimes UTF-8
> does not make much sense. Sometimes they just want to pass C strings
> around conveniently.

If you're not already using UTF-8 in your C program, then it's at most one
single method call to convert. Unless your strings are absolutely enormous,
the performance hit of the conversion shouldn't show up as a large hotspot.
In any case this is a small price to pay for cross-application interoperability.
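
As a minimal sketch of how small that conversion step is, assuming the
sender knows its own local encoding (the function name to_utf8 is
hypothetical, and this is Python rather than C for brevity):

```python
# Hypothetical helper for illustration; not part of any DBus binding.

def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Re-encode raw bytes from source_encoding into UTF-8.

    Raises UnicodeDecodeError if raw is not valid in source_encoding.
    """
    return raw.decode(source_encoding).encode("utf-8")

# A Latin-1 encoded C-style string, converted once before it is sent.
latin1_payload = b"H\xe4m\xe4l\xe4inen"
utf8_payload = to_utf8(latin1_payload, "latin-1")
```

In a C program the equivalent would typically be a single call to
something like iconv(3) or GLib's g_convert(), made once at the point
where the string is handed to the bus.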

Regards,
Dan.
-- 
|=-            GPG key: http://www.berrange.com/~dan/gpgkey.txt       -=|
|=-       Perl modules: http://search.cpan.org/~danberr/              -=|
|=-           Projects: http://freshmeat.net/~danielpb/               -=|
|=-   berrange at redhat.com  -  Daniel Berrange  -  dan at berrange.com    -=|