DBus API problems & UTF-8

Mon Jun 12 06:15:34 PDT 2006

Kimmo Hämäläinen wrote:
>> The solution would probably be to have a DBusError that is attached to
>> that message and is set by any failing functions. This way, there
>> would be no API changes, but it would allow you to obtain the error
>> condition.
>
>You mean that the caller could check if the message is valid (no errors
>happened) before sending it?

No.

I mean that the current functions keep returning FALSE, -1, or whatever
they return today when an error occurs. If you need to know why a call
failed, you inspect this member of the message.
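
To make the idea concrete, here is a minimal sketch in plain C. Every
name in it (SketchMessage, SketchError, the sketch_* functions, the
error name string) is hypothetical and invented for illustration; none
of this is existing libdbus API. It only shows the shape: the return
convention stays as it is, and the reason for a failure is stored on
the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical types: these names do not exist in libdbus. */
typedef struct {
    const char *name;     /* NULL while no error has been recorded */
    const char *message;
} SketchError;

typedef struct {
    char        body[64]; /* tiny fixed buffer, enough for the sketch */
    size_t      used;
    SketchError error;    /* set by any failing function on this message */
} SketchMessage;

void sketch_message_init(SketchMessage *msg)
{
    assert(msg != NULL);
    memset(msg, 0, sizeof *msg);
}

/* Same return convention as before: true on success, false on failure. */
bool sketch_message_append_string(SketchMessage *msg, const char *s)
{
    assert(msg != NULL && s != NULL);
    size_t len = strlen(s) + 1;              /* include the NUL */
    if (len > sizeof msg->body - msg->used) {
        msg->error.name = "sketch.Error.NoMemory";
        msg->error.message = "not enough room in message body";
        return false;
    }
    memcpy(msg->body + msg->used, s, len);
    msg->used += len;
    return true;
}

/* The caller asks for the reason only when it cares about it. */
const SketchError *sketch_message_get_error(const SketchMessage *msg)
{
    assert(msg != NULL);
    return msg->error.name ? &msg->error : NULL;
}
```

Existing callers that only test the return value keep working unchanged;
only callers that want the reason call sketch_message_get_error().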

Object types that aren't connected to a message or a DBusConnection
(etc.) should return the reason for failure themselves. I'm thinking of
DBusString or similar simple types.

>> Because we don't want to hear about encodings. If you do that, then
>> client and server have to negotiate an encoding before they can start
>> receiving strings from one another. There's also the potential that
>> one of the two doesn't have the necessary codec installed.
>
>I mean DBUS_TYPE_STRING. I think the specification should just say that
>it's a NUL-terminated sequence of bytes. 

No, the specification should say that it's a NUL-terminated sequence of 
bytes, carrying a Unicode string encoded in UTF-8.

The specification is correct.

>That way we don't have to care 
>about encodings, and we don't have to verify UTF-8 in the server (there
>is already enough unnecessary O(n) stuff happening in the code...).

Right, we don't have to verify, as long as all applications comply.
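
For reference, the O(n) check being discussed is a single pass over the
bytes. The following is a sketch of such a check, not the validator the
bus actually uses; the function name is made up. It rejects stray
continuation bytes, truncated sequences, overlong forms, surrogates and
code points above U+10FFFF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true if s[0..len) is well-formed UTF-8. */
bool sketch_utf8_is_valid(const unsigned char *s, size_t len)
{
    assert(s != NULL || len == 0);
    size_t i = 0;
    while (i < len) {
        unsigned char c = s[i];
        size_t need;            /* continuation bytes expected */
        unsigned long cp, min;  /* code point, smallest legal value */

        if (c < 0x80) { i++; continue; }          /* ASCII */
        else if ((c & 0xE0) == 0xC0) { need = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { need = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { need = 3; cp = c & 0x07; min = 0x10000; }
        else return false;                        /* stray or illegal lead byte */

        if (need > len - i - 1) return false;     /* truncated sequence */
        for (size_t k = 1; k <= need; k++) {
            if ((s[i + k] & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if (cp < min) return false;                       /* overlong form */
        if (cp > 0x10FFFF) return false;                  /* beyond Unicode */
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;   /* surrogate */
        i += need + 1;
    }
    return true;
}
```

An application that hands Latin-1 bytes to the bus fails this check as
soon as a non-ASCII character appears: 0xE4 ("ä" in Latin-1) looks like
the start of a three-byte sequence, and the bytes after it are not
continuation bytes.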

>My point is that the DBus specification does not seem to have any reason
>for specifying UTF-8 as the encoding

Yes it does: interoperability. Applications may be running in different 
locale environments, which means that their concept of "local encoding" 
may vary. Therefore, we take the superset of all encodings (Unicode) and 
say that all strings passed on the bus will be encoded in that.
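
As an illustration of what that means for an application whose local
encoding is not UTF-8, here is a sketch of the conversion for the
simplest case, ISO-8859-1. The function name is made up; a real
application would normally use its toolkit's or iconv's conversion
instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: converts ISO-8859-1 bytes to UTF-8.
 * `out` must have room for at least 2 * len bytes.
 * Returns the number of bytes written (no NUL is appended). */
size_t sketch_latin1_to_utf8(const unsigned char *in, size_t len,
                             unsigned char *out)
{
    assert((in != NULL && out != NULL) || len == 0);
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (in[i] < 0x80) {
            out[n++] = in[i];                               /* ASCII: unchanged */
        } else {
            /* U+0080..U+00FF always becomes a two-byte sequence */
            out[n++] = (unsigned char)(0xC0 | (in[i] >> 6));
            out[n++] = (unsigned char)(0x80 | (in[i] & 0x3F));
        }
    }
    return n;
}
```

The sender converts once before putting the string on the bus, and the
receiver converts to whatever its own locale needs. Neither side has to
know the other's local encoding.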

>-- NUL-terminated byte (save zero 
>byte) array would allow for more efficient communication when some other
>encoding is used between applications, and the validity check for the
>string data would be left entirely to the applications (where it belongs
>-- DBus is just a message bus, it should not inspect the content).

You're describing ARRAY of BYTE.

>Java and Qt are different, because they need to process the string data.
>DBus is a message bus and should not care about the actual content as
>long as the message format is correct.

Huh? What do other bindings do with the string data? Discard it?

No, all bindings process it somehow, into some kind of string object. All 
languages have it, even C in its purest form (const char *).

>Yes, a byte array is an alternative, but isn't it just a workaround for
>bad design? Applications would benefit from a generic string type,
>because otherwise they need code for distinguishing byte array (binary
>data) from a string (NUL-terminated, of some encoding). DBus is
>supposedly meant to serve applications, not the other way around.

And what's the difference between a sequence of bytes that carries a
string and a sequence of bytes that carries binary data?

>Dropping UTF-8 would probably be simple -- just removing the validation
>from the server (that seems to be the only place interested in the
>content) and updating the specification. However, it would still be a
>dangerous change at this point because some applications could be
>counting on it. (I'm just bringing this up for some future version of
>the specification.)

I'm completely against that.
-- 
Thiago José Macieira - thiago.macieira AT trolltech.com
Trolltech ASA - Sandakerveien 116, NO-0402 Oslo, Norway