[patch] NUL in strings (was: Marshalling bytewise data with Glib bindings)
thiago at kde.org
Mon May 21 04:11:56 PDT 2007
Simon McVittie said:
> On Mon, 21 May 2007 at 07:01:18 +0200, Thiago Macieira wrote:
>> STRING should be used only for properly-encoded UTF-8 data. NULs are
>> allowed, but arbitrary binary data isn't.
> My understanding is that NULs are not allowed in STRING (and that it's
> terminated with a NUL, but that's not semantically part of the data). IMO
> the specification could be interpreted either way, but libdbus doesn't
> appear to support NUL-safe strings:
> * dbus_message_iter_append_basic() stops appending at NUL
> * dbus_message_iter_get_basic() doesn't indicate the length
Hmm... given the API, I'd tend to agree. But the API and the protocol
itself are two different things.
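
To illustrate that distinction, here is a minimal sketch in C. It assumes
the wire form of a STRING is a UINT32 byte length, then the data, then a
terminating NUL (my reading of the marshalling section of the spec), and it
ignores alignment and always writes little-endian. The sketch_* helpers are
hypothetical and are not libdbus functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of libdbus: writes a little-endian UINT32
 * byte length, then the payload, then a terminating NUL. Alignment padding
 * is ignored. Returns the bytes written, or 0 if the buffer is too small. */
static size_t sketch_marshal_string(uint8_t *out, size_t cap,
                                    const char *data, uint32_t len)
{
    size_t need = 4 + (size_t)len + 1;

    assert(out != NULL && data != NULL);
    if (cap < need)
        return 0;
    out[0] = (uint8_t)(len & 0xff);
    out[1] = (uint8_t)((len >> 8) & 0xff);
    out[2] = (uint8_t)((len >> 16) & 0xff);
    out[3] = (uint8_t)((len >> 24) & 0xff);
    memcpy(out + 4, data, len);
    out[4 + len] = 0;
    return need;
}

/* All that an API taking only a `const char *` can do: measure the data
 * with strlen(), which stops at the first NUL. */
static size_t sketch_marshal_cstring(uint8_t *out, size_t cap, const char *s)
{
    assert(s != NULL);
    return sketch_marshal_string(out, cap, s, (uint32_t)strlen(s));
}
```

Given the three bytes 'a', NUL, 'b', the explicit-length helper writes a
length of 3 and keeps all three bytes, while the strlen()-based one writes a
length of 1 and drops everything after the 'a'. The length prefix could
carry the NUL; the C calling convention is what loses it.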
The U+0000 codepoint is a valid entry in a Unicode string, but it serves
no semantic purpose other than as a general-purpose separator in
machine-readable contexts (since Unicode assigns it no specific meaning,
unlike the other Unicode separators).
If used as a separator, I'd argue that the data should be transmitted in
the form of an ARRAY of STRING rather than NUL-separated STRING. It would
be a "do it right / portably" vs "do it efficiently" discussion.
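
As a rough sketch of what the NUL-separated form costs the receiver, the
hypothetical helper below (plain C, no libdbus calls) turns such a buffer
back into separate strings. It assumes the total length arrives out of band
and that the buffer ends with a NUL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: records the start of each NUL-terminated field in
 * buf[0..len). The buffer must end with a NUL so that every field is a
 * valid C string. Stores at most max pointers and returns the number of
 * fields found, which may be larger than max. */
static size_t sketch_split_nul(const char *buf, size_t len,
                               const char **fields, size_t max)
{
    size_t count = 0, start = 0, i;

    assert(buf != NULL && fields != NULL);
    assert(len == 0 || buf[len - 1] == '\0');
    for (i = 0; i < len; i++) {
        if (buf[i] == '\0') {
            if (count < max)
                fields[count] = buf + start;
            count++;
            start = i + 1;
        }
    }
    return count;
}
```

The result is an array of strings anyway, which is what an ARRAY of STRING
would have handed over directly, without both ends needing a private
convention for the total length.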
The purpose I see for allowing NUL in strings is to support transmission
of Unicode data in contexts where U+0000 is allowed. I can't think of any
such case offhand, but I wouldn't rule out that some exist. It's also
possible, though, to use an ARRAY of BYTE (for UTF-8) or an ARRAY of
UINT16 (for UTF-16) to transmit the same data.
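
A sender that wants to choose between the two could check for an embedded
NUL first. This is only a sketch: the helper name is hypothetical, and it
does not check that the data is valid UTF-8, which STRING also requires.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if data[0..len) contains no NUL byte and
 * would therefore survive a NUL-terminated string API unchanged, or 0 if
 * it would be truncated and is better sent as an ARRAY of BYTE. */
static int sketch_safe_as_string(const char *data, size_t len)
{
    assert(data != NULL || len == 0);
    if (len == 0)
        return 1;
    return memchr(data, '\0', len) == NULL;
}
```

With this, "hello" (5 bytes) is safe to send as a STRING, while the three
bytes 'a', NUL, 'b' are not.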
> Havoc/other designers, did you intend the STRING type to be NUL-safe? If
> so, libdbus needs extra API and the spec needs clarifying to allow NULs;
> if not (as I suspect), the spec needs clarifying to forbid NULs.
> A proposed patch for the latter case follows.
Anyway, with interoperability in mind, I'd say NUL shouldn't be allowed.
High-level bindings with dedicated string objects may support it[*], but
I'd say low-level programs such as those using libdbus-1 directly may be
hard pressed to deal with those strings.
[*] DBusString does, but it's also more like a general-purpose byte array
than a string object. Besides, it's internal.
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
PGP/GPG: 0x6EF45358; fingerprint:
E067 918B B660 DBD1 105C 966C 33F5 F005 6EF4 5358