Object path restrictions

Simon McVittie simon.mcvittie at collabora.co.uk
Fri Jun 17 08:34:52 PDT 2011


On Fri, 17 Jun 2011 at 18:33:58 +0400, Pavel Strashkin wrote:
> The D-Bus specification says the following about object paths: "Each
> element must only contain the ASCII characters "[A-Z][a-z][0-9]_"". Is
> there any real reason why you don't want to see ".", "-", " ", ";" and
> other characters (except "/") in an element?

Because the specification says so, and it's about 5 years too late for
non-interoperable changes. I have no idea what the original reasoning was,
but at this point it doesn't really matter; we're not going to break all
implementations and bindings.

One argument in favour of a restricted character set is that you
can use object paths as filesystem paths on any filesystem, including
those used by Windows, assuming you replace "/" with the OS's path delimiter.
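
As a rough illustration of that mapping, here is a minimal Python sketch. The
function name object_path_to_relative() and its "flavour" parameter are made
up for this example and are not part of any D-Bus binding; the regular
expression encodes the object path rule quoted above, plus the rule that
elements must not be empty.

```python
import re
from pathlib import PurePosixPath, PureWindowsPath

# "/" alone is the root path; otherwise one or more non-empty elements,
# each made of [A-Za-z0-9_], each preceded by "/", with no trailing "/".
_OBJECT_PATH = re.compile(r"/|(?:/[A-Za-z0-9_]+)+")


def object_path_to_relative(path, flavour=PurePosixPath):
    """Turn a D-Bus object path into a relative filesystem path.

    Hypothetical helper: 'flavour' selects the path class, and with it
    the OS path delimiter that replaces "/".
    """
    if not _OBJECT_PATH.fullmatch(path):
        raise ValueError("not a valid D-Bus object path: %r" % (path,))
    # Drop the empty string before the leading "/" and rejoin the
    # elements using the delimiter of the chosen flavour.
    return flavour(*path.split("/")[1:])


print(object_path_to_relative("/org/freedesktop/DBus"))
print(object_path_to_relative("/org/freedesktop/DBus", PureWindowsPath))
```

The restricted character set is what makes this a plain delimiter swap with
no escaping step. It does not cover everything, though: on a case-insensitive
filesystem, two object paths that differ only in case would still end up at
the same file.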

Restricting to an ASCII subset also means you avoid having to consider what
the encoding of non-ASCII characters would be. (UTF-8 is used for D-Bus
strings, but there's no guarantee that, for instance, Unix filenames,
environment variables, etc. are UTF-8: a single Unix machine can have a
mixture of UTF-8, Latin-1 and whatever else. See the GLib documentation for
extensive discussion of why you can't assume a consistent filesystem
character set on Unix.)
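
To make the encoding problem concrete, here is a small Python sketch. The
name "café" and the helper decodes_as_utf8() are just examples chosen for
illustration. The same text produces different bytes in Latin-1 and UTF-8,
and the Latin-1 bytes are not valid UTF-8 at all, so they could not be put
into a D-Bus string unchanged.

```python
name = "caf\u00e9"  # "café"

latin1_bytes = name.encode("latin-1")  # 4 bytes, the last one is 0xe9
utf8_bytes = name.encode("utf-8")      # 5 bytes, ending in 0xc3 0xa9


def decodes_as_utf8(raw):
    """Return True if 'raw' is a valid UTF-8 byte sequence."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(latin1_bytes, decodes_as_utf8(latin1_bytes))
print(utf8_bytes, decodes_as_utf8(utf8_bytes))
```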

> I think D-Bus should go the same way as URLs: keep "/" as the element
> delimiter, allow other characters within and use %XX for quoting.

telepathy-glib has tp_escape_as_identifier() which uses _XX for URL-style
quoting, producing strings which are valid in all D-Bus contexts (bus name
component, object path component, interface component, member) and also in
C/C++ identifiers. I suggest using a similar algorithm to encode arbitrary
strings into object paths; if a component gets too long you might have to
resort to hashing the initial string and writing it in hex, though.
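
Here is a minimal Python sketch of that kind of encoding. It is written in
the spirit of tp_escape_as_identifier() but is not telepathy-glib's code. The
function name escape_component(), the max_len parameter and its default of
100 are assumptions made for this example, as are the choice of SHA-256 and
the decision to escape a leading digit (so the result is also usable where a
name must not start with a digit).

```python
import hashlib
import string

_ALNUM = frozenset((string.ascii_letters + string.digits).encode("ascii"))
_DIGITS = frozenset(string.digits.encode("ascii"))


def escape_component(text, max_len=100):
    """Encode an arbitrary string as a single [A-Za-z0-9_]+ component.

    Hypothetical helper; max_len is an arbitrary limit for this sketch,
    not something taken from the D-Bus specification.
    """
    data = text.encode("utf-8")
    if not data:
        return "_"  # a component must not be empty
    parts = []
    for i, byte in enumerate(data):
        if byte in _ALNUM and not (i == 0 and byte in _DIGITS):
            parts.append(chr(byte))
        else:
            # "_" plays the role that "%" plays in URLs, so a literal
            # "_" has to be escaped as well (it becomes "_5f").
            parts.append("_%02x" % byte)
    escaped = "".join(parts)
    if len(escaped) > max_len:
        # Fall back to a hex digest of the original bytes. The leading
        # "_" keeps the result from starting with a digit. Unlike the
        # escaping above, this step is not reversible.
        return "_" + hashlib.sha256(data).hexdigest()
    return escaped


print("/org/example/Accounts/" + escape_component("user@example.com"))
```

The escaped form can be decoded again by reversing the _XX substitution, so
distinct inputs stay distinct; the hashed form gives that up in exchange for
a bounded length.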

    S

