[Bug 85307] escape_as_identifier issues

bugzilla-daemon at freedesktop.org
Wed Oct 22 05:17:13 PDT 2014


https://bugs.freedesktop.org/show_bug.cgi?id=85307

--- Comment #1 from Simon McVittie <simon.mcvittie at collabora.co.uk> ---
(In reply to Andy Grover from comment #0)
> 1) Docstring should clarify that this is for "object path components" and
> "service name components". It could be read as "object paths" (i.e. an
> entire path) and "service name components".

Something that is a valid service name component cannot possibly be a valid
object path, because object paths must start with "/", and service name
components must not contain "/".
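
To illustrate the point, here is a minimal sketch of the two syntax checks,
written from the rules in the D-Bus specification as I understand them. The
function names are hypothetical (they are not telepathy-glib or libdbus API),
and the bus name check assumes a well-known name, whose components must not
start with a digit.

```python
import re

# Hypothetical helpers for illustration only; not part of any library.
# Object path element: one or more of [A-Za-z0-9_].
_PATH_ELEMENT = re.compile(r"[A-Za-z0-9_]+")
# Well-known bus name component: [A-Za-z0-9_-], not starting with a digit.
_BUS_NAME_COMPONENT = re.compile(r"[A-Za-z_-][A-Za-z0-9_-]*")


def is_valid_object_path(path: str) -> bool:
    """Check a whole object path such as "/org/freedesktop/Telepathy"."""
    if path == "/":
        return True  # the root path is the only one allowed to end in "/"
    if not path.startswith("/"):
        return False
    # Every element between slashes must be non-empty and well-formed.
    return all(_PATH_ELEMENT.fullmatch(e) for e in path[1:].split("/"))


def is_valid_bus_name_component(component: str) -> bool:
    """Check one dot-separated component of a well-known bus name."""
    return _BUS_NAME_COMPONENT.fullmatch(component) is not None
```

No string passes both checks: the first requires a leading "/", and the second
rejects any "/". The checks also show the differences raised in point 3: "-"
is accepted only by the bus name check, and a leading digit only by the object
path check.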

> 3) Recommend specific functions escaping the two different things. The two
> string types have different allowable characters: bus name additionally
> allows '-', and object path component can start with [0-9], whereas bus name
> cannot. This could lead to characters being escaped when they needn't be.

A few places in Telepathy need the same string to appear in both an object
path and a bus name, so that one corresponds 1:1 to the other (e.g. Connection
and Client both do this). Escaping the two with different algorithms would
break that correspondence.

I don't think it's worth introducing additional functions, and potentially
confusing API users into using the wrong one, for a minor gain in the number of
characters that can remain unescaped.

tp_escape_as_identifier() does what its name says: it outputs a valid C
identifier, which corresponds to various other languages' idea of what an
identifier is, and is also a strict subset of what is allowed in D-Bus object
path components, bus name components, interface name components, member
(signal/method) names and so on. We occasionally use it for mechanical
generation of parts of C and Python function names, too. It is potentially "too
escaped" (i.e. non-optimal for certain situations), but it is never "not
escaped enough".
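
For readers who have not looked at it, the escaping can be sketched roughly as
follows. This is a Python approximation based on my reading of the documented
behaviour (ASCII letters and digits are kept, except a leading digit; every
other byte becomes "_" plus two lower-case hex digits; the empty string becomes
"_"). It is not the telepathy-glib source, and the function name here is only
for illustration.

```python
def escape_as_identifier(name: str) -> str:
    """Approximate sketch of the escaping; not the real C implementation."""
    data = name.encode("utf-8")  # assumption: escaping works byte by byte
    if not data:
        return "_"  # special case: the empty string still needs an identifier

    out = []
    for i, byte in enumerate(data):
        ch = chr(byte)
        is_alpha = ("a" <= ch <= "z") or ("A" <= ch <= "Z")
        is_digit = "0" <= ch <= "9"
        # A digit is only safe after the first position, because identifiers
        # and bus name components must not start with one.
        if is_alpha or (is_digit and i > 0):
            out.append(ch)
        else:
            out.append("_%02x" % byte)
    return "".join(out)
```

Under these assumptions "local-xmpp" becomes "local_2dxmpp" and a literal "_"
becomes "_5f", which is why the output can be longer than strictly necessary
but always stays within [A-Za-z0-9_].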

Given any constrained input with particular characteristics / character
frequency / whatever, and any reasonable set of output restrictions (e.g.
object path component), I expect it to be possible to construct an algorithm
better than tp_escape_as_identifier(). For instance, we "escape" Telepathy 0.x
protocol names, which look like "local-xmpp", by replacing "-" with "_"[1] and
knowing that protocol names are sufficiently constrained that that's reversible
and the output will be a valid identifier.
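
A sketch of that special-purpose escaping, to show why it is only safe for
constrained input. The function names are hypothetical, and the check encodes
an assumption: that Telepathy 0.x protocol names never contain "_", which is
what makes the substitution reversible.

```python
def protocol_to_identifier(protocol: str) -> str:
    """Turn a 0.x-style protocol name like "local-xmpp" into an identifier."""
    if "_" in protocol:
        # With a "_" already present, "-" and "_" would collide in the output
        # and the mapping would no longer be reversible.
        raise ValueError("protocol names are assumed not to contain '_'")
    return protocol.replace("-", "_")


def identifier_to_protocol(identifier: str) -> str:
    """Inverse mapping, valid only under the same assumption."""
    return identifier.replace("_", "-")
```

For example, protocol_to_identifier("local-xmpp") gives "local_xmpp", and
identifier_to_protocol() maps it back. The same trick applied to arbitrary
input such as usernames would map "a-b" and "a_b" to the same output.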

Being optimal for one particular kind of input is not
tp_escape_as_identifier()'s purpose; its purpose is to be "efficient enough",
particularly for the common case where the input is mostly alphanumeric, while
also being fully general so it can't break (except by excessive length).

Telepathy deliberately doesn't currently have an inverse of
tp_escape_as_identifier(), because we use it in places where uniqueness and
debuggability are the only desired characteristics: a human reader reading logs
and knowing how tp_escape_as_identifier() works can decode the username from a
Connection's object path, but applications are never meant to do so.

systemd does not have the same philosophy for its analogous functionality,
which *is* reversible (and puts fewer constraints on the output - it's only
intended for object paths and filenames). That's fine, it's their function, not
ours, and they can document it however they want.

[1] Telepathy 1.0 changes the definition of protocol names so they look like
"local_xmpp", so that we can use them as-is
