[PATCH] No longer need to escape '+' in a D-Bus address

Michael Witten mfwitten at MIT.EDU
Tue Sep 9 09:25:11 PDT 2008


Hello again. I'm back to keep this thread alive.

On 29 Aug 2008, at 8:17 AM, Havoc Pennington wrote:

> Hi,
>
> On Fri, Aug 29, 2008 at 3:14 AM, Michael Witten <mfwitten at mit.edu> wrote:
>> The alternative is to loosen the specification (for at least file
>> hierarchy "addresses"), which seems acceptable to me.
>
> If you loosened it to never require escaping, or proved that "+" was
> the only possible escape-requiring char in this context, then it would
> be slightly easier to fix this OS X bug. But every time someone has a
> filename with a char that requires escaping, loosening the list of
> escape chars is not going to be the fix every time.

One Peter O'Gorman on the darwin-userlevel()lists.apple.com mailing list
pointed me to:

	http://launchd.macosforge.org/trac/browser/trunk/launchd/src/launchd_core_logic.c

which shows where $TMPDIR is getting set:

	char tmpdirpath[PATH_MAX];
	
	...
	
	r = confstr(_CS_DARWIN_USER_TEMP_DIR, tmpdirpath, sizeof(tmpdirpath));
	
	if (likely(r > 0 && r < sizeof(tmpdirpath))) {
		setenv("TMPDIR", tmpdirpath, 0);
	}

The real guts of the code are in the confstr implementation in Mac OS X's
libc (and the related helpers). I found this by downloading:

	http://www.opensource.apple.com/darwinsource/tarballs/apsl/Libc-498.1.1.tar.gz

Inside gen/confstr.c, we have:

	docopy:
		if (len != 0 && buf != NULL)
			strlcpy(buf, p, len);
		return (strlen(p) + 1);
		
	...
	
	case _CS_DARWIN_USER_DIR:
		if ((p = alloca(PATH_MAX)) == NULL) {
			errno = ENOMEM;
			return (CONFSTR_ERR_RET);
		}
		if (_dirhelper(DIRHELPER_USER_LOCAL, p, PATH_MAX) == NULL)
			return (CONFSTR_ERR_RET);
		goto docopy;

The function _dirhelper() is defined in darwin/_dirhelper.c; it creates
the necessary directory if it doesn't already exist and then yields the
respective path.

In particular, _dirhelper() calls __user_local_dirname() (in _dirhelper.c),
which produces a uuid based on the user's uid and then encodes this into a
path-worthy string using encode_uuid_uid() (in _dirhelper.c), which performs
a possibly-non-standard encoding algorithm that maps to the following set of
characters:

	"+-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

Hence! '+' is the only escape-requiring character in this context.
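
The claim can be checked mechanically. Below is a minimal sketch, assuming
the specification's set of optionally-escaped bytes is [0-9A-Za-z_-/.\]
(that is my reading of the spec; verify it against your version). The
function names are hypothetical and are not part of libdbus.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if `c` may appear unescaped in a D-Bus address value.
 * ASSUMPTION: the optionally-escaped set is [0-9A-Za-z_-/.\]. */
static int dbus_optionally_escaped(unsigned char c)
{
	if ((c >= '0' && c <= '9') ||
	    (c >= 'A' && c <= 'Z') ||
	    (c >= 'a' && c <= 'z'))
		return 1;
	return c != '\0' && strchr("_-/.\\", c) != NULL;
}

/* Copies into `out` every byte of `alphabet` that would need escaping
 * and returns how many there are. `out` must have room for
 * strlen(alphabet) + 1 bytes. */
static size_t must_escape(const char *alphabet, char *out)
{
	size_t n = 0;

	for (; *alphabet != '\0'; alphabet++)
		if (!dbus_optionally_escaped((unsigned char)*alphabet))
			out[n++] = *alphabet;
	out[n] = '\0';
	return n;
}
```

Feeding the 64-character alphabet above to must_escape() should leave only
"+" in the output buffer.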

Clearly, this could change at any time, but I doubt anyone would introduce
paths that could make shells unhappy (as per the discussion later).

> It doesn't follow then, from my perspective, that whenever someone has
> an escape-requiring char we stop requiring its escaping. Changing the
> escaped-char list needs a little more justification.

I agree. However, perhaps file-hierarchy 'addresses' could follow a
more forgiving specification.

>> It's really just a matter of deciding who has to do the grunt work---
>
> The grunt work in this case, assuming "+" is the only escape-requiring
> char, is to add 1 line "sed -e" that replaces "+" with "%2B", so I
> vote for the OS X port maintainer.

It doesn't really matter in the end, except that now there are all of
these pesky port eccentricities.
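
For what it's worth, the port-side fix need not be tied to "+" alone. Here
is a rough sketch of a general escaper in C, again assuming the
optionally-escaped set is [0-9A-Za-z_-/.\]; dbus_escape_path() is a
hypothetical name, not an existing libdbus function, and the path in the
note below is made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Percent-escapes `path` into `out`, which has capacity `cap` bytes.
 * Every byte outside the assumed optionally-escaped set
 * [0-9A-Za-z_-/.\] becomes %XX (uppercase hex).
 * Returns 0 on success, -1 if `out` is too small. */
static int dbus_escape_path(const char *path, char *out, size_t cap)
{
	static const char hex[] = "0123456789ABCDEF";
	size_t n = 0;

	for (; *path != '\0'; path++) {
		unsigned char c = (unsigned char)*path;
		int plain = (c >= '0' && c <= '9') ||
			    (c >= 'A' && c <= 'Z') ||
			    (c >= 'a' && c <= 'z') ||
			    strchr("_-/.\\", c) != NULL;
		size_t need = plain ? 1 : 3;

		/* Always keep one byte free for the terminating NUL. */
		if (n + need >= cap)
			return -1;

		if (plain) {
			out[n++] = (char)c;
		} else {
			out[n++] = '%';
			out[n++] = hex[c >> 4];
			out[n++] = hex[c & 0x0F];
		}
	}
	out[n] = '\0';
	return 0;
}
```

With this, a path such as /var/folders/a+b/-Tmp- would come out as
/var/folders/a%2Bb/-Tmp-, which matches what the sed one-liner would do for
"+" while also covering any other character that turns up later.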

>>       Frankly, enforcing (arbitrarily) strict input is never a good idea,
>>       and I suggest making dbus more forgiving on a wide range of characters
>>       (for instance, all printable characters).
>
> Then people would have to do grunt work in quite a few cases to
> *escape* rather than unescape dbus addresses, to prevent them from
> causing a problem in shell command lines.
>
> The point of requiring escaping is to allow dbus addresses to be used
> without shell-escaping them in contexts such as setting
> DBUS_SESSION_BUS_ADDRESS on the command line.

It seems like this is mainly a problem for tools that produce shell
code on the fly (such as dbus-launch --sh-syntax); wouldn't it be
more useful to have dbus-launch do the necessary escaping?
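
To illustrate what I mean (this is only a sketch, not what dbus-launch
currently does, and sh_single_quote() is a hypothetical helper): a tool
that emits shell code could wrap the address in single quotes, rewriting
any embedded single quote as '\'', so that the address itself would not
need to be shell-safe.

```c
#include <stddef.h>
#include <string.h>

/* Writes `s` to `out` (capacity `cap`) wrapped in single quotes for a
 * POSIX shell. Each embedded ' is emitted as '\'' (close the quote,
 * add an escaped quote, reopen the quote).
 * Returns 0 on success, -1 if `out` is too small. */
static int sh_single_quote(const char *s, char *out, size_t cap)
{
	size_t n = 0;

	if (cap < 3)	/* need at least the two quotes and a NUL */
		return -1;
	out[n++] = '\'';

	for (; *s != '\0'; s++) {
		size_t need = (*s == '\'') ? 4 : 1;

		/* Keep room for the closing quote and the NUL. */
		if (n + need + 2 > cap)
			return -1;

		if (*s == '\'') {
			memcpy(out + n, "'\\''", 4);
			n += 4;
		} else {
			out[n++] = *s;
		}
	}
	out[n++] = '\'';
	out[n] = '\0';
	return 0;
}
```

The output could then be placed after DBUS_SESSION_BUS_ADDRESS= in the
generated script regardless of which characters the path contains.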

In fact, having the special case of the shell influence the spec
so intimately seems even grosser than now making allowances for at
least file-hierarchy paths.

	If the specification has been written so as to make shell
	scripting easy, then why not write the specification so
	as to make file-hierarchy paths easy?

	I suppose I am proposing to *extend* the specification,
	in the sense that it would still be backward-compatible.

> The point of not requiring *all* chars to be escaped is that it makes
> the address mostly human-readable and much shorter.


Currently though, perfectly good file-hierarchy paths are unnecessarily
morphed into annoyances that entail discussing design decisions,
sifting through abstruse code, and introducing special port-specific
code.

Sincerely,
Michael Witten

