python dbus: dealing with strings
simon.mcvittie at collabora.co.uk
Wed Sep 7 02:41:07 PDT 2011
On Wed, 07 Sep 2011 at 10:04:05 +0200, Neal H. Walfield wrote:
> I've found that dbus.UTF8String sometimes fails if its argument is a
> unicode string that contains non-ascii characters (also: dbus.String
> fails if its argument is a normal string that contains non-ascii characters).
dbus.UTF8String is a subtype of str; dbus.String is a subtype of unicode.
That should hopefully tell you all you need to know about the behaviour
of their constructors, and the misfeatures they inherit from those Python 2
types.
You might be misinterpreting them as distinct D-Bus types: they're not,
they're distinct Python representations of one D-Bus type ("s"). You can
use whichever you prefer and you'll get a D-Bus string sent to dbus-daemon -
D-Bus strings are always UTF-8, but dbus-python knows how to turn a Python
unicode object (or equivalently, a dbus.String) into UTF-8 for sending.
> >>> dbus.UTF8String(u'ä')
This is equivalent to dbus.UTF8String(str("ä")) which uses the global default
codec; the "default default" is ASCII, so it fails. dbus.String(u"ä") would
work.
> >>> dbus.UTF8String('ä')
U+00E4 LATIN SMALL LETTER A WITH DIAERESIS is 0xC3 0xA4 in UTF-8, so this is
working correctly (in this particular case that's because you're using
the interactive Python prompt in a UTF-8 locale - I'd guess you're on modern
Linux or Mac OS).
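
To make the byte-level picture concrete, here is a small sketch that uses
only the standard codecs and makes no dbus-python calls. The u"" and b""
literals are there so that it should run on both Python 2.6+ and
Python 3.3+; the variable names are mine, purely for illustration.

```python
a_umlaut = u"\u00e4"  # U+00E4 LATIN SMALL LETTER A WITH DIAERESIS

# Encoding to UTF-8 yields the two bytes 0xC3 0xA4.
utf8_bytes = a_umlaut.encode("utf-8")

# The ASCII codec cannot represent this character at all, which is the
# same failure an implicit unicode -> str conversion runs into on Python 2.
try:
    a_umlaut.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
```

After this runs, utf8_bytes holds the two UTF-8 bytes and ascii_ok is False.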
This is closer to what UTF8String is designed for - it's very convenient
when interfacing with libraries like Gtk, where string-typed properties etc.
are 'str' objects which are guaranteed to contain valid UTF-8.
The meaning of the UTF8String constructor is essentially "You see this 8-bit
string? I assert that it is valid UTF-8. Assume that it is." (I forget whether
dbus-python checks this assumption; it should.)
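
If you would rather not rely on that, a caller-side check is cheap. This is
only a sketch: is_valid_utf8 is a hypothetical helper name, not part of
dbus-python, and it says nothing about what dbus-python itself verifies.

```python
def is_valid_utf8(data):
    """Return True if the byte string `data` decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


utf8_ok = is_valid_utf8(b"\xc3\xa4")   # UTF-8 encoding of U+00E4
latin1_ok = is_valid_utf8(b"\xe4")     # Latin-1 encoding of U+00E4
```

The first call returns True and the second False, so only the first byte
string would be safe to assert as UTF-8.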
> Further, the other dbus conversion functions
> appear to do their best to convert the argument to the right type.
> >>> dbus.Int32("3")
This is inherited from the 'int' type in the same way (Int32 is a subtype of
int).
> In my particular case, I want to avoid burdening the client with
> having to do casts itself by, say, only accepting UTF8 strings: the
> data is coming from the web via feedparser.
If the client gives you unicode objects, you can pass them straight through and
dbus-python will do the right thing (i.e. convert from whatever Python's
Unicode type is on your platform - typically UCS-4 on Linux, UTF-16 on
Windows - to UTF-8).
If the client gives you str objects, there is no right thing for dbus-python
to do - you're giving it a blob of bytes whose encoding is unknown,
dbus-python needs UTF-8, so it can never do better than a wild guess without
more information. The client has to tell you what encoding the data is in,
otherwise you (and dbus-python) have no way to know what the right thing to
do is.
To be honest, the easiest way is probably to require a unicode object,
and fail on non-unicode input, requiring the caller to decode bytes of
unspecified encoding in advance: explicit is better than implicit.
# given this input:
a1 = u"\u00e4" # U+00E4 LATIN SMALL LETTER A WITH DIAERESIS
a2 = "\xe4" # 0xE4, the Latin-1 encoding of U+00E4
a3 = "\xc3\xa4" # 0xC3 0xA4, the UTF-8 encoding of U+00E4
# these three are equivalent
foo(a1, b, c)
foo(a2.decode('latin-1'), b, c)
foo(a3.decode('utf-8'), b, c)
With hindsight, dbus-python should only have accepted unicode objects.
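
A wrapper that enforces the "unicode only" rule could look roughly like the
following. Treat it as a sketch under my own assumptions: require_text and
text_type are hypothetical names, and the type(u"") trick is just a way to
get `unicode` on Python 2 and `str` on Python 3 without importing anything.

```python
text_type = type(u"")  # `unicode` on Python 2, `str` on Python 3


def require_text(value, name="value"):
    """Return `value` unchanged if it is text; refuse byte strings."""
    if not isinstance(value, text_type):
        raise TypeError("%s must be a unicode string, got %s; decode it first"
                        % (name, type(value).__name__))
    return value


# Text passes straight through, and so do bytes once the caller decodes them.
accepted = require_text(b"\xe4".decode("latin-1"), "a")

# Undecoded bytes are rejected instead of being guessed at.
try:
    require_text(b"\xe4", "a")
    rejected = False
except TypeError:
    rejected = True
```

The point of the design is that the decision about the encoding stays with
the caller, who is the only one in a position to know it.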
> def foo(a, b, c):
> return iface.foo(a if isinstance(a, unicode) else dbus.UTF8String(a),
> dbus.Int32(b), dbus.Boolean(c))
This will fail if a is a str object containing non-UTF8 data (in practice,
this typically means a str object containing latin-1 or windows-1252 data,
but it could be something more exotic).
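
A short sketch of why guessing the encoding goes wrong in both directions,
again using only the standard codecs (the variable names are illustrative):

```python
utf8_data = b"\xc3\xa4"   # UTF-8 encoding of U+00E4
latin1_data = b"\xe4"     # Latin-1 encoding of U+00E4

# Guessing Latin-1 for UTF-8 data "succeeds", but produces two wrong
# characters (U+00C3 and U+00A4) instead of U+00E4.
wrong_guess = utf8_data.decode("latin-1")

# Guessing UTF-8 for Latin-1 data fails outright.
try:
    latin1_data.decode("utf-8")
    decoded = True
except UnicodeDecodeError:
    decoded = False
```

The first case is the nastier one, since nothing raises and the corrupted
text is passed along silently.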
> That's ugly.
Welcome to Python 2 unicode handling... (Python 3 fixes most of it.)