D-Bus optimizations
Havoc Pennington
hp at pobox.com
Thu Mar 8 09:27:06 PST 2012
Mikkel, this is all really well phrased, and I think it encapsulates the
same lesson that was learned with GNOME 1.x and CORBA (and that
thousands of others learned the hard way before that).
The whole switch to dbus was in part a rejection of the idea that IPC
should replace library APIs, or that an IPC system should be our "COM"
(module/component system).
(See for example http://dbus.freedesktop.org/doc/dbus-faq.html#components)
I think you're right that the docs probably don't hammer on this point
enough and don't go into enough detail on when to use a shared library
vs. IPC, and how dbus APIs should be designed for things to be sane.
In some places the docs are just screwy. For example, the spec has a
"low latency" bullet point, but the description under that bullet is
that dbus is supposed to make it easy to be _async_, which is not the
same thing as low latency at all. Avoiding round trips was a pretty
major obsession of mine when designing dbus. Low latency (wall-clock
time of a round trip) was not. Our experience with both the X protocol
and CORBA was that even a small number of round trips made things
sloooow, while even a large number of non-round-trips was pretty fast.
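To make that concrete, here's a rough sketch of the async, pipelined
pattern with libdbus -- the org.foo.* service, path, interface, and
method names are placeholders, and error handling is omitted:

    /* Sketch only: queue a method call without blocking on the reply,
     * so more calls can be pipelined before any reply comes back.
     * All the org.foo.* names here are placeholders. */
    #include <dbus/dbus.h>

    static void
    on_reply (DBusPendingCall *pending, void *user_data)
    {
      DBusMessage *reply = dbus_pending_call_steal_reply (pending);
      /* ... unpack the reply here ... */
      dbus_message_unref (reply);
      dbus_pending_call_unref (pending);
    }

    static void
    call_async (DBusConnection *conn)
    {
      DBusMessage *msg;
      DBusPendingCall *pending;

      msg = dbus_message_new_method_call ("org.foo.Service",
                                          "/org/foo/Object",
                                          "org.foo.Bar",
                                          "Frobate");
      /* Queues the call; does NOT wait for the round trip. */
      dbus_connection_send_with_reply (conn, msg, &pending, -1);
      dbus_pending_call_set_notify (pending, on_reply, NULL, NULL);
      dbus_message_unref (msg);
      /* More calls can be queued right here, before any reply
       * arrives; a main loop must be dispatching the connection
       * for on_reply to fire. */
    }

The round trip still takes however long it takes; the point is that
nothing sits idle waiting on it.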
A great diagnostic for dbus could be an env variable that, when set,
makes the library print to stderr every time someone blocks on the
dbus connection. It could keep track of nice human-readable
descriptions of recently-sent serials, in order to log them usefully:
* blocked on org.foo.Bar.frobate for 10ms
* blocked on com.bar.Baz.frombulate for 5ms
that kind of log could really show people the problem, if they have
this problem. You could imagine other logging too, like quantity of
data sent and how it was batched up:
* wrote 2 messages total of 314 bytes
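Something like this hypothetical wrapper is what I have in mind -- the
DBUS_DEBUG_BLOCKING variable name is made up and none of this exists
in libdbus today, it's just a sketch:

    /* Hypothetical: time a blocking call and log it to stderr when
     * the (made-up) DBUS_DEBUG_BLOCKING env variable is set. */
    #include <dbus/dbus.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    static DBusMessage *
    send_and_block_logged (DBusConnection *conn,
                           DBusMessage    *msg,
                           int             timeout_ms,
                           DBusError      *error)
    {
      struct timespec start, end;
      DBusMessage *reply;
      long elapsed_ms;

      clock_gettime (CLOCK_MONOTONIC, &start);
      reply = dbus_connection_send_with_reply_and_block (conn, msg,
                                                         timeout_ms,
                                                         error);
      clock_gettime (CLOCK_MONOTONIC, &end);

      elapsed_ms = (end.tv_sec - start.tv_sec) * 1000
                   + (end.tv_nsec - start.tv_nsec) / 1000000;

      if (getenv ("DBUS_DEBUG_BLOCKING") != NULL)
        fprintf (stderr, " * blocked on %s.%s for %ldms\n",
                 dbus_message_get_interface (msg),
                 dbus_message_get_member (msg),
                 elapsed_ms);

      return reply;
    }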
A lot of your other suggestions are great too. Assuming no round
trips, the pipeline can then be optimized to be sure stuff is batched
up so that reads, writes, and context switches are all minimized. I
do have a foggy memory that libdbus used to use a separate read() for
every message to avoid a memcpy, and that may well just be
wrongheaded, and easy to fix... just one speculative example. Also, I
think there's a separate writev() for each message, and you could
imagine writing multiple messages at once (though the code around
doing that is a little terrifying, iirc).
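Roughly what batched writes would look like -- this is a sketch of the
general idea, not the actual libdbus code path:

    /* Sketch: gather several already-serialized messages into a single
     * writev() so the kernel sees one syscall instead of one per
     * message. */
    #include <sys/types.h>
    #include <sys/uio.h>

    #define MAX_BATCH 16

    /* bufs[i]/lens[i] hold the serialized bytes of queued messages. */
    static ssize_t
    write_batched (int fd, const void **bufs, const size_t *lens,
                   int n_msgs)
    {
      struct iovec iov[MAX_BATCH];
      int i;

      if (n_msgs > MAX_BATCH)
        n_msgs = MAX_BATCH;

      for (i = 0; i < n_msgs; i++)
        {
          iov[i].iov_base = (void *) bufs[i];
          iov[i].iov_len = lens[i];
        }

      /* One syscall for the whole batch; the caller must handle a
       * short write by re-queuing the unwritten tail. */
      return writev (fd, iov, n_msgs);
    }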
It's really hard to solve a vague performance problem, though. If we
had a system image and a use scenario that everyone could play with or
reproduce, then we could look at it and say "OK, there are too many
round trips here, and here's a fix" or even "ugh, this is truly
unavoidably bottlenecked on dbus". As it is, we can speculate that
there's an API design problem or that the apps could be reshuffled
somehow, but it's hard to really prove or disprove that claim.
To be more constructive: here's an old post that people are very
welcome to edit heavily and convert into an FAQ or other documentation
on the dbus site:
http://blog.ometer.com/2007/05/17/best-d-bus-practices/
(or just link to it from the site if you want, that's a quick short-term hack)
That post could certainly be better-refactored so it more clearly
spells out really specific suggestions, like:
 * always get objects in groups
 * always get all properties at once (see the GetAll sketch below)
 * always use change notification so a client can cache the remote
   side and still track changes
 * open a sidechannel if you have megabytes of data to move
 * never use the blocking versions of the libdbus/libgbus/etc. APIs
 * whatever else people can think of
Basically a "D-Bus API Design Best Practices" document.
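For the "all properties at once" item, the standard
org.freedesktop.DBus.Properties interface already has a GetAll method,
so one round trip can replace N Get calls. A quick libdbus sketch,
with placeholder org.foo.* names and no error handling (and an async
pending call would be better still than this blocking version):

    /* Sketch: fetch every property of an interface in one round trip
     * via org.freedesktop.DBus.Properties.GetAll. The org.foo.* names
     * are placeholders. */
    #include <dbus/dbus.h>

    static DBusMessage *
    get_all_properties (DBusConnection *conn, DBusError *error)
    {
      const char *iface = "org.foo.Bar"; /* interface whose props we want */
      DBusMessage *call, *reply;

      call = dbus_message_new_method_call ("org.foo.Service",
                                           "/org/foo/Object",
                                           "org.freedesktop.DBus.Properties",
                                           "GetAll");
      dbus_message_append_args (call,
                                DBUS_TYPE_STRING, &iface,
                                DBUS_TYPE_INVALID);

      /* One round trip total, instead of one per property. */
      reply = dbus_connection_send_with_reply_and_block (conn, call, -1,
                                                         error);
      dbus_message_unref (call);
      return reply; /* reply body is a{sv}: all the properties */
    }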
Another useful discussion would be "should your module be a library or
a process?" - should you write a daemon or a shared lib or both?
Again, I have no real idea whether the actual system inspiring the
multicast work would be helped by this kind of stuff, but certainly
plenty of code in the wild has had big problems here (the old Gaim API
mentioned in my blog post was far worse than anything I would have
expected anyone to come up with...)
Havoc