DBus over the network; clustered machines acting as one; service discovery

Tue Apr 8 17:26:01 PDT 2008

I found an old thread along these lines:

http://osdir.com/ml/freedesktop.dbus/2005-01/msg00141.html

Has anybody been running dbus across the network yet, for example to
bind together a cluster of machines?  IMO that's probably the next
step beyond today's multi-core SMP machines.  I bet that on an
OpenMosix cluster, services could migrate between nodes, right?  But
OpenMosix isn't very popular for some reason, and it doesn't work with
heterogeneous processor architectures, either.

Security can be orthogonal (just use ssh or something similarly transparent).

Assuming the answer is yes (dbus works as well over TCP as over Unix
sockets), the next question is how to discover the bus.  This is
similar to my other question about discovering the session bus.
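
To make the TCP-versus-Unix-socket comparison concrete: a dbus server
address is a string of the form transport:key=value,key=value, so
whatever discovery mechanism is used only has to hand the client one
such string.  Here is a rough Python sketch that builds and parses
these strings.  The host name and port are made up for illustration,
and the sketch skips the escaping that the dbus specification requires
for unusual characters in values.

```python
# Sketch only: formats and parses D-Bus address strings.
# It does NOT open any connection and does NOT escape special characters.

def make_address(transport, **params):
    """Format an address such as 'tcp:host=node1.example,port=12345'."""
    pairs = ",".join(f"{key}={value}" for key, value in params.items())
    return f"{transport}:{pairs}"

def parse_address(address):
    """Split an address into (transport, {key: value}); values stay strings."""
    transport, _, rest = address.partition(":")
    params = dict(pair.split("=", 1) for pair in rest.split(",") if pair)
    return transport, params

# The usual local system bus socket path.
local_bus = make_address("unix", path="/var/run/dbus/system_bus_socket")

# Hypothetical remote bus; host and port are assumptions for illustration.
remote_bus = make_address("tcp", host="node1.example", port=12345)

print(local_bus)
print(remote_bus)
print(parse_address(remote_bus))
```

A client that has discovered the remote address could then be pointed
at it the same way it is pointed at a local bus, for example through
the DBUS_SESSION_BUS_ADDRESS environment variable.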

If a user's "session" spans a cluster of machines, then each machine
needs to find and connect to the one session bus which ties them all
together.  (It's a good question which machine ought to actually start
the bus, but it could be autostarted on demand, right?)  IMO this
could be done with Avahi.

Avahi has a dbus interface for discovering advertised network
services (another thing on my list of stuff to figure out), but can
you then use dbus to connect to those services as well?  For example,
the bus could be advertised with
<type>_dbus._ssh._tcp</type>
in /etc/avahi/services/something.service.
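
As a rough idea of what such a file might contain, here is a Python
sketch that generates the body of a static Avahi service definition.
The element names (service-group, name, service, type, port,
txt-record) are the ones Avahi's static service files use; the service
type "_dbus._tcp", the port and the TXT record are assumptions for
illustration, not registered or agreed-upon values, and any other type
string could be passed in the same way.  A real file would also start
with the XML declaration and the avahi-service.dtd DOCTYPE line, which
this sketch leaves out.

```python
import xml.etree.ElementTree as ET

def avahi_service_xml(service_type, port, txt_records=()):
    """Return the <service-group> XML for one advertised service."""
    group = ET.Element("service-group")
    # %h is expanded to the host name when replace-wildcards is "yes".
    name = ET.SubElement(group, "name", {"replace-wildcards": "yes"})
    name.text = "dbus session bus on %h"
    service = ET.SubElement(group, "service")
    ET.SubElement(service, "type").text = service_type
    ET.SubElement(service, "port").text = str(port)
    for record in txt_records:
        ET.SubElement(service, "txt-record").text = record
    return ET.tostring(group, encoding="unicode")

# Hypothetical values: "_dbus._tcp", port 12345 and "bus=session" are made up.
xml_text = avahi_service_xml("_dbus._tcp", 12345, ["bus=session"])
print(xml_text)
```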

Likewise the system bus could be thought of as spanning the machines.
On one hand, each machine may provide specific services (depending on
how the load is distributed, and which machines have the hardware for
certain services).  On the other hand, all of the machines in the
cluster need to be able to find and use each other's services, and the
apps mostly don't care - they just need the service to be provided
somewhere.  So maybe there should be a system bus for each machine
plus an aggregated system bus for the cluster.  The aggregator could
be the apparent system bus daemon, running on each machine, and it
uses service discovery to find the system bus daemons running on the
others and expose their services.  But then, you wouldn't be making a
dbus connection across the network; the app would connect via dbus to
that daemon which in turn connects via some service-specific protocol
across the network to the service on the other machine.  Or, the
aggregator could have its own bus, so the app has a choice whether to
find local-machine services only or find those plus all the cluster
services, depending on which bus it connects to.
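
The two-bus idea can be modelled in a few lines of Python.  This is
only a toy: the Bus and AggregatedBus classes and the org.example.*
service names are hypothetical, and nothing here talks to a real bus
daemon.  It just shows the lookup rule, where the per-machine bus sees
only local services and the aggregated bus sees local services first
and then those of the other machines.

```python
class Bus:
    """Toy stand-in for a bus: a name plus the services registered on it."""
    def __init__(self, name, services):
        self.name = name
        self.services = set(services)

class AggregatedBus:
    """Exposes the local bus plus every discovered remote bus."""
    def __init__(self, local, remotes):
        self.local = local
        self.remotes = list(remotes)

    def find(self, service):
        """Return the name of the bus providing `service`, local first."""
        for bus in [self.local, *self.remotes]:
            if service in bus.services:
                return bus.name
        return None

    def list_services(self):
        """All services visible through the aggregator, sorted by name."""
        names = set(self.local.services)
        for bus in self.remotes:
            names |= bus.services
        return sorted(names)

# Hypothetical machines and services.
tablet = Bus("tablet", ["org.example.Touch"])
wall = Bus("wall", ["org.example.Display", "org.example.Touch"])
cluster = AggregatedBus(local=tablet, remotes=[wall])

print(cluster.find("org.example.Touch"))    # provided locally
print(cluster.find("org.example.Display"))  # provided by another machine
print(cluster.list_services())
```

An app that only wants local services would query the tablet bus
directly; one that doesn't care where the service runs would query the
aggregated one.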

What I eventually want to build with Display Scheme is just such a
cluster.  For example there may be a high-resolution display server,
connected to a large monitor or projector; multiple small devices
(phone, tablet); multi-touch screens (the coffee table, or the desktop
itself, like in the old Starfire video); and other input devices could
be connected to different machines as well.  The keyboard might be
WiFi rather than PS/2 or USB, and might also be able to handle some
small processing tasks.  The whole room full of hardware acts like one
big computer, with some devices being better suited for running some
kinds of services.  I don't have to use DBus for all of the
communication; I'm just wondering if it could be a good fit after a
bit of stretching.  :-)  The first way that I'm trying is to simply
have one Scheme REPL talking to another across TCP or SSH.  But this
only works for software written in Scheme, so it's not a very
interoperable component architecture (although there could be Scheme
wrappers for services written in other languages).

Another step beyond that would be to work with non-IP networks too,
like Bluetooth for instance.  (It already has its own security, and
building a PAN with TCP/IP on top can work, but it's extra overhead.)

It seems there must be about 3 levels of discovery in general:
- discover the buses (local, remote, IP and non-IP)
- discover the services
- discover the methods that can be called and signals that can be sent
  and received

The namespace also includes paths and interfaces, so when finding
methods, either you have a couple more steps, or you want to find all
of the methods provided by a service and hope it's a small enough
list.  How much stuff is one service likely to do anyway?  It's
better to have more and smaller services rather than big monoliths
that do everything.
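
For the last level, dbus already has a mechanism: an object that
implements org.freedesktop.DBus.Introspectable returns an XML
description of its interfaces, methods and signals from its
Introspect method.  Here is a small Python sketch that lists the
members found in such a description.  The XML below is a hand-written
example with a hypothetical org.example.Display interface, not output
from a real service, and the sketch ignores nested child nodes (the
"couple more steps" needed to walk the object paths).

```python
import xml.etree.ElementTree as ET

# Hand-written example in the introspection format; names are hypothetical.
INTROSPECTION_XML = """
<node>
  <interface name="org.example.Display">
    <method name="Show">
      <arg name="text" type="s" direction="in"/>
    </method>
    <signal name="Resized">
      <arg name="width" type="u"/>
      <arg name="height" type="u"/>
    </signal>
  </interface>
</node>
"""

def list_members(xml_text):
    """Map each interface name to the method and signal names it declares."""
    root = ET.fromstring(xml_text)
    members = {}
    for iface in root.findall("interface"):
        members[iface.get("name")] = {
            "methods": [m.get("name") for m in iface.findall("method")],
            "signals": [s.get("name") for s in iface.findall("signal")],
        }
    return members

members = list_members(INTROSPECTION_XML)
print(members)
```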

Sorry for going all pie-in-the-sky all at once.  :-)  But really I
plan on spending some more years of spare time on this stuff (to the
extent that other people don't get there before I do, which is fine).