[systemd-devel] writing a systemd unit for nbd devices

Wouter Verhelst wouter at debian.org
Fri Mar 21 05:05:57 PDT 2014


Hi,

So, now that it looks like Debian is going to support systemd in the future,
I've been looking into writing systemd units for NBD, of which I
maintain the userspace both upstream and in Debian.

The user space consists of two bits that matter as far as bootup is
concerned: the client, and the server.

For the server side, that's just a standard daemon with no particularly
weird requirements. I haven't written the unit yet, but I don't expect
any problems there.

For the client, the situation is... slightly different, and I'd like
some advice on how best to move forward.

First, let me explain how NBD works.

The client side of the NBD protocol is implemented partially in
user space, and partially in kernel space. The user space part handles
connecting and the initial protocol negotiation; but once that has been
done, nbd-client calls the NBD_DO_IT ioctl() on an open /dev/nbdX file,
which hands the socket file descriptor to the kernel and which does not
return until the device is disconnected (with "nbd-client -d", or
because the link to the server died). As such, the nbd-client process
needs to continue running while the device is connected.

In addition, nbd-client needs to fork() and open() the /dev/nbdX device
to support partitioned NBD devices (due to a deadlock issue, that can't
be done from the initial NBD_DO_IT ioctl handling, so it is done in the
first open() instead).

For supporting root-on-NBD in conjunction with systemd, I've already
added a -systemd-mark option to nbd-client so it will make argv[0][0]
read as '@' (I think that method is slightly ugly, but that's a
discussion for another time). In Debian, I've already supported
root-on-NBD for quite a while with an initramfs script and some code in
the init script of nbd-client which adds the PID for the root NBD device
to the list of PIDs that shouldn't be killed; I understand that dracut
(and hence Fedora as well) has similar support (though I'm not sure how
well it all works).

For non-root NBD devices, however, the situation would be slightly more
complex. This is supported in Debian by the package's init script;
AFAIK, though, no other distribution has support for that in its init
scripts (or upstart/systemd configuration, yada yada).

Currently, in Debian, the situation is that there is a configuration
file, /etc/nbd-client, which is sourced in the init script, and which
contains bash arrays with configuration. The init script then loops over
those bash arrays and runs the appropriate nbd-client command to connect
the device. Any actual mounting (etc) of the device, then, is left to
other init scripts. The nbd-client init script expects that filesystems
on NBD devices have the "_netdev" option listed in their fstab entries,
so that they will be mounted by the "mountnfs" rather than the
"mountall" init script.

As can be expected, this took several iterations to get right in all
corner cases. It seems to be working fine now, however.

When converting this to systemd unit files, from skimming over the
documentation, I guess I'll need something along the lines of the
following:

- I will need to create dev-nbd@.device unit files. These unit files would
  connect the device when needed.
- It may be a good idea to move the configuration from a sourced shell
  script snippet to "something else". I do want to retain some backwards
  compatibility, but it's okay if that's just a program interpreting the
  shell script snippet and outputting something more modern.

I do foresee some problems, though, and I'd like to see if these are
indeed problems or whether I just need to read more documentation. I
haven't found an easy answer in the documentation that I've read, but
then maybe I haven't been looking very well.

NBD device nodes are a bit special in that, due to the way NBD devices
are connected, the device node must exist at all times, even before it
is connected; I suspect (though have not actually tried) that systemd
will only try to "start" a .device unit file if the device node itself
is not there yet. For NBD, a connected device cannot be told apart from
a not-connected one by the presence (or lack thereof) of the device node
itself. Instead, the difference shows in the apparent size of the block
device (the BLKGETSIZE64 ioctl will return 0 for a not-connected device)
and in the presence (or lack thereof) of a file /sys/block/nbdX/pid (if
it exists, it contains the PID of the nbd-client process handling the
connection; if it does not, the device is not connected).
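As a rough illustration of the sysfs check described above, a minimal
Python sketch might look like the following. The function name
nbd_client_pid and the sysfs_root parameter are hypothetical and exist
only for this example; the only behaviour taken from the description is
that /sys/block/nbdX/pid exists, and holds the nbd-client PID, exactly
when the device is connected.

```python
from pathlib import Path
from typing import Optional


def nbd_client_pid(device: str, sysfs_root: str = "/sys/block") -> Optional[int]:
    """Return the PID of the nbd-client serving `device`, or None.

    `device` is a kernel name such as "nbd0". `sysfs_root` is a
    hypothetical parameter so the check can be pointed at a test
    directory instead of the real sysfs.
    """
    pid_file = Path(sysfs_root) / device / "pid"
    try:
        # The file holds the PID of the process handling the connection.
        return int(pid_file.read_text().strip())
    except FileNotFoundError:
        # No pid file: the device is not connected.
        return None
```

A caller could treat a None result as "not connected" and any integer
as "connected, served by that process".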

This is not the case for partitions of NBD devices, however; these will
only show up after the first open(), as explained above. As such, I
might need two templates: one which connects the NBD device (for a
/dev/nbdX device), and one for the partition (/dev/nbdXpY) which simply
depends on the regular NBD device. However, if I understand correctly,
it would not seem to be possible to create an nbdX template that does
not also match nbdXpY.
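To make the naming distinction concrete, here is a small Python sketch
that separates whole-device names (nbdX) from partition names (nbdXpY).
It says nothing about what systemd templates can or cannot match; it
only shows the split that a helper program would need to make. The
function name parse_nbd_name is hypothetical.

```python
import re
from typing import Optional, Tuple

# nbdX for a whole device, nbdXpY for partition Y of device X.
_NBD_NAME = re.compile(r"nbd(\d+)(?:p(\d+))?")


def parse_nbd_name(name: str) -> Optional[Tuple[int, Optional[int]]]:
    """Split an NBD kernel name into (device index, partition index).

    The partition index is None for a whole device. Names that are not
    NBD names at all yield None.
    """
    match = _NBD_NAME.fullmatch(name)
    if match is None:
        return None
    device = int(match.group(1))
    partition = int(match.group(2)) if match.group(2) is not None else None
    return device, partition
```

With this, "nbd0" is recognised as a whole device, while "nbd0p1" is
recognised as partition 1 of device 0 and could be made to depend on
the unit for "nbd0".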

Any thoughts?

-- 
This end should point toward the ground if you want to go to space.

If it starts pointing toward space you are having a bad problem and you
will not go to space today.

  -- http://xkcd.com/1133/

