[systemd-devel] PrivateDevices with more than basic set of devices?

Mon Jan 26 15:46:27 PST 2015

On Mon, 26.01.15 17:25, Topi Miettinen (toiwoton at gmail.com) wrote:

> On 01/26/15 16:13, Lennart Poettering wrote:
> > On Sat, 24.01.15 10:09, Topi Miettinen (toiwoton at gmail.com) wrote:
> > 
> >> Hello,
> >>
> >> It would be useful to be able to use PrivateDevices with additional
> >> devices to the basic set (null, zero, urandom etc). For example, smartd
> >> only needs access to /dev/sd*. It would be a bit complex to do this
> >> without help of systemd, you would have to set up the private /dev
> >> filesystem by hand before starting the daemon.
> > 
> > First of all, smartd usually only accesses /dev/sd*, but it actually
> > has drivers that use quite a few other device nodes too (for example
> > to support that weird SCSI SMART stuff). Thus, limiting access to only
> > /dev/sd* might work in many specific cases, but certainly not in the
> > general case.
> > 
> > However, the bigger problem is that setting /dev up like that would
> > not be a one-time thing. As new devices appear or old devices
> > disappear the device nodes in the service-specific /dev would have to
> > be created/updated/removed. And that's substantial complexity.
> > 
> > PrivateDevices= is supposed to be a one-stop solution for services
> > that need zero device access, and I am not convinced we should
> > turn it into more than that. It's supposed to be simple, easy to
> > understand. Right now, I can tell people easily: "hey, if your service
> > never needs access to physical devices, turn on PrivateDevices=!", and
> > it is really that easy, there's nothing more to know. However, if we
> > turn this into something more than that, then the whole thing becomes
> > a ton more complex, not only in code, but also in explaining it to
> > people.
> 
> I think the code would not get much more complex. The device list would be
> gathered and passed to mount_dev() in namespace.c. And documentation can
> be easily expanded.

No, it's more difficult than that: you also need to collect the
symlinks and stuff. And do that continuously, to deal with
plug-and-play/async device probing as mentioned, and match
some match expression against the symlinks and device nodes you
found. It *is* really complex.

> >> Or perhaps tmpfiles.d should be extended instead, that would allow more
> >> actions than just device setup? For example, unit files could point to a
> >> tmpfiles.d directory or file that will be processed inside the unit
> >> container before the unit is executed?
> > 
> > Both of these proposals cannot deal with devices coming and going, and
> > that's kind of a major deal breaker, since we try not to wait for
> > devices during boot-up more than necessary, and hence not even for
> > always plugged in devices this could ever work without races...
> 
> Maybe udev should be able to handle several /dev directories in the
> system, each with a different configuration...

What's nice about PrivateDevices= is that nobody sees the directories,
except for the service itself, and that the dir is automatically
unmounted and the fs released when the service dies. If you want udev
to manage that, then you would have to mount all per-service /dev
directories somewhere, so that udev could manage that, which makes the
automatic clean-up go away.

Also, don't forget that these days udev actually doesn't create a
single device node. The kernel does this with devtmpfs and udev then
just adds a couple of symlinks on top. That's quite a nice behaviour
and means udev doesn't need CAP_MKNOD at all!

Moreover, I think the interdependencies between udev and PID 1 should
be kept at a minimum. Currently, PrivateDevices= only provides
device nodes that PID 1 initially created for the host, but nothing
else. Service management is thus not dependent on udev. But when you
change that, then you create all kinds of questions and dependency
ordering problems, since suddenly udev is both service-managed and
provides features for service management...

So, putting this all together, I am very sure that DeviceAllow= is
really the better option, and PrivateDevices= should stay the one-stop
solution for the cases where services really need zero device access.
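
For illustration, a minimal sketch of what the DeviceAllow= route could
look like in a unit file. The smartd path and the "block-sd" device
group are assumptions here, not a tested configuration, and as noted
above smartd may need further device nodes depending on the driver:

```ini
[Service]
# Assumed binary path and flag, for illustration only
ExecStart=/usr/sbin/smartd -n
# Allow only the standard pseudo devices (null, zero, urandom, ...)
# plus whatever is listed in DeviceAllow=
DevicePolicy=closed
# Assumption: "sd" is the block device group name shown in /proc/devices
DeviceAllow=block-sd rw
```

Since this restricts access by device class rather than by populating
a private /dev, it keeps applying as disks come and go.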

> But independently of the PrivateDevices thing, would you think
> tmpfiles.d could be extended to be usable for unit specific cases
> instead of just one global setup? I think there could be more uses, for
> example, creating directories and links inside a unit's
> RuntimeDirectory.

I am not sure how this could work and what kind of integration you
precisely are looking for there?

Note that tmpfiles exists mostly for two reasons: a) to deal with old
software that wasn't capable of creating its own subdirs/stuff below
its runtime directory; and b) to deal with software whose main program
was running unprivileged all the time (for example by using User=),
and hence lacked the privileges to set up its subdir in /run.

Now, to deal with case b) we nowadays have RuntimeDirectory=. And for
a) I think the long-term story must be to make it set up its own stuff in
/run, and I don't see how tmpfiles could bring any benefit there...
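
As a rough sketch of case b), assuming a hypothetical daemon "foo"
that runs under its own user (all names below are made up):

```ini
[Service]
# Hypothetical user and binary, for illustration only
User=foo
ExecStart=/usr/bin/foo-daemon
# systemd creates /run/foo owned by User= before starting the service
# and removes it again when the service stops
RuntimeDirectory=foo
```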

Lennart

-- 
Lennart Poettering, Red Hat