[systemd-devel] dev-root.device is not active, results in an umount spree

Sun May 17 04:02:35 PDT 2015

Hey Lennart,

Lennart Poettering [2015-05-14 18:09 +0200]:
> > As I mentioned before, simply ignoring /dev/root doesn't help in all
> > cases, and hardcoding it in the code is a bit ugly.
> 
> It doesn't help in all cases? Which ones? Can you elaborate?

It doesn't seem to help at all in e. g. LXC. This is with systemd git
master in a "test" container: with "lxc.mount.auto = sys:rw cgroup":

# mount -o bind /bin/ /mnt; systemctl show -p ActiveState,SubState,BindsTo,BoundBy mnt.mount
BindsTo=dev-sda3.device
BoundBy=
ActiveState=deactivating
SubState=unmounting

Journal says:

Mai 17 12:37:05 test systemd[1]: mnt.mount: Unit is bound to inactive unit dev-sda3.device. Stopping, too.
Mai 17 12:37:05 test systemd[1]: Unmounting /mnt...
Mai 17 12:37:05 test systemd[1]: Unmounted /mnt.

because of the BindsTo=, which is my system partition on the host. The
container doesn't have any /dev/sd*.

# systemctl show -p ActiveState,SubState,BindsTo,BoundBy dev-sda3.device
BindsTo=
BoundBy=etc-schroot.mount
ActiveState=inactive
SubState=dead

This exists becayse of the manager_load_unit() in unit_add_node_link()
which synthesizes such .device units with "dead" state, and there's no
check whether they actually exist.

> And why is this ugly?
> 
> /dev/root is special in the kernel. No sysfs device by that name ever
> exists, it's pretty much a fixed string the kernel inserts into the
> mount table for the root device specified on the kernel cmdline. It
> never exists otherwise and hence it is the right logic to check for it
> explicitly and ignore it explicitly. It gets particularly confusing
> for btrfs file systems, since those are device-less file systems
> nominally, but when you use them as root fs without inird, they
> suddenly appear as device backed fs this way...

Agreed that /dev/root is "special" and not a real thing. I just don't
like hardcoding such names in code if it can be avoided, and IMHO here
it can: Handling this special case isn't sufficient (see above), and
it's not necessary as a more generic solution would include it.

That said, I don't want to dwell on the /dev/root special case -- if
you like it, let's keep it, it's not harmful. (I just noticed that in
other cases you tried to avoid hardcoding stuff).

> > The attached patch is a more general solution which stops creating
> > dead stub .device units for devices which don't exist at all. I. e.
> > while in your case -.mount was probably BindsTo=dev-root.device (or in
> > my case, in my container I had mnt.mount BindsTo=dev-sda3.device,
> > which exists on the host but not in the container), but with this
> > patch it should now not be bound to anything.
> 
> The patch is not right, Because it will not load device units for
> devices that have not been referenced yet. This would break things for
> mount entries that carry no deps on their own, but only reference a
> source device.

I'm not sure I understand this, can you please elaborate with an
example? I'd like to understand the use case for synthesizing dead
.device units which are known to not exist and binding to them.

> > Lennart, if you don't like this there is an alternative, and more
> > complicated solution: We could instead teach device_found_node() to
> > actually do create a proper .device with the passed DeviceFound (which
> > is "MOUNT"), resulting in creating a state "tentative" device. I have
> > a half-working patch for this, but it's brittle as with pretty much
> > any uevent or moutinfo change the status changes back to "dead",
> > causing the unmount attempt. IMHO this approach isn't conceptually
> > correct either -- in such containers, if don't have the corresponding
> > device nodes at all, we shouldn't try to react to hotplug events and
> > clean up mounts. I. e. the "tentative" concept does not really apply
> > there. But I have misunderstood the intent.
> 
> Hmm, not following I must admit.

In terms of the above example, what happens when parsing /mountinfo
generates a mnt.mount:

  - status quo: manager_load_unit() in unit_add_node_link() creates a
    dev-sda3.device which doesn't actually exist, mnt.mount then
    BindsTo= that, and gets unmounted because it's a "dead" device.

  - This patch: manager_get_unit() doesn't find a dev-sda3.device and
    hence no BindsTo= is added. This would break if adding /dev/*
    nodes *after* existing mounts is expected to add a BindsTo=.

  - Alternative: Fix device_found_node() to create a .device with the
    correct ("tentative") state if the device doesn't exist in /dev/
    or udev (yet). Then manager_load_unit() would not create a "dead"
    stub .device any more, but use the existing "tentative" .device
    and mounts would also stop being auto-unmounted.

So as this proposed patch apparently isn't good (I do believe that it
breaks stuff, I'd just like to understand how), I'll look into fixing
my half-working patch for the alternative.

> Note though that for containers we do not longer create any device
> dependencies anyway, see 47bc12e1ba35d38.

This doesn't work in a container with a writable /sys and udev
running. (I'm not entirely sure why people would do this, but
apparently they do; we get bug reports for them and there might be
legitimate use cases). But TBH such explicit checks like "is /sys
writable" appear to be more like a brittle hack than a robust
solution; this doesn't tell you anything about what your /dev looks
like or whether or not udev is running, etc.

I also have a gut feeling that Dimitri's use case with a plan9 file
system would still break with the /dev/root special case fix if the
file system wasn't the root fs but e. g. /home, and you try to
bind-mount anything from it.

Thanks,

Martin
-- 
Martin Pitt                        | http://www.piware.de
Ubuntu Developer (www.ubuntu.com)  | Debian Developer  (www.debian.org)