[systemd-devel] [PATCH v3] device: Fix overzealous unmounting of tentative device mounts

Tue May 19 06:17:30 PDT 2015

Hey Lennart,

Lennart Poettering [2015-05-19 13:56 +0200]:
> I have now committed a different fix now, that keeps counting of the
> mount points in mount.c, instead of "reaching over" from device.c.
> 
> I only gave this light testing, would be cool if you could check if
> this fixes things for you.
> 
> http://cgit.freedesktop.org/systemd/systemd/commit/?id=fcd8b266edf0df2b85079fcf7b099cd4028740e6
> 
> This commit will now collect two sets of devices while going through
> /proc/self/mountinfo: the devices of lines that are no longer there,
> and the devices of lines that are there. Only for devices in the
> former set that are not in the latter we will now propagate an event
> to device.c.
> 
> Does this make sense?

It does, and it indeed should avoid some round trips. However, it does
not work yet. I added this for extra debugging:

--- a/src/core/device.c
+++ b/src/core/device.c
@@ -771,6 +771,9 @@ int device_found_node(Manager *m, const char *node, bool add, DeviceFound found,
         assert(m);
         assert(node);
 
+        if (node[0] == '/')
+                log_warning("XXXX device_found_node node %s add %i found %i now %i", node, add, found, now);
+
         /* This is called whenever we find a device referenced in
          * /proc/swaps or /proc/self/mounts. Such a device might be
          * mounted/enabled at a time where udev has not finished

After unmounting /tmp/etc, my dev-sda3.device (which plays the role of
"dev-foo.device") still becomes "dead", and /tmp/boot gets unmounted:

---- unmounting /tmp/etc ----
Id=dev-sda3.device
BindsTo=
BoundBy=
ActiveState=inactive
SubState=dead

Id=tmp-boot.mount
BindsTo=
BoundBy=
ActiveState=inactive
SubState=dead

Journal follows. The first bits are from the manual "mount" commands:


| systemd[1]: XXXX device_found_node node /dev/sda3 add 1 found 2 now 1
| systemd[1]: XXXX device_found_node node /dev/sda3 add 1 found 2 now 1
| systemd[1]: tmp-etc.mount: Changed dead -> mounted
| systemd[1]: XXXX device_found_node node /dev/sda3 add 1 found 2 now 1
| systemd[1]: XXXX device_found_node node /dev/sda3 add 1 found 2 now 1
| systemd[1]: XXXX device_found_node node /dev/sda3 add 1 found 2 now 1
| systemd[1]: tmp-boot.mount: Changed dead -> mounted
| systemd[1]: XXXX device_found_node node /dev/sda3 add 1 found 2 now 1
| systemd[1]: XXXX device_found_node node /dev/sda3 add 1 found 2 now 1

Now "umount /tmp/etc" happens:

| systemd[1]: tmp-etc.mount: Changed mounted -> dead
| systemd[1]: XXXX device_found_node node /dev/sda3 add 0 found 2 now 1

^ So device_found_node() already gets called here with add=0, although
/tmp/boot is still mounted from sda3. This now causes the usual cleanup
unmount slaughter:

| systemd[1]: dev-sda3.device: Changed tentative -> dead
| systemd[1]: tmp-boot.mount: Trying to enqueue job tmp-boot.mount/stop/replace
| systemd[1]: tmp-boot.mount: Installed new job tmp-boot.mount/stop as 357
| systemd[1]: tmp-boot.mount: Enqueued job tmp-boot.mount/stop as 357
| systemd[1]: tmp-etc.mount: Collecting.
| systemd[1]: Failed to reset devices.list on /lxc/test/system.slice/tmp-boot.mount: Permission denied

(FTR, I get thousands of those, but that's unrelated)

| systemd[1]: tmp-boot.mount: About to execute: /bin/umount /tmp/boot -n
| systemd[1]: tmp-boot.mount: Forked /bin/umount as 641
| systemd[1]: tmp-boot.mount: Changed mounted -> unmounting
| systemd[1]: Unmounting /tmp/boot...
| systemd[641]: tmp-boot.mount: Executing: /bin/umount /tmp/boot -n

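I'm not sure where the following comes from; there is no manual mount
command that would bring back /tmp/boot. It looks like things "bounce":
dev-sda3.device briefly goes back to tentative and systemd tries to
start tmp-boot.mount again, before the device goes dead and /tmp/boot
is unmounted for good: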

| systemd[1]: XXXX device_found_node node /dev/sda3 add 1 found 2 now 1
| systemd[1]: dev-sda3.device: Changed dead -> tentative
| systemd[1]: tmp-boot.mount: Trying to enqueue job tmp-boot.mount/start/fail
| systemd[1]: Requested transaction contradicts existing jobs: Resource deadlock avoided
| systemd[1]: XXXX device_found_node node /dev/sda3 add 0 found 2 now 1
| systemd[1]: dev-sda3.device: Changed tentative -> dead
| systemd[1]: Received SIGCHLD from PID 641 (umount).
| systemd[1]: Child 641 (umount) died (code=exited, status=0/SUCCESS)
| systemd[1]: tmp-boot.mount: Child 641 belongs to tmp-boot.mount
| systemd[1]: tmp-boot.mount: Mount process exited, code=exited status=0
| systemd[1]: tmp-boot.mount: Changed unmounting -> dead
| systemd[1]: tmp-boot.mount: Job tmp-boot.mount/stop finished, result=done
| systemd[1]: Unmounted /tmp/boot.
| systemd[1]: tmp-boot.mount: Collecting.
| systemd[1]: dev-sda3.device: Collecting.

My first hunch is that this is caused by calling
mount_load_proc_self_mountinfo() in mount_dispatch_io()
(src/core/mount.c:1682) *before* it goes through that new
SET_FOREACH() loop. That call will already see the removed mount and
call device_found_node() with the removal.

I'll debug this more closely ASAP; I just need to finish something
else first.

Martin
-- 
Martin Pitt                        | http://www.piware.de
Ubuntu Developer (www.ubuntu.com)  | Debian Developer  (www.debian.org)