[systemd-devel] systemd-udevd: Race condition when rule starts both a systemd-mount and a unit accessing that mount

Manuel Wagesreither ManWag at FastMail.FM
Mon Sep 6 16:00:54 UTC 2021


Am Sa, 4. Sep 2021, um 18:17, schrieb Andrei Borzenkov:
> On 01.09.2021 14:39, Manuel Wagesreither wrote:
> > Am Mi, 25. Aug 2021, um 18:51, schrieb Andrei Borzenkov:
> >> On Wed, Aug 25, 2021 at 3:44 PM Andrei Borzenkov <arvidjaar at gmail.com> wrote:
> >> ...
> >>>> Here's the udev rule:
> >>>> ```
> >>>> ACTION=="add", SUBSYSTEMS=="usb", SUBSYSTEM=="block", KERNEL=="*[0-9]*", ENV{ID_FS_USAGE}=="filesystem", TAG+="systemd", ENV{SYSTEMD_WANTS}+="start-standalone-mender-deployment at media-$name.service", RUN{program}+="/usr/bin/systemd-mount --no-block --automount=no --options=ro --collect $devnode /media/$name"
> >>>> ```
> >>>>
> >>>> And here's the systemd service:
> >>>> It is templated and gets instantiated with "media-sdb1". It therefore has an "After=media-sdb1.mount". I suspect systemd-udevd executes the ENV{SYSTEMD_WANTS} part before the RUN{program} part. Hence, "media-sdb1.mount" doesn't exist yet when the service gets started, as it gets created a tad later by systemd-mount.
> >>>>
> >>>> ```
> >>>> [Unit]
> >>>> Description=Start standalone Mender deployment (%i)
> >>>> After=%i.mount
> >>>>
> >>>> [Service]
> >>>> Type=oneshot
> >>>> Restart=no
> >>>> ExecStart=/bin/sh /usr/bin/start-standalone-mender-deployment.sh /%I
> >>>> ```
> >>>>
> >> ...
> >>>
> >>> Hmm ... if systemd-mount --property accepts Wants and Before, your
> >>> mount unit could pull in your service unit. I cannot test right now.
> >>>
> >>
> >> Yes, this seems to work, so in principle
> >>
> >> ```
> >> RUN{program}+="/usr/bin/systemd-mount --no-block --automount=no --options=ro --collect --property Wants=start-standalone-mender-deployment@media-$name.service $devnode /media/$name"
> >> ```
> >>
> >> is possible. Unfortunately this starts the unit even if the mount
> >> fails, and systemd-mount does not accept a RequiredBy property. It is
> >> still possible to add Requires to the service itself:
> >>
> >> ```
> >> [Unit]
> >> Description=Start standalone Mender deployment (%i)
> >> After=%i.mount
> >> Requires=%i.mount
> >> ```
> >>
> >> This will fail the service start job if the mount job fails.
> >>
> >> Wants on the mount unit pulls in the service, so we are guaranteed to
> >> always have both start jobs (for the mount and for the service), and
> >> the dependencies are observed.
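> >>
> >> With --property Wants=... doing the pulling in, the ENV{SYSTEMD_WANTS}
> >> part of the original rule should no longer be needed. A sketch of the
> >> full rule with the original options restored (untested as a whole):
> >>
> >> ```
> >> ACTION=="add", SUBSYSTEMS=="usb", SUBSYSTEM=="block", KERNEL=="*[0-9]*", ENV{ID_FS_USAGE}=="filesystem", TAG+="systemd", RUN{program}+="/usr/bin/systemd-mount --no-block --automount=no --options=ro --collect --property Wants=start-standalone-mender-deployment@media-$name.service $devnode /media/$name"
> >> ```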
> >>
> > 
> > I was unaware of the --property option of systemd-mount. It seems to be exactly what I was looking for.
> > 
> > Thank you!
> > 
> > Unfortunately, while testing, I encountered problems with systemd-mount.
> > 
> > Sporadically, `systemd-mount /dev/sdb1 /media/sdb1` results in the following:
> > ```
> > $ journalctl -fu systemd-udevd -u media-sdb1.mount -u dev-sdb1.device -u systemd-fsck@dev-sdb1.service
> > 15:55:46 systemd[1]: media-sdb1.mount: Failed to open /run/systemd/transient/media-sdb1.mount: No such file or directory
> > 15:56:46 systemd-udevd[57294]: sdb1: Spawned process '/usr/bin/systemd-mount /dev/sdb1 /media/sdb1' [57295] is taking longer than 59s to complete
> 
> This is not expected with --no-block
> 
> > 15:56:46 systemd-udevd[3019]: sdb1: Worker [57294] processing SEQNUM=6665 is taking a long time
> > 15:57:16 systemd[1]: dev-sdb1.device: Job dev-sdb1.device/start timed out.
> > 15:57:16 systemd[1]: Timed out waiting for device /dev/sdb1.
> > 15:57:16 systemd[1]: Dependency failed for /media/sdb1.
> > 15:57:16 systemd[1]: media-sdb1.mount: Job media-sdb1.mount/start failed with result 'dependency'.
> > 15:57:16 systemd[1]: Dependency failed for File System Check on /dev/sdb1.
> > 15:57:16 systemd[1]: systemd-fsck@dev-sdb1.service: Job systemd-fsck@dev-sdb1.service/start failed with result 'dependency'.
> > 15:57:16 systemd[1]: dev-sdb1.device: Job dev-sdb1.device/start failed with result 'timeout'.
> > 15:57:16 systemd-udevd[57294]: sdb1: Process '/usr/bin/systemd-mount /dev/sdb1 /media/sdb1' failed with exit code 1.
> > ```
> > (Removed date and hostname for brevity.)
> > 
> > While mounting, I had `watch ls /run/systemd/transient/` running, and could see that `media-sdb1.mount` pops into existence immediately upon invoking systemd-mount. So whatever tries to access it just barely misses it.
> > 
> > A few things to note:
> > * In the example above, systemd-mount got invoked from the following udev rule:
> > ```
> > ACTION=="add", SUBSYSTEMS=="usb", SUBSYSTEM=="block", KERNEL=="*[0-9]*", ENV{ID_FS_USAGE}=="filesystem", RUN{program}+="/usr/bin/systemd-mount $devnode /media/$name"
> > ```
> 
> The original rule was with "systemd-mount --no-block". Any reason you
> removed --no-block now? udevd sends the notification about the new
> device only after the RUN commands have completed, so your
> configuration deadlocks: systemd-mount waits for the block device to
> be announced, and systemd-udevd waits for systemd-mount to complete
> before announcing the block device.
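> 
> One way to observe this (untested here): while systemd-mount hangs,
> `systemctl list-jobs` should show the start job for media-sdb1.mount
> waiting on dev-sdb1.device, which in turn waits for the very udev
> event that the RUN command is blocking.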

I removed --no-block and the other options in order to isolate the problem and provide a "clean" problem demonstrator. I don't have any notes besides what I wrote above, but I assume I still had --no-block when I started experiencing problems, because I can't recall removing it for any other reason.

The deadlock explains why systemd-mount without --no-block doesn't play nice when invoked from a udev rule.

But it doesn't explain systemd-mount timing out when invoked from the shell.

Suspicion:
Does `udevadm trigger --verbose --action=remove /dev/sdb` have an effect on the device being accessible to systemd-mount?

I tested the udev rule using `udevadm trigger` with --action=add and --action=remove, and a few of the systemd-mount invocations were probably after --action=remove. I did not expect that to affect systemd-mount, as even after --action=remove the device still showed up in `lsblk`. If it does affect mountability via systemd-mount, it could explain what I observed as "flakiness".

I'm going to try this out.
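
Roughly, the test I have in mind (exact outcome to be confirmed):

```
# remove event for the disk, then try mounting the partition
udevadm trigger --verbose --action=remove /dev/sdb
systemd-mount /dev/sdb1 /media/sdb1

# for comparison, re-add and mount again
udevadm trigger --verbose --action=add /dev/sdb
systemd-mount /dev/sdb1 /media/sdb1
```

If the mount only times out after the remove event, that would suggest systemd drops dev-sdb1.device on remove even though the node still shows up in `lsblk`.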


> > * I triggered the rule with `udevadm trigger --verbose --action=add /dev/sdb`, and did --action=remove before triggering again.
> > * When invoking from the shell, journalctl logs the same. (Minus the systemd-udevd lines obviously.)
> > * `/bin/mount /dev/sdb1 /media/sdb1` always works
> > 
> > This whole thing seems very flaky, but there seems to be a pattern, as crazy as that may sound: triggering the udev rule repeatedly results in the same outcome (mount working or not); the same goes for invoking manually from the shell. But the two are unrelated. At first it even seemed that mounting from the rule ALWAYS fails and from the shell ALWAYS works, but then I observed a few cases which proved me wrong. Now it doesn't work either way. That said, yesterday I had a shell script running which triggered a similar udev rule repeatedly, ~900 times, and mounting worked every single time.
> > 
> > I've got systemd 244 (244.3+) running. The distribution is created with Yocto. (https://git.yoctoproject.org/cgit/cgit.cgi/poky/tree/meta/recipes-core/systemd?h=0d4f80535e7ebb799a70761e83a40905f2c14535)
> > 
> > Here's some info on the block device I'm using for the tests:
> > ```
> > # udevadm info /dev/sdb1                      
> > P: /devices/3530000.xhci/usb2/2-1/2-1:1.0/host2/target2:0:0/2:0:0:0/block/sdb/sdb1
> > N: sdb1
> > L: 0
> > E: DEVPATH=/devices/3530000.xhci/usb2/2-1/2-1:1.0/host2/target2:0:0/2:0:0:0/block/sdb/sdb1
> > E: DEVNAME=/dev/sdb1
> > E: DEVTYPE=partition
> > E: PARTN=1
> > E: PARTNAME=manuelusbpart
> > E: MAJOR=8
> > E: MINOR=17
> > E: SUBSYSTEM=block
> > ```
> > 
> > Regards,
> > Manuel
> > 
> 

