[systemd-devel] systemd-udevd: Race condition when rule starts both a systemd-mount and an unit accessing that mount
Andrei Borzenkov
arvidjaar at gmail.com
Sat Sep 4 16:17:14 UTC 2021
On 01.09.2021 14:39, Manuel Wagesreither wrote:
> Am Mi, 25. Aug 2021, um 18:51, schrieb Andrei Borzenkov:
>> On Wed, Aug 25, 2021 at 3:44 PM Andrei Borzenkov <arvidjaar at gmail.com> wrote:
>> ...
>>>> Here's the udev rule:
>>>> ```
>>>> ACTION=="add", SUBSYSTEMS=="usb", SUBSYSTEM=="block", KERNEL=="*[0-9]*", ENV{ID_FS_USAGE}=="filesystem", TAG+="systemd", ENV{SYSTEMD_WANTS}+="start-standalone-mender-deployment at media-$name.service", RUN{program}+="/usr/bin/systemd-mount --no-block --automount=no --options=ro --collect $devnode /media/$name"
>>>> ```
>>>>
>>>> And here's the systemd service:
>>>> It is templated and gets instantiated with "media-sdb1". It therefore has an "After=media-sdb1.mount". I suspect Systemd-udevd executes the ENV{SYSTEMD_WANTS} part before the RUN{program} part. Hence, "media-sdb1.mount" doesn't yet exist when the service gets started, as it gets created a tad later by systemd-mount.
>>>>
>>>> ```
>>>> [Unit]
>>>> Description=Start standalone Mender deployment (%i)
>>>> After=%i.mount
>>>>
>>>> [Service]
>>>> Type=oneshot
>>>> Restart=no
>>>> ExecStart=/bin/sh /usr/bin/start-standalone-mender-deployment.sh /%I
>>>> ```
>>>>
>> ...
>>>
>>> Hmm ... if systemd-mount --property accepts Wants and Before, your
>>> mount unit could pull in your service unit. I cannot test right now.
>>>
>>
>> Yes, this seems to work, so in principle
>>
>> RUN{program}+="/usr/bin/systemd-mount --no-block --automount=no
>> --options=ro --collect --property
>> Wants=start-standalone-mender-deployment at media-$name.service $devnode
>> /media/$name"
>>
>> is possible. Unfortunately this starts unit even if mount fails and
>> systemd-mount does not accept RequiredBy property". It is still
>> possible to add Requires to service itself.
>>
>> [Unit]
>> Description=Start standalone Mender deployment (%i)
>> After=%i.mount
>> Requires=%i.mount
>>
>> This will fail the service start job if the mount job fails.
>>
>> Wants on mount unit pulls in service, so we are guaranteed to always
>> have both start jobs - for mount and for service and dependencies are
>> observed.
>>
>
> I was unaware of the --property option of systemd-mount. It seems to be exactly what I was looking for.
>
> Thank you!
>
> Unfortunately, while testing, I encountered problems with systemd-mount.
>
> Sporadically, `systemd-mount /dev/sdb1 /media/sdb1` results in the following:
> ```
> $ journalctl -fu systemd-udevd -u media-sdb1.mount -u dev-sdb1.device -u systemd-fsck at dev-sdb1.service
> 15:55:46 systemd[1]: media-sdb1.mount: Failed to open /run/systemd/transient/media-sdb1.mount: No such file or directory
> 15:56:46 systemd-udevd[57294]: sdb1: Spawned process '/usr/bin/systemd-mount /dev/sdb1 /media/sdb1' [57295] is taking longer than 59s to complete
This is not expected with --no-block
> 15:56:46 systemd-udevd[3019]: sdb1: Worker [57294] processing SEQNUM=6665 is taking a long time
> 15:57:16 systemd[1]: dev-sdb1.device: Job dev-sdb1.device/start timed out.
> 15:57:16 systemd[1]: Timed out waiting for device /dev/sdb1.
> 15:57:16 systemd[1]: Dependency failed for /media/sdb1.
> 15:57:16 systemd[1]: media-sdb1.mount: Job media-sdb1.mount/start failed with result 'dependency'.
> 15:57:16 systemd[1]: Dependency failed for File System Check on /dev/sdb1.
> 15:57:16 systemd[1]: systemd-fsck at dev-sdb1.service: Job systemd-fsck at dev-sdb1.service/start failed with result 'dependency'.
> 15:57:16 systemd[1]: dev-sdb1.device: Job dev-sdb1.device/start failed with result 'timeout'.
> 15:57:16 systemd-udevd[57294]: sdb1: Process '/usr/bin/systemd-mount /dev/sdb1 /media/sdb1' failed with exit code 1.
> ```
> (Removed date and hostname for brevity.)
>
> While mounting, I had `watch ls /run/systemd/transient/` running, and could see that `media-sdb1.mount` pops into existence immediately when invoking systemd-mount. So whatever tries to access misses it just.
>
> Following to note:
> * In the example above, systemd-mount got invoked from the following udev rule:
> ```
> ACTION=="add", SUBSYSTEMS=="usb", SUBSYSTEM=="block", KERNEL=="*[0-9]*", ENV{ID_FS_USAGE}=="filesystem", RUN{program}+="/usr/bin/systemd-mount $devnode /media/$name"
> ```
Original rule was with "systemd-mount --no-block". Any reason you
removed --no-block now? udevd sends notification about new device after
RUN commands have completed; so your configuration deadlocks.
systemd-mount waits for block device to be announced and systemd-udevd
waits for systemd-mount to complete before announcing block device.
> * I triggered the rule with `udevadm trigger --verbose --action=add /dev/sdb`, and did --action=remove before triggering again.
> * When invoking from the shell, journalctl logs the same. (Minus the systemd-udevd lines obviously.)
> * `/bin/mount /dev/sdb1 /media/sdb1` works always
>
> This whole things seems very flaky, but there seems to be a pattern, as crazy as that may sound: Triggerung udev rule repeatedly results in the same outcome (mount working or not); same with invoking manually on shell. But the two are unrelated. At first it even seemed mounting from the rule ALWAYS fails and from the shell ALWAYS works, but then I started to observe just a few cases which proved me wrong. Now it doesn't work anymore either way. That said, yesterday I had a shell script running which was triggering a similar udev rule repeatadly ~900 times, and mounting worked every single time.
>
> I've got systemd 244 (244.3+) running. The distribution is created with Yocto. (https://git.yoctoproject.org/cgit/cgit.cgi/poky/tree/meta/recipes-core/systemd?h=0d4f80535e7ebb799a70761e83a40905f2c14535)
>
> Here's some info on the block device I'm using for the tests:
> ```
> # udevadm info /dev/sdb1
> P: /devices/3530000.xhci/usb2/2-1/2-1:1.0/host2/target2:0:0/2:0:0:0/block/sdb/sdb1
> N: sdb1
> L: 0
> E: DEVPATH=/devices/3530000.xhci/usb2/2-1/2-1:1.0/host2/target2:0:0/2:0:0:0/block/sdb/sdb1
> E: DEVNAME=/dev/sdb1
> E: DEVTYPE=partition
> E: PARTN=1
> E: PARTNAME=manuelusbpart
> E: MAJOR=8
> E: MINOR=17
> E: SUBSYSTEM=block
> ```
>
> Regards,
> Manuel
>
More information about the systemd-devel
mailing list