[systemd-devel] mdmon@md127 is stopped early

Mariusz Tkaczyk mariusz.tkaczyk at linux.intel.com
Wed Feb 9 16:16:59 UTC 2022


Hi,
I'm working on the Intel Matrix RAID solution (IMSM). It is integrated with
the kernel md driver and supported on Linux. We use the same kernel driver,
but metadata management is not done inside the driver. It is done in
userspace by the mdmon tool (part of the mdadm project[1]). This requires a
systemd service, and we already have one[2]. The service is started by a
udev rule or by mdadm itself. It has worked without real problems for
several years, until now. Recently I discovered a problem in the reboot
flow: a system installed on RAID hangs with the following trace:

# cat /etc/os-release
NAME="Red Hat Enterprise Linux"
VERSION="9.0 (Plow)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="9.0"
PLATFORM_ID="platform:el9"
PRETTY_NAME="Red Hat Enterprise Linux 9.0 Beta (Plow)"


# systemctl --version
systemd 249 (249-7.el9_b)
+PAM +AUDIT +SELINUX -APPARMOR +IMA +SMACK +SECCOMP +GCRYPT +GNUTLS
+OPENSSL +ACL +BLKID +CURL +ELFUTILS -FIDO2 +IDN2 -IDN -IPTC +KMOD
+LIBCRYPTSETUP +LIBFDISK +PCRE2 -PWQUALITY +P11KIT -QRENCODE +BZIP2
+LZ4 +XZ +ZLIB +ZSTD +XKBCOMMON +UTMP +SYSVINIT
default-hierarchy=unified


[ 2338.114810] INFO: task kworker/21:0:18510 blocked for more than 122 seconds.
[ 2338.126767]       Not tainted 5.14.0-39.el9.x86_64 #1
[ 2338.136696] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2338.149469] task:kworker/21:0    state:D stack:    0 pid:18510 ppid:     2 flags:0x00004000
[ 2338.162832] Workqueue: md md_submit_flush_data
[ 2338.172231] Call Trace:
[ 2338.179612]  __schedule+0x203/0x560
[ 2338.187950]  schedule+0x43/0xb0
[ 2338.195926]  md_write_start.part.0+0x18f/0x230
[ 2338.205200]  ? do_wait_intr_irq+0xa0/0xa0
[ 2338.214040]  raid1_make_request+0x4f/0xa0 [raid1]
[ 2338.223507]  md_handle_request+0x129/0x1c0
[ 2338.232358]  process_one_work+0x1e0/0x3b0
[ 2338.241096]  worker_thread+0x50/0x3b0
[ 2338.249509]  ? rescuer_thread+0x370/0x370
[ 2338.258296]  kthread+0x146/0x170
[ 2338.266258]  ? set_kthread_struct+0x40/0x40
[ 2338.275118]  ret_from_fork+0x1f/0x30
[ 2338.283373] INFO: task umount:18571 blocked for more than 123 seconds.
[ 2338.294608]       Not tainted 5.14.0-39.el9.x86_64 #1
[ 2338.304332] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

The hang is caused by a missing mdmon response. I investigated and found
that the mdmon service is stopped early:

         Stopping Self Monitoring a…g Technology (SMART) Daemon...
         Stopping OpenSSH server daemon...
         Stopping Load/Save Random Seed...
[  OK  ] Stopped MD Metadata Monitor on /dev/md127.
[  OK  ] Stopped Avahi mDNS/DNS-SD Stack.
[  OK  ] Stopped irqbalance daemon.
[  OK  ] Stopped NTP client/server.
[  OK  ] Stopped libstoragemgmt plug-in server daemon.
[  OK  ] Stopped Machine Check Exception Logging Daemon.
[  OK  ] Stopped Software RAID monitoring and management.
[  OK  ] Stopped Self Monitoring an…ing Technology (SMART) Daemon.
[  OK  ] Stopped Modem Manager.
[  OK  ] Stopped CUPS Scheduler.
[  OK  ] Stopped Enable periodic up… of entitlement certificates..
[  OK  ] Stopped OpenSSH server daemon.
[  OK  ] Stopped Deferred execution scheduler.
[  OK  ] Stopped Command Scheduler.
[  OK  ] Stopped Getty on tty1.
[  OK  ] Stopped Serial Getty on ttyS0.
[  OK  ] Stopped Session 1 of User root.
[  OK  ] Stopped Session 3 of User root.


but I can't find mdmon@md127 in the output of:
# systemctl list-dependencies --after shutdown.target | grep mdmon
● ├─mdmonitor.service
● ├─system-mdmon.slice
● │ │ ├─mdmonitor.service
● │ ├─mdmonitor.service


I also checked the following:
# systemctl show mdmon@md127 | grep Before=
Before=initrd-switch-root.target
# systemctl show mdmon@md127 | grep After=
After=system-mdmon.slice systemd-journald.socket
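
To look at more than Before= and After= in one pass, the same query can be
filtered for several ordering-related properties. This is only a sketch: the
helper name ordering_props is made up for illustration, and Conflicts= and
DefaultDependencies= are included on the assumption that they may also
influence when the unit is stopped.

```shell
#!/bin/sh
# Hypothetical helper (not part of mdadm or systemd): keep only the
# ordering-related properties from `systemctl show` output read on stdin.
ordering_props() {
    grep -E '^(Before|After|Conflicts|DefaultDependencies)='
}

# Example use on a running system:
#   systemctl show mdmon@md127.service | ordering_props
```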

I tried many modifications of the service. The best result so far came from
changing the ordering to:
"Before=initrd-switch-root.target local-fs-pre.target"
but then it hangs in systemd-shutdown while syncing filesystems.
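
For anyone who wants to reproduce that ordering change without touching the
packaged unit, one possible way is a drop-in. The sketch below is an
assumption about how to apply it, not necessarily how the change was made
here: the helper name write_mdmon_dropin, the file name ordering.conf and the
/etc/systemd/system location are all illustrative.

```shell
#!/bin/sh
# Hypothetical helper: write a drop-in for mdmon@.service below DIR.
# DIR would normally be /etc/systemd/system (an assumption; the change may
# equally be made by editing the unit file directly).
write_mdmon_dropin() {
    dir="$1/mdmon@.service.d"
    mkdir -p "$dir" || return 1
    # Before= is a list setting, so this adds to the ordering already
    # declared in the unit file instead of replacing it.
    cat > "$dir/ordering.conf" <<'EOF'
[Unit]
Before=initrd-switch-root.target local-fs-pre.target
EOF
}

# Example use, followed by a reload so systemd picks up the change:
#   write_mdmon_dropin /etc/systemd/system && systemctl daemon-reload
```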

I read the systemd advice[3] and unfortunately I have to say that we don't
follow it. Our mdmon service is restarted after switch-root; please see the
description in the service file[2]. We set '@' in argv[0][0] to prevent the
process from being killed:
# ps -ef | grep mdmon
{...} @usr/sbin/mdmon --offroot --takeover md127
It is probably wrong, but it has worked this way for many years:
"Again: if your code is being run from the root file system, then this
logic suggested above is NOT for you. Sorry. Talk to us, we can
probably help you to find a different solution to your problem."[3]
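
The '@' marker can also be checked per process without relying on how ps
formats its output. The sketch below reads the first byte of
/proc/PID/cmdline; the helper name argv0_marked and its optional second
argument are made up for illustration and are not part of mdadm or systemd.

```shell
#!/bin/sh
# Hypothetical helper: succeed if process PID has '@' as the first
# character of argv[0]. PROCROOT defaults to /proc and exists only so the
# check can be tried against a fake directory tree.
argv0_marked() {
    pid=$1
    procroot=${2:-/proc}
    # cmdline stores argv as NUL-separated strings, so its first byte is
    # argv[0][0]. Kernel threads have an empty cmdline and never match.
    first=$(head -c 1 "$procroot/$pid/cmdline" 2>/dev/null) || return 1
    [ "$first" = "@" ]
}

# Example use on a running system:
#   for pid in $(pgrep mdmon); do
#       argv0_marked "$pid" && echo "$pid is marked"
#   done
```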

How can I prevent the service from being stopped? In the initramfs there is
an mdmon restart procedure, for example in dracut[4]. I need to keep the
mdmon process from being stopped.

I will try to adapt our implementation to your suggestions[3], but that is a
longer topic; I want to work around the issue first.

[1]https://git.kernel.org/pub/scm/utils/mdadm/mdadm.git
[2]https://git.kernel.org/pub/scm/utils/mdadm/mdadm.git/tree/systemd/mdmon@.service
[3]https://systemd.io/ROOT_STORAGE_DAEMONS/
[4]https://github.com/dracutdevs/dracut/blob/master/modules.d/90mdraid/mdmon-pre-shutdown.sh

TIA,
Mariusz

