[systemd-devel] Q: write error, watchdog, journald core dump, ordering of entries

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Wed Oct 6 08:29:29 UTC 2021


Hi!

We had a stuck networkc card on a server that seems to have caused the RAID controller with two SSDs to be stuck on write as well.
Anyway journald dumped core with this stack:
Oct 05 20:13:25 h19 systemd-coredump[26759]: Process 3321 (systemd-journal) of user 0 dumped core.
Oct 05 20:13:25 h19 systemd-coredump[26759]: Coredump diverted to /var/lib/systemd/coredump/core.systemd-journal.0.a4eb19afcc314d99936cbdd5542e4fed.3321.1633457585000000.lz4
Oct 05 20:13:25 h19 systemd-coredump[26759]: Stack trace of thread 3321:
Oct 05 20:13:25 h19 systemd-coredump[26759]: #0  0x00007f913492d0c2 journal_file_append_object (libsystemd-shared-234.so)
Oct 05 20:13:25 h19 systemd-coredump[26759]: #1  0x00007f913492dba3 n/a (libsystemd-shared-234.so)
Oct 05 20:13:25 h19 systemd-coredump[26759]: #2  0x00007f913492fc79 journal_file_append_entry (libsystemd-shared-234.so)
Oct 05 20:13:25 h19 systemd-coredump[26759]: #3  0x0000557fe532908d n/a (systemd-journald)
Oct 05 20:13:25 h19 systemd-coredump[26759]: #4  0x0000557fe532b15f n/a (systemd-journald)
Oct 05 20:13:25 h19 systemd-coredump[26759]: #5  0x0000557fe5324664 n/a (systemd-journald)
Oct 05 20:13:25 h19 systemd-coredump[26759]: #6  0x0000557fe5326a80 n/a (systemd-journald)
Oct 05 20:13:25 h19 kernel: printk: systemd-coredum: 6 output lines suppressed due to ratelimiting

(systemd-234-24.90.1.x86_64 of SLES15 SP2 on x86_64)

journald seems to have restarted later, but I wonder about the ordering of the entries following:
Oct 05 20:13:25 h19 systemd-journald[26760]: Journal started
Oct 05 20:13:25 h19 systemd-journald[26760]: System journal (/var/log/journal/8695c89eb080463dad2ca9f9aaedf162) is 928.0M, max 4.0G, 3.0G free.

Oct 05 20:12:52 h19 systemd[1]: systemd-journald.service: Watchdog timeout (limit 3min)!
Oct 05 20:12:52 h19 systemd[1]: systemd-journald.service: Killing process 3321 (systemd-journal) with signal SIGABRT.
Oct 05 20:13:25 h19 systemd[1]: Starting Flush Journal to Persistent Storage...
Oct 05 20:13:25 h19 systemd[1]: Started Flush Journal to Persistent Storage.

I don't understand why the core dump is logged before the signal being sent and the watchdog timeout.

(a journalctl -f running on a terminal produced the entried in the same odd order)

Can someone explain?

Regards,
Ulrich





More information about the systemd-devel mailing list