[systemd-devel] Detecting Systemd crash

František Šumšal frantisek at sumsal.cz
Mon Feb 5 11:43:14 UTC 2024


On 2/3/24 16:55, Álvaro Cebrián Juan wrote:
> Great question!
> 
> I am very interested in detecting systemd crashes too since I have experienced them recently and have been asked to come up with a solution to react when a PID1 crash happens.
> In fact, in my recent experience, a journald crash was enough to leave the system in an unreliable/degraded state in which some top-level applications worked while others didn't.
> 
> So, adding to David's first question, I need to detect systemd and journald crashes and then trigger a `systemctl reboot --force --force` command.

You can tell systemd to do just that, by setting CrashReboot=yes in system.conf [0][1]. It defaults to 'no' to avoid reboot loops.

[0] https://www.freedesktop.org/software/systemd/man/latest/systemd-system.conf.html#LogColor=
[1] https://www.freedesktop.org/software/systemd/man/latest/systemd.html#systemd.crash_reboot
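
As a minimal sketch, assuming a drop-in file is used rather than editing system.conf directly (the file name below is hypothetical), the setting goes in the [Manager] section:

```ini
# /etc/systemd/system.conf.d/50-crash-reboot.conf  (hypothetical file name)
[Manager]
# Reboot automatically when PID 1 crashes instead of freezing.
# Defaults to "no" to avoid reboot loops.
CrashReboot=yes
```

The same switch is available on the kernel command line as systemd.crash_reboot, see [1].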

> 
> I have also read that the Linux Magic System Request Key (SysRq) can help in such scenarios, but I don't know how it works.
> 
> Any help would be very appreciated.
> Thank you.
> 
> Some related links:
> https://news.ycombinator.com/item?id=19023695
> https://news.ycombinator.com/item?id=36873927
> https://www.kernel.org/doc/html/latest/admin-guide/sysrq.html
> 
> 
> On Sat, 3 Feb 2024 at 16:14, David Timber (<dxdt at dev.snart.me>) wrote:
> 
>     Systemd crashed on me the other day. I was writing up some Systemd units
>     and testing them by running daemon-reload each time. Not the best way
>     to go about it, I know. My bad for abusing Systemd to the point of
>     crashing. Perhaps it was just a bit flip that caused this.
> 
>          systemd[2368]: Assertion 'path_is_absolute(p)' failed at
>          src/basic/chase.c:628, function chase(). Aborting.
>          systemd[1]: Assertion 'path_is_absolute(p)' failed at
>          src/basic/chase.c:628, function chase(). Aborting.
>          systemd[1]: Caught <ABRT> from our own process.
>          systemd-coredump[32497]: Due to PID 1 having crashed coredump
>          collection will now be turned off.
>          systemd-coredump[32497]: [🡕] Process 32496 (systemd) of user 0
>          dumped core.
>          systemd[1]: Caught <ABRT>, dumped core as pid 32496.
>          systemd[1]: Freezing execution.
> 
>          ...
> 
>          systemd-journald[871]: Failed to send stream file descriptor to
>          service manager: Transport endpoint is not connected
> 
>     I didn't even bother trying to produce a stack trace. I can get on that if
>     anyone wants it. My machine started doing some weird things: Firefox
>     could not do Ajax properly but could still load a new page, Chromium
>     could not create a new tab while all the text editors worked just fine,
>     and all the systemctl commands timed out. So basically, I was using
>     Linux without fork(). Anyway.
>     Well, I think any software can crash for any reason whatsoever. The
>     problem with Systemd I realised from this incident is that I had no way
>     of knowing that Systemd had crashed until I opened up the journal and
>     kernel logs and saw that Systemd had crashed some time ago. In this
>     particular incident, Systemd caught the signal and decided to just
>     freeze. No idea why you'd want that because if it had just crashed, the
>     kernel would have just panicked and I would have realised something went
>     wrong.
> 
>     1: So I decided that I need some sort of "watchdog" that warns me when
>     something like this happens. Using dbus to poll the status of the
>     Systemd process, it could be a GUI app running under a seat, a
>     daemon that writes a warning message using `wall`, or one that sends
>     mail using a primed-up MUA process. I wonder if someone has already
>     had the same idea and gone on to make one.
> 
>     2: How do I get Systemd to freeze to test such a program? I mean, if I
>     kill Systemd, the kernel would crash, so I have to somehow tell Systemd
>     to freeze?
> 
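
If you want to be warned rather than rebooted, here is a rough sketch of the watchdog idea from question 1. It relies on the symptom reported above, namely that systemctl commands time out once PID 1 has frozen, and treats a timed-out `systemctl is-system-running` as "manager not answering". The function names, the 5 second timeout and the warning text are all made up for illustration; this is not an existing tool.

```python
import subprocess
import sys


def manager_responsive(cmd=("systemctl", "is-system-running"), timeout=5.0):
    """Return True if cmd finishes within timeout seconds, False if it hangs.

    The exit status is deliberately ignored: a non-zero status (for example
    a "degraded" system) still means the manager answered the request.
    """
    try:
        subprocess.run(
            list(cmd),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
            check=False,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before raising this.
        return False
    return True


def check_and_warn(cmd=("systemctl", "is-system-running"), timeout=5.0):
    """Probe the manager once and broadcast a warning if it does not answer."""
    if manager_responsive(cmd, timeout):
        return True
    message = "systemd (PID 1) did not answer within %.1f s; it may have frozen" % timeout
    print(message, file=sys.stderr)
    try:
        # wall(1) writes to the terminals of logged-in users.
        subprocess.run(["wall", message], check=False)
    except FileNotFoundError:
        pass  # wall is not installed; the stderr message has to do
    return False
```

This would have to be called periodically from something that is already running (a plain loop or cron), not from a systemd timer, since a frozen manager will not start new units. As for question 2, the probe command is a parameter, so the warning path can be exercised without freezing PID 1 at all by passing a command that deliberately hangs, e.g. check_and_warn(("sleep", "60"), timeout=1).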

