<div dir="ltr"><div>Great question!</div><div><br></div><div>I am very interested in detecting systemd crashes too, since I have experienced them recently and have been asked to come up with a way to react when PID 1 crashes.</div><div>In fact, in my recent experience, a journald crash alone was enough to leave the system in an unreliable, degraded state in which some top-level applications worked while others didn't.<br></div><div><br></div><div>So, adding to David's first question, I need to detect systemd and journald crashes and then trigger a `systemctl reboot --force --force` command.<br></div><div><br></div><div>I have also read that the Linux Magic System Request Key (SysRq) can help in such scenarios, but I don't know how it works.<br></div><div><br></div><div>Any help would be much appreciated.<br></div><div>Thank you.<br></div><div><br></div><div>Some related links:<br></div><div><a href="https://news.ycombinator.com/item?id=19023695">https://news.ycombinator.com/item?id=19023695</a></div><div><a href="https://news.ycombinator.com/item?id=36873927">https://news.ycombinator.com/item?id=36873927</a></div><div>
<a href="https://www.kernel.org/doc/html/latest/admin-guide/sysrq.html">https://www.kernel.org/doc/html/latest/admin-guide/sysrq.html</a>
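<div><br></div><div>From my reading of the kernel documentation linked above, SysRq commands can also be issued without a keyboard: root writes a single command character to /proc/sysrq-trigger. Below is a minimal Python sketch of that, not a tested recovery tool. The function names send_sysrq and emergency_reboot are my own, and the pause between commands is a guess rather than a documented requirement.</div>
<pre>
```python
import time

SYSRQ_TRIGGER = "/proc/sysrq-trigger"

# Command letters as listed in the kernel's sysrq admin guide.
SYNC = "s"        # sync all mounted filesystems
REMOUNT_RO = "u"  # remount all filesystems read-only
REBOOT = "b"      # reboot immediately, without syncing or unmounting


def send_sysrq(key, path=SYSRQ_TRIGGER):
    """Write one SysRq command character to the trigger file (needs root)."""
    if len(key) != 1:
        raise ValueError("a SysRq command is a single character")
    with open(path, "w") as trigger:
        trigger.write(key)


def emergency_reboot(path=SYSRQ_TRIGGER, settle_s=2.0):
    """Sync, remount read-only, then reboot, bypassing PID 1 entirely.

    settle_s is an assumed pause so that sync and remount can finish
    before the next command is issued.
    """
    for key in (SYNC, REMOUNT_RO, REBOOT):
        send_sysrq(key, path)
        time.sleep(settle_s)
```
</pre>
<div>Since this goes straight to the kernel, it should not depend on systemd still answering requests, which is why it looks attractive as a fallback when PID 1 is frozen.</div>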
</div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sat, 3 Feb 2024 at 16:14, David Timber (&lt;<a href="mailto:dxdt@dev.snart.me" target="_blank">dxdt@dev.snart.me</a>&gt;) wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Systemd crashed on me the other day. I was writing up some Systemd units <br>
and testing them by running daemon-reload each time. <br>
Not the best way to go about it, I know. My bad for abusing Systemd to <br>
the point of crashing. Perhaps it was just a bit flip that caused this.<br>
<br>
systemd[2368]: Assertion 'path_is_absolute(p)' failed at<br>
src/basic/chase.c:628, function chase(). Aborting.<br>
systemd[1]: Assertion 'path_is_absolute(p)' failed at<br>
src/basic/chase.c:628, function chase(). Aborting.<br>
systemd[1]: Caught &lt;ABRT&gt; from our own process.<br>
systemd-coredump[32497]: Due to PID 1 having crashed coredump<br>
collection will now be turned off.<br>
systemd-coredump[32497]: [🡕] Process 32496 (systemd) of user 0<br>
dumped core.<br>
systemd[1]: Caught &lt;ABRT&gt;, dumped core as pid 32496.<br>
systemd[1]: Freezing execution.<br>
<br>
...<br>
<br>
systemd-journald[871]: Failed to send stream file descriptor to<br>
service manager: Transport endpoint is not connected<br>
<br>
I didn't even bother trying to produce a stack trace. I can get on that if <br>
anyone wants it. My machine started doing some weird things like Firefox <br>
not being able to do Ajax properly whilst being able to go to a new <br>
page, Chromium not being able to create a new tab whilst all the text <br>
editors worked just fine, all the systemctl commands timing out. So <br>
basically, I was using Linux without fork(). Anyway.<br>
Well, I think any software can crash for any reason whatsoever. The <br>
problem with Systemd I realised from this incident is that I had no way <br>
of knowing that Systemd had crashed until I opened up the journal and <br>
kernel logs and saw that Systemd had crashed some time ago. In this <br>
particular incident, Systemd caught the signal and decided to just <br>
freeze. No idea why you'd want that because if it had just crashed, the <br>
kernel would have just panicked and I would have realised something went <br>
wrong.<br>
<br>
1: So I decided that I need some sort of "watchdog" that warns me when <br>
something like this happens. It would poll the status of the Systemd <br>
process over dbus, and could be a GUI app running under a seat, a <br>
daemon that writes a warning message using `wall`, or one that sends mail <br>
using a primed MUA process. I wonder if someone has already had the same <br>
idea and gone on to make one.<br>
<br>
2: How do I get Systemd to freeze to test such a program? I mean, if I <br>
kill Systemd, the kernel would crash, so I have to somehow tell Systemd <br>
to freeze?<br>
<br>
</blockquote></div>
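<div><br></div><div>To make the watchdog idea from the quoted message a bit more concrete, here is a rough Python sketch. Instead of talking to D-Bus directly, it shells out to `systemctl is-system-running` and treats a timeout as "PID 1 is not answering". That rests on an assumption: that a frozen manager makes the call hang, as the systemctl timeouts described above suggest. The names manager_responds, FreezeDetector and force_reboot, the 10 second timeout and the threshold of 3 are all illustrative choices of mine.</div>
<pre>
```python
import subprocess


def manager_responds(timeout_s=10.0, run=subprocess.run):
    """Return True if PID 1 answers a status query within timeout_s.

    Assumption: a frozen PID 1 makes this call hang rather than fail
    fast. The exit status is ignored on purpose, because even a
    "degraded" answer shows that the manager is still responding.
    """
    try:
        run(
            ["systemctl", "is-system-running"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
            check=False,
        )
    except subprocess.TimeoutExpired:
        return False
    return True


class FreezeDetector:
    """Report a freeze only after several consecutive missed probes."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.misses = 0

    def update(self, responded):
        """Record one probe result; return True once the threshold is hit."""
        self.misses = 0 if responded else self.misses + 1
        return self.misses >= self.threshold


def force_reboot():
    """Run the reboot command proposed earlier in this thread."""
    subprocess.run(["systemctl", "reboot", "--force", "--force"], check=False)
```
</pre>
<div>The whole loop would then be: call `detector.update(manager_responds())` every 30 seconds or so, and run `force_reboot()` (or send a `wall` message or a mail instead) once it returns True. Requiring several consecutive misses is meant to avoid rebooting a machine that is merely under heavy load.</div>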