[systemd-devel] System stability when journald locks up

Marti Raudsepp marti at juffo.org
Mon May 28 10:33:14 PDT 2012


Hi list,

Long story short, I believe there are two problems with journald:
1) journald gets stuck in an infinite loop, trying to send the message
"Dropping message, as we can't find a place to store the data"
somewhere. This occurs in both v44 and v183 (Arch Linux).

2) A journald problem can effectively lock up the whole system. I
agree that reliable logging is a worthwhile goal, but it shouldn't
compromise the reliability of the whole system. Are there any plans to
address this failure mode?
I'm sure there are other ways journald can get stuck -- attaching a
debugger, or trying to write to a crashed hard drive or network file
system, for instance.
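
One client-side way to limit the failure mode in #2 is to make log
submission best-effort, so that a stuck or missing log daemon costs a
lost message rather than a hung process. The following is only a
minimal Python sketch of that idea, not how sudo, syslog(3) or journald
behave today; the helper name try_log is hypothetical, and it assumes
the logger listens on a Unix datagram socket.

```python
import socket


def try_log(message: str, path: str) -> bool:
    """Best-effort send to a Unix datagram socket.

    Returns True if the message was handed to the kernel, False if it
    had to be dropped. Never blocks and never raises on socket errors.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        # Non-blocking: if the receiver's queue is full because the
        # daemon is stuck, sendto() fails immediately instead of waiting.
        sock.setblocking(False)
        sock.sendto(message.encode("utf-8"), path)
        return True
    except OSError:
        # Covers a missing socket (ENOENT), no listener (ECONNREFUSED)
        # and a full queue (EAGAIN). Drop the message and carry on.
        return False
    finally:
        sock.close()
```

For example, try_log("<14>test: hello", "/dev/log") would return False
instead of hanging if nothing is draining the socket. The trade-off is
explicit: messages can be lost, but the caller keeps running.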

I'll see if I can figure out a temporary solution for #1 to get my
computer back :)

----
Rant version:

Last night, I noticed my desktop computer (still using systemd v44)
spinning up its fans for no apparent reason. A quick inspection with
htop revealed that systemd-journald was using 100% CPU. Soon enough
the system became unusable entirely; I couldn't launch any more
terminals and current ones were stuck at "sudo", nor could I ssh in. I
presume everything was blocked behind logging.

Today I upgraded to systemd v183, rebooted, and after logging in the
same happened again. I managed to capture an strace snippet:

stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2175, ...}) = 0

sendmsg(5, {msg_name(29)={sa_family=AF_FILE,
sun_path="/run/systemd/journal/syslog"}, msg_iov(5)=[{"<44>", 4},
{"May 28 19:39:32 ", 16}, {"systemd-journald", 16}, {": ", 2},
{"Dropping message, as we can't find a place to store the data.",
61}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = -1 ENOENT (No
such file or directory)
uname({sys="Linux", node="newn", ...})  = 0

writev(6, [{"<44>", 4}, {"systemd-journald", 16}, {"[1352]: ", 8},
{"Dropping message, as we can't find a place to store the data.", 61},
{"\n", 1}], 5) = 90
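
The trace is consistent with a feedback loop in which the notice about
a dropped message is itself a message that cannot be stored. That is a
guess from the outside, not a reading of journald's source. The sketch
below is a toy Python model of such a loop with a re-entrancy guard;
the Logger class and all of its members are hypothetical, and storage
is assumed to be permanently unavailable.

```python
DROP_NOTICE = "Dropping message, as we can't find a place to store the data."


class Logger:
    """Toy model of a logger whose storage is unavailable."""

    def __init__(self):
        self.fallback = []       # stands in for the fallback writev() in the trace
        self._reporting = False  # True while a drop notice is being logged

    def _store(self, message: str) -> bool:
        # Assumption: there is never a place to store the data.
        return False

    def log(self, message: str) -> None:
        if self._store(message):
            return
        if self._reporting:
            # The drop notice itself could not be stored. Stop here
            # instead of emitting a notice about the notice.
            return
        self._reporting = True
        try:
            self.fallback.append(DROP_NOTICE)
            # Without the guard above, this call would recurse without bound.
            self.log(DROP_NOTICE)
        finally:
            self._reporting = False
```

With the guard in place, each failed message produces exactly one
notice on the fallback channel, and the logger returns to its caller.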


Removing /var/log/journal/* didn't help -- journald created new files
and got stuck again. Killing journald didn't help either -- of course
systemd just started it again.

In desperation, I typed "systemctl stop systemd-journald.socket" and
systemd helpfully stopped my whole system. :)

Another reboot and now gdm and text consoles freeze when attempting to
log in. Dunno how I managed to log in before.

Regards,
Marti

