[systemd-devel] systemd-journald may crash during memory pressure
Kai Krakow
hurikhan77 at gmail.com
Sat Feb 10 13:23:34 UTC 2018
On Sat, 10 Feb 2018 02:16:44 +0200, Uoti Urpala wrote:
> On Fri, 2018-02-09 at 12:41 +0100, Lennart Poettering wrote:
>> This last log lines indicates journald wasn't scheduled for a long
>> time which caused the watchdog to hit and journald was
>> aborted. Consider increasing the watchdog timeout if your system is
>> indeed that loaded and that's is supposed to be an OK thing...
>
> BTW I've seen the same behavior on a system with a single active
> process that uses enough memory to trigger significant swap use. I
> wonder if there has been a regression in the kernel causing misbehavior
> when swapping? The problems aren't specific to journald - desktop
> environment can totally freeze too etc.
This problem seems to have existed since kernel 4.9, which was a real PITA
in this regard. It has been getting progressively better since kernel 4.10.
Since then, the kernel seems to be trying to avoid swapping at any cost, at
least at the cost of much higher latency and of pushing all cache out of
RAM.
The result is processes stuck for easily 30 seconds or more during
memory pressure. Sometimes I see the kernel complaining loudly in dmesg
about high wait times for allocating RAM, especially from the btrfs
module. Thus, the biggest problem may be that kernel threads themselves get
stuck in memory allocations and fall victim to high latency.
Currently I'm running my user session in a slice limited to 80% of RAM,
which seems to help: it keeps the kernel from discarding all cache. I also
put some potentially heavy memory users (in terms of cache and/or resident
memory) into slices with carefully selected memory limits (backup and
maintenance services). Slices limited in this way start swapping before
cache is discarded, and everything works better again. Part of this problem
may be that I have one process running which mmaps and locks 1G of memory
(bees, a btrfs deduplicator).
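For illustration, here is a minimal sketch of what such a slice limit could look like, written as a small Python helper that renders the drop-in text. The exact directive used above is not stated, so this assumes MemoryMax= (the unified cgroup hierarchy setting, which accepts a percentage of physical RAM); on the legacy hierarchy the corresponding directive would be MemoryLimit=. The function name and the drop-in path in the comment are hypothetical.

```python
def render_slice_dropin(memory_max_percent: int) -> str:
    """Return the text of a [Slice] drop-in capping memory at a share of RAM."""
    if not 1 <= memory_max_percent <= 100:
        raise ValueError("percentage must be between 1 and 100")
    # Assumption: MemoryMax= (cgroup v2). The original mail does not say
    # which directive was used; on cgroup v1 it would be MemoryLimit=.
    return f"[Slice]\nMemoryMax={memory_max_percent}%\n"


if __name__ == "__main__":
    # Hypothetical target file:
    #   /etc/systemd/system/user-1000.slice.d/50-memory.conf
    print(render_slice_dropin(80), end="")
```

After writing such a drop-in, `systemctl daemon-reload` is needed for systemd to pick it up.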
This system has 16G of RAM, which is usually plenty, but I use tmpfs to
build packages in Gentoo, and while that worked wonderfully before 4.9, I
have to be really careful now. The kernel happily throws away cache
instead of swapping early. Setting vm.swappiness differently seems to
have no perceptible effect.
Software that uses mmap is the first latency victim of this new behavior.
As such, systemd-journald also seems to be hit hard by this.
After the system has recovered from high memory pressure (which can take
10-15 minutes, with a loadavg of 400+), it ends up with some gigabytes of
inactive memory in swap, which it only swaps back in during shutdown
(which then also takes some minutes).
The problem since 4.9 seems to be that the kernel tends to do swap storms
instead of constantly swapping out memory at low rates during usage. The
swap storms totally thrash the system.
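One way to tell a swap storm from slow, steady swapping is to sample the pswpin/pswpout counters in /proc/vmstat and look at the rate. The following is only a rough sketch under that assumption; the function names are made up for illustration, and the counters are in pages, not bytes.

```python
import time


def parse_swap_counters(vmstat_text: str) -> dict:
    """Extract the pswpin/pswpout page counters from /proc/vmstat text."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    return counters


def swap_rates(before: str, after: str, interval_s: float) -> dict:
    """Return pages swapped in/out per second between two snapshots."""
    b = parse_swap_counters(before)
    a = parse_swap_counters(after)
    return {name: (a[name] - b[name]) / interval_s
            for name in ("pswpin", "pswpout")}


def sample_live(interval_s: float = 5.0) -> dict:
    """Take two snapshots of /proc/vmstat on a Linux system (not called here)."""
    with open("/proc/vmstat") as f:
        before = f.read()
    time.sleep(interval_s)
    with open("/proc/vmstat") as f:
        after = f.read()
    return swap_rates(before, after, interval_s)
```

Calling sample_live() in a loop and logging the result should show near-zero rates most of the time and sharp pswpout bursts during a storm, if the behavior is as described above.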
Before 4.9, the kernel had no such latency spikes under memory pressure.
Swap would usually grow slowly over time, and the system felt sluggish
now and then but was still usable wrt latency. I usually ended up with
5-8G of swap usage, and that was no problem. Now, swap only grows
significantly during swap storms, which leave the system unusable for many
minutes, with latencies of 10+ seconds around twice per minute.
I have had no swap storm yet since the last boot, and swap usage is around
16M now. Before kernel 4.9, this would already be much higher.
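For anyone wanting to compare numbers, swap usage like the figure above can be derived from /proc/meminfo as SwapTotal minus SwapFree. A small sketch, with hypothetical function names, assuming the usual "Key:   value kB" layout of that file:

```python
def swap_used_kib(meminfo_text: str) -> int:
    """Return used swap in KiB, computed as SwapTotal - SwapFree."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            # The first token after the colon is the numeric value.
            fields[key.strip()] = int(parts[0])
    return fields["SwapTotal"] - fields["SwapFree"]


def read_live() -> int:
    """Read the current value on a Linux system (not called here)."""
    with open("/proc/meminfo") as f:
        return swap_used_kib(f.read())
```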
--
Regards,
Kai
Replies to list-only preferred.