[systemd-devel] systemd-journald may crash during memory pressure

Kai Krakow hurikhan77 at gmail.com
Sat Feb 10 14:05:16 UTC 2018


On Sat, 10 Feb 2018 14:23:34 +0100, Kai Krakow wrote:

> On Sat, 10 Feb 2018 02:16:44 +0200, Uoti Urpala wrote:
> 
>> On Fri, 2018-02-09 at 12:41 +0100, Lennart Poettering wrote:
>>> These last log lines indicate journald wasn't scheduled for a long
>>> time, which caused the watchdog to hit, and journald was aborted.
>>> Consider increasing the watchdog timeout if your system is indeed that
>>> loaded and that is supposed to be an OK thing...
>> 
>> BTW I've seen the same behavior on a system with a single active
>> process that uses enough memory to trigger significant swap use. I
>> wonder if there has been a regression in the kernel causing misbehavior
>> when swapping? The problems aren't specific to journald - the desktop
>> environment can freeze completely too, etc.
> 
> This problem seems to have been there since kernel 4.9, which was a real
> pita in this regard. It has been getting progressively better since
> kernel 4.10. Since then, the kernel seems to be trying to avoid swapping
> at any cost, at least at the cost of much higher latency and of pushing
> all cache out of RAM.
> 
> The result is processes stuck for easily 30 seconds and more during
> memory pressure. Sometimes I see the kernel loudly complaining in dmesg
> about high wait times for allocating RAM, especially from the btrfs
> module. Thus, the biggest problem may be that kernel threads themselves
> get stuck in memory allocations and fall victim to high latency.
> 
> Currently I'm running my user session in a slice limited to 80% of RAM,
> which seems to help: it avoids discarding all cache. I also put some
> potentially high memory users (regarding cache and/or resident mem) into
> slices with carefully selected memory limits (backup and maintenance
> services). Slices limited in such a way will start swapping before cache
> is discarded and everything works better again. Part of this problem may
> be that I have one process running which mmaps and locks 1G of memory
> (bees, a btrfs deduplicator).
> 
> This system has 16G of RAM which is usually plenty but I use tmpfs to
> build packages in Gentoo, and while that worked wonderfully before 4.9,
> I have to be really careful now. The kernel happily throws away cache
> instead of swapping early. Setting vm.swappiness differently seems to
> have no perceivable effect.
> 
> Software that uses mmap is the first latency victim of this new
> behavior. As such, systemd-journald also seems to be hit hard by this.
> 
> After the system has recovered from high memory pressure (which can take
> 10-15 minutes, resulting in a loadavg of 400+), it ends up with some
> gigabytes of inactive memory in swap, which it only swaps back in
> during shutdown (which then also takes some minutes).
> 
> The problem since 4.9 seems to be that the kernel tends to do swap
> storms instead of constantly swapping out memory at low rates during
> usage. The swap storms totally thrash the system.
> 
> Before 4.9, the kernel had no such latency spikes under memory pressure.
> Swap would usually grow slowly over time, and the system felt sluggish
> now and then but was still usable wrt latency. I usually ended up
> with 5-8G of swap usage, and that was no problem. Now, swap only
> grows significantly during swap storms, with an unusable system for many
> minutes and latencies of 10+ seconds around twice per minute.
> 
> I have had no swap storm yet since the last boot, and swap usage is
> around 16M now. Before kernel 4.9, this would be much higher already.

After some more research, I found that vm.watermark_scale_factor may be 
the knob I am looking for. I'm going to watch behavior now with a higher 
factor (default = 10, now 200).
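
To get a feel for what that change means: the kernel documentation
describes vm.watermark_scale_factor in units of 1/10,000 of available
memory, so the default of 10 puts the watermarks about 0.1% of memory
apart. The sketch below only does that arithmetic for a 16G machine. The
function name is made up for illustration, and the real kernel works per
zone on managed pages and applies a lower bound, so treat the numbers as a
rough estimate and not as what the kernel actually computes.

```python
# Rough estimate of the distance between the per-zone watermarks for a
# given vm.watermark_scale_factor. This is a simplification: it treats
# all RAM as one zone and ignores the kernel's lower bound.

def watermark_gap_bytes(total_bytes: int, scale_factor: int) -> int:
    """Return total_bytes * scale_factor / 10000, rounded down."""
    if scale_factor < 0:
        raise ValueError("scale_factor must be non-negative")
    return total_bytes * scale_factor // 10_000


def read_scale_factor(path: str = "/proc/sys/vm/watermark_scale_factor") -> int:
    """Read the current setting; only works on Linux with this sysctl."""
    with open(path) as f:
        return int(f.read().strip())


if __name__ == "__main__":
    ram = 16 * 2**30  # 16G, as on the system described in this mail
    for factor in (10, 200):
        gap_mib = watermark_gap_bytes(ram, factor) / 2**20
        print(f"factor {factor}: watermarks roughly {gap_mib:.1f} MiB apart")
```

So going from 10 to 200 moves the gap from roughly 16 MiB to roughly
328 MiB on this machine, which should make kswapd start reclaiming
earlier and in the background.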


-- 
Regards,
Kai

Replies to list-only preferred.


