[systemd-devel] Making /run respect Container Memory Limits

Lennart Poettering lennart at poettering.net
Mon Sep 23 14:14:58 UTC 2024


On Mo, 23.09.24 11:58, Matthew Ife (matthew at ife.onl) wrote:

> > /run/ is only mounted by systemd if it is not pre-mounted already by
> > the container manager. We generally assume the container manager does
> > that (for example systemd-nspawn does it that way), already because
> > /run/host/ is the mechanism to pass outside info/resources into the
> > container in a systemd world, hence it really needs to be premounted.
>
> I think the above is enough to know the right answer.
> Fix the container manager to behave correctly. This feels like the most elegant approach.
>
> I didn't spot this when trying to understand the best approach to change things. Apologies.
>
> Note, you're right about how we do stupid things like disabling swap. It's not my call, sadly!
> Whilst I don't think the answer here is "adding swap will fix", there are a myriad of other reasons to
> have swap, and it would at least push back the cliff edge we otherwise hit with this problem.

Adding swap *will* fix the issue for you to a large degree, btw.

By not having swap you make it impossible for tmpfs and anonymous
memory to be paged out. You basically *create* an artificial OOM
situation as soon as any load shows up, because you artificially
minimize the amount of reclaimable pages: in most cases only
file-backed pages such as mapped ELF binaries remain reclaimable this
way, so they will be constantly thrashed and everything goes to shit.
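A rough way to see this on a given host is to look at /proc/meminfo:
SwapTotal shows whether there is anywhere to page anonymous memory to,
Shmem covers shared memory and tmpfs (which includes a tmpfs-backed
/run), and AnonPages is ordinary anonymous memory. The sketch below is
only an illustration under the assumption of the usual "Key: value kB"
layout; the helper names parse_meminfo() and summarize() are made up
for this example and are not part of systemd or any tool.

```python
"""Rough check: is there swap, and how much tmpfs/anonymous memory exists?

Assumes the standard Linux /proc/meminfo layout ("Key:   value kB").
parse_meminfo() and summarize() are hypothetical helpers for illustration.
"""
from pathlib import Path


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo text into {field name: first numeric value}."""
    fields: dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts:
            # Values are in kB, except counters like HugePages_Total.
            fields[key.strip()] = int(parts[0])
    return fields


def summarize(fields: dict[str, int]) -> str:
    swap_kb = fields.get("SwapTotal", 0)
    shmem_kb = fields.get("Shmem", 0)      # shared memory + tmpfs (e.g. /run)
    anon_kb = fields.get("AnonPages", 0)   # ordinary anonymous memory
    if swap_kb == 0:
        return (f"no swap: {shmem_kb} kB Shmem/tmpfs and {anon_kb} kB "
                "anonymous memory cannot be paged out")
    return (f"swap: {swap_kb} kB available as a paging target for "
            f"{shmem_kb} kB Shmem/tmpfs and {anon_kb} kB anonymous memory")


if __name__ == "__main__":
    print(summarize(parse_meminfo(Path("/proc/meminfo").read_text())))
```

On a swapless box this reports how much memory is stuck in RAM no
matter what, which is exactly the portion the kernel cannot reclaim
under pressure.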

If you disable swap on a big server you are just misunderstanding how
memory management works on Linux, and it's pretty much your own
fault. This might sound harsh, but it is how it is.

Talk to whoever maintains these systems, and get them to talk to some
MM person and get educated about these things. There's a fundamental
misunderstanding here of how loaded systems need to be managed.

And if you then combine this with non-persistent journald, you are
artificially amplifying the problem you artificially created for
yourself, because you intentionally moved even more stuff that would
normally be backed by disk into unreclaimable memory.
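To get a feel for how much of that is the journal: with journald's
Storage=volatile setting in journald.conf (or Storage=auto when
/var/log/journal does not exist), the journal lives under
/run/log/journal on tmpfs instead of /var/log/journal on disk. The
sketch below, which is only an illustration and not a systemd tool,
sums the allocated size of files in both directories. The helper name
allocated_bytes() is hypothetical, and using st_blocks * 512 as an
estimate of tmpfs memory use is an assumption.

```python
"""Estimate how much space the volatile vs. persistent journal occupies.

allocated_bytes() is a hypothetical helper; st_blocks * 512 is used as a
rough estimate of the memory a tmpfs-backed file actually consumes.
"""
import os


def allocated_bytes(root: str) -> int:
    """Sum allocated bytes of regular files below root (0 if missing)."""
    total = 0
    # os.walk() silently skips directories it cannot read.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable
            total += st.st_blocks * 512
    return total


if __name__ == "__main__":
    for path in ("/run/log/journal", "/var/log/journal"):
        print(f"{path}: {allocated_bytes(path) / 2**20:.1f} MiB")
```

Anything reported under /run/log/journal is held in memory, and on a
host without swap it is never paged out.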

Lennart

--
Lennart Poettering, Berlin

