Per file OOM badness
alexdeucher at gmail.com
Tue May 31 22:00:51 UTC 2022
On Tue, May 31, 2022 at 6:00 AM Christian König
<ckoenig.leichtzumerken at gmail.com> wrote:
> Hello everyone,
> To summarize the issue I'm trying to address here: Processes can allocate
> resources through a file descriptor without being held responsible for it.
> Especially for the DRM graphics driver subsystem this is rather
> problematic. Modern games tend to allocate huge amounts of system memory
> through the DRM drivers to make it accessible to GPU rendering.
> But even outside of the DRM subsystem this problem exists and it is
> trivial to exploit. See the following simple example of
> using memfd_create():
>	fd = memfd_create("test", 0);
>	while (1)
>		write(fd, page, 4096);
> Compile this and you can bring down any standard desktop system.
> The background is that the OOM killer will kill every process in the
> system except the one which holds the only reference to the memory
> allocated by the memfd.
> Those problems were brought up on the mailing list multiple times now,
> but without any final conclusion on how to address them. Since
> file descriptors are considered shared, a process cannot be held directly
> accountable for allocations made through them. In addition, file
> descriptors can easily move between processes.
> So instead of trying to account the allocated memory to a specific
> process, this patch set adds a callback to struct file_operations which
> the OOM killer can use to query the specific OOM badness of this file
> reference. This badness is then divided by the file_count, so that every
> process using a shmem file, DMA-buf or DRM driver will get its equal
> share of OOM badness.
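The accounting described above can be sketched as a small userspace model. This is only an illustration under stated assumptions: the struct and function names (model_file, model_process, model_memfd_badness, model_process_badness) are hypothetical stand-ins and not the kernel's types or the patch's actual callback signature, which this mail does not show.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only: hypothetical stand-ins, not kernel structures. */
struct model_file {
	long count;	/* number of references to this file */
	long pages;	/* pages allocated through this file */
	/* optional badness callback, NULL if the file type reports none */
	long (*oom_badness)(const struct model_file *f);
};

struct model_process {
	long own_pages;			/* memory accounted directly to the process */
	const struct model_file **files;	/* open file references */
	size_t nr_files;
};

/* Example callback: report the pages allocated through the file. */
static long model_memfd_badness(const struct model_file *f)
{
	return f->pages;
}

/* Score = own pages plus an equal share of each referenced file's badness. */
static long model_process_badness(const struct model_process *p)
{
	long points = p->own_pages;

	for (size_t i = 0; i < p->nr_files; i++) {
		const struct model_file *f = p->files[i];

		if (!f->oom_badness)
			continue;	/* nothing reported for this file */
		assert(f->count > 0);
		points += f->oom_badness(f) / f->count;
	}
	return points;
}
```

In this model, a file holding 1000 pages that is referenced by two processes adds 500 pages to the score of each, on top of whatever each process has allocated directly.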
> Callbacks are then implemented for the two core users (memfd and DMA-buf)
> as well as 72 DRM based graphics drivers.
> The result is that the OOM killer can now judge much better whether a
> process is worth killing to free up memory. This results in considerably
> better system stability in OOM situations, especially while running games.
> The only other possibility I can see would be to change the accounting of
> resources whenever references to the file structure change, but this would
> mean quite some additional overhead for a rather common operation.
> Additionally I think trying to limit device driver allocations using
> cgroups is orthogonal to this effort. While cgroups is very useful, it
> works on per process limits and tries to enforce a collaborative model on
> memory management while the OOM killer enforces a competitive model.
> Please comment and/or review. We have had this problem flying around for
> years and are now at a point where we finally need to find a solution.
>  https://lists.freedesktop.org/archives/dri-devel/2015-September/089778.html
>  https://lkml.org/lkml/2018/1/18/543
>  https://lkml.org/lkml/2021/2/4/799