[RFC] Per file OOM-badness / RSS once more

Christian König ckoenig.leichtzumerken at gmail.com
Fri Jun 24 08:04:30 UTC 2022


Hello everyone,

To summarize the issue I'm trying to address here: Processes can allocate
resources through a file descriptor without being held responsible for it.

I'm not explaining all the details again. See here for a more deeply
description of the problem: https://lwn.net/ml/linux-kernel/20220531100007.174649-1-christian.koenig@amd.com/

With this iteration I'm trying to address a bunch of the comments Michal Hocko
(thanks a lot for that) gave as well as giving some new ideas.

Changes made so far:
1. Renamed the callback into file_rss(). This is at least a start to better
   describe what this is all about. I've been going back and forth over the
   naming here, if you have any better idea please speak up.

2. Cleanups, e.g. now providing a helper function in the fs layer to sum up
   all the pages allocated by the files in a file descriptor table.

3. Using the actual number of allocated pages for the shmem implementation
   instead of just the size. I also tried to ignore shmem files which are part
   of tmpfs, cause that has a separate accounting/limitation approach.

4. The OOM killer now prints the memory of the killed process including the per
   file pages which makes the whole things much more comprehensible.

5. I've added the per file pages to the different reports in RSS in procfs.
   This has the interesting effect that tools like top suddenly give a much
   more accurate overview of the memory use as well. This of course increases
   the overhead of gathering those information quite a bit and I'm not sure how
   feasible that is for up-streaming. On the other hand this once more clearly
   shows that we need to do something about this issue.

Another rather interesting observation is that multiple subsystems (shmem,
tmpfs, ttm) came up with the same workaround of limiting the memory which can
be allocated through them to 50% of the whole system memory. Unfortunately
that isn't the same 50% and it doesn't apply everywhere, so you can still
easily crash the box.

Ideas and/or comments are really welcome.

Thanks,
Christian.




More information about the amd-gfx mailing list