[RFC PATCH 8/9] drm/gem: Associate GEM objects with drm cgroup

Mon May 10 16:06:31 UTC 2021

Hi,

On Mon, 2021-05-10 at 17:36 +0200, Daniel Vetter wrote:
> 
...
> > If DRM allows user-space to exhaust all of system memory, this seems
> > to be a gap in enforcement of MEMCG limits for system memory.
> > I tried to look into it when this was discussed in the past....
> > My guess is that shmem_read_mapping_page_gfp() ->
> > shmem_getpage_gfp() is not choosing the correct MM to charge against
> > in the use case of drivers using shmemfs for backing gem buffers.
> 
> Yeah we know about this one since forever. The bug report is roughly
> as old as the gem/ttm memory managers :-/ So another problem might be
> that if we now suddenly include gpu memory in the memcg accounting, we
> start breaking a bunch of workloads that worked just fine beforehand.

It's not the first time that tightening security requires adapting the
settings of running workloads...

A workload's GPU memory usage needs to be a significant portion of the
application's real memory usage for the workload to hit limits that
were set for it earlier.  Therefore I think it is definitely something
that a user setting such limits actually cares about.

=> I think the important thing is that the reason for the failures is
clear from the OOM message.  This works much better if GPU-related
memory usage is specifically stated in that message, once that memory
starts to be accounted as system memory.

	- Eero