[rfc] drm/ttm/memcg: simplest initial memcg/ttm integration (v2)

Fri May 16 16:41:50 UTC 2025

On Fri, May 16, 2025 at 05:35:12PM +0200, Christian König wrote:
> On 5/16/25 16:53, Johannes Weiner wrote:
> > On Fri, May 16, 2025 at 08:53:07AM +0200, Christian König wrote:
> >> The cgroup who originally allocated it has no reference to the
> >> memory any more and also no way of giving it back to the core
> >> system.
> > 
> > Of course it does, the shrinker LRU.
> 
> No it doesn't. The LRU handling here is global and not per cgroup.

Well, the discussion at hand is that it should be.

> > Listen, none of this is even remotely new. This isn't the first cache
> > we're tracking, and it's not the first consumer that can outlive the
> > controlling cgroup.
> 
> Yes, I knew about all of that and I find that extremely questionable
> on existing handling as well.

This code handles billions of containers every day, but we'll be sure
to consult you on the next redesign.

> Memory pools which are only used to improve allocation performance
> are something the kernel handles transparently and are completely
> outside of any cgroup tracking whatsoever.

You're describing a cache. It doesn't matter whether it's caching CPU
work, IO work or network packets.

What matters is what it takes to recycle those pages for other
purposes - especially non-GPU purposes.

And more importantly, *what other memory in other cgroups they
displace in the meantime*.

It's really not that difficult to see an isolation issue here.

Anyway, it doesn't look like there is a lot of value in continuing
this conversation, so I'm going to check out of this subthread.