[rfc] drm/ttm/memcg: simplest initial memcg/ttm integration (v2)

Wed May 21 14:43:12 UTC 2025

On Wed, May 21, 2025 at 12:23:58PM +1000, Dave Airlie wrote:
> >
> > So in the GPU case, you'd charge on allocation, free objects into a
> > cgroup-specific pool, and shrink using a cgroup-specific LRU
> > list. Freed objects can be reused by this cgroup, but nobody else.
> > They're reclaimed through memory pressure inside the cgroup, not due
> > to the action of others. And all allocated memory is accounted for.
> >
> > I have to admit I'm pretty clueless about the gpu driver internals and
> > can't really judge how feasible this is. But from a cgroup POV, if you
> > want proper memory isolation between groups, it seems to me that's the
> > direction you'd have to take this in.
> 
> I've been digging into this a bit today, to try and work out what
> various paths forward might look like and run into a few impedance
> mismatches.
> 
> 1. TTM doesn't pool objects, it pools pages. TTM objects are varied in
> size, we don't need to keep any sort of special allocator that we
> would need if we cached sized objects (size buckets etc). list_lru
> doesn't work on pages, if we were pooling the ttm objects I can see
> being able to enable list_lru. But I'm seeing increased complexity for
> no major return, but I might dig a bit more into whether caching
> objects might help.
> 
> 2. list_lru isn't suitable for pages, AFAICS we have to stick the page
> into another object to store it in the list_lru, which would mean we'd
> be allocating yet another wrapper object. Currently TTM uses the page
> LRU pointer to add it to the shrinker_list, which is simple and low
> overhead.

Why wouldn't you be able to use the page LRU list_head with list_lru?

list_lru_add(&ttm_pool_lru, &page->lru, page_to_nid(page), page_memcg(page));