[rfc] drm/ttm/memcg: simplest initial memcg/ttm integration (v2)
Dave Airlie
airlied at gmail.com
Thu May 15 03:02:07 UTC 2025
> I have to admit I'm pretty clueless about the gpu driver internals and
> can't really judge how feasible this is. But from a cgroup POV, if you
> want proper memory isolation between groups, it seems to me that's the
> direction you'd have to take this in.
Thanks for this insight, I think you have definitely shown me where
things need to go here. I agree that making the pools and the
shrinker memcg aware is the proper answer; unfortunately I think we
are a long way from that at the moment, and I'll need to do a bit
more research. I wonder if we can agree on some compromise points in
order to move things forward from where they are now.
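For reference, the shrinker interface a memcg aware pool shrinker
would hang off already exists; the sketch below is only the shape of
it, the ttm_pool_count_pages()/ttm_pool_free_pages() helpers and the
per-memcg pool lookup they imply are made up:

  #include <linux/shrinker.h>
  #include <linux/memcontrol.h>

  /* Sketch only: the two helpers stand in for "walk only the pool
   * pages charged to sc->memcg on node sc->nid". */
  static unsigned long ttm_pool_shrink_count(struct shrinker *shrink,
                                             struct shrink_control *sc)
  {
          return ttm_pool_count_pages(sc->memcg, sc->nid);
  }

  static unsigned long ttm_pool_shrink_scan(struct shrinker *shrink,
                                            struct shrink_control *sc)
  {
          /* Reclaim from the cgroup under pressure first. */
          return ttm_pool_free_pages(sc->memcg, sc->nid, sc->nr_to_scan);
  }

  static struct shrinker *ttm_pool_shrinker;

  static int ttm_pool_shrinker_init(void)
  {
          ttm_pool_shrinker = shrinker_alloc(SHRINKER_MEMCG_AWARE |
                                             SHRINKER_NUMA_AWARE,
                                             "drm-ttm_pool");
          if (!ttm_pool_shrinker)
                  return -ENOMEM;
          ttm_pool_shrinker->count_objects = ttm_pool_shrink_count;
          ttm_pool_shrinker->scan_objects = ttm_pool_shrink_scan;
          shrinker_register(ttm_pool_shrinker);
          return 0;
  }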
Right now we have 0 accounting for any system memory allocations done
via GPU APIs, never mind the case where we have pools and evictions.
I think I sort of see 3 stages:
1. Land some sort of accounting so you can at least see the active GPU
memory usage globally, per-node and per-cgroup - this series mostly
covers that, modulo any other feedback I get (rough sketch of the
charge point below the list).
2. Work on making the ttm subsystem cgroup aware and achieve the state
where we can shrink inside the cgroup first.
3. Work on what to do with evicted memory for VRAM allocations, and
how best to integrate with dmem to possibly allow userspace to define
policy for this.
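To make stage 1 a bit more concrete, the charge point I have in mind
is roughly the below - just a sketch, mem_cgroup_charge_gpu() and the
ttm_tt memcg field are placeholders, not what the series actually
calls them:

  #include <linux/memcontrol.h>
  #include <drm/ttm/ttm_tt.h>

  static int ttm_tt_charge_memcg(struct ttm_tt *tt, unsigned int nr_pages)
  {
          struct mem_cgroup *memcg;

          /* Charge the allocating task's cgroup when we populate. */
          memcg = get_mem_cgroup_from_mm(current->mm);
          if (mem_cgroup_charge_gpu(memcg, nr_pages)) {   /* hypothetical */
                  mem_cgroup_put(memcg);
                  return -ENOMEM;
          }
          tt->memcg = memcg;   /* hypothetical field: remember the owner
                                * so the uncharge at unpopulate/free time
                                * hits the same counter. */
          return 0;
  }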
> Ah, no need to worry about it. The name is just a historical memcgism,
> from back when we first started charging "kernel" allocations, as
> opposed to the conventional, pageable userspace memory. It's no longer
> a super meaningful distinction, tbh.
>
> You can still add a separate counter for GPU memory.
Okay that's interesting, so I guess the only question vs the bespoke
counters is whether we use __GFP_ACCOUNT, and whether there is any
benefit in having page->memcg set.
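To spell out the two options as I understand them (sketch only, the
*_gpu charge helpers are placeholders for a bespoke counter API):

  /* With __GFP_ACCOUNT the page is charged to current's memcg at
   * allocation time, the memcg is recorded on the page, and the
   * uncharge happens automatically in __free_pages(). */
  page = alloc_pages(GFP_KERNEL | __GFP_ACCOUNT, order);
  ...
  __free_pages(page, order);

  /* With a bespoke counter we charge/uncharge explicitly and the
   * page itself doesn't record which memcg owns it. */
  page = alloc_pages(GFP_KERNEL, order);
  mem_cgroup_charge_gpu(memcg, 1 << order);      /* hypothetical */
  ...
  mem_cgroup_uncharge_gpu(memcg, 1 << order);    /* hypothetical */
  __free_pages(page, order);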
>
> I agree this doesn't need to be a goal in itself. It would just be a
> side effect of charging through __GFP_ACCOUNT and uncharging inside
> __free_pages(). What's more important is that the charge lifetime is
> correlated with the actual memory allocation.
How much flexibility do we have to evolve here? E.g. if we start with
where the latest series I posted gets us (maybe behind a CONFIG
option), then work on memcg aware shrinkers for the pools, then with
that in place it might make more sense to account across the complete
memory allocation path. I'm also not sure whether passing
__GFP_ACCOUNT to the dma allocators is supported, which is something
we also need to do, and having the bespoke API means we can account
those allocations regardless.
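For the dma case I mean something like the below - whether the
__GFP_ACCOUNT actually results in a charge (or is even allowed) on
all the dma-mapping backends is exactly what I'm unsure about, while
the bespoke charge next to the call works regardless (charge helper
name is a placeholder again):

  /* Option A: rely on the gfp flag being honoured by the backend. */
  vaddr = dma_alloc_coherent(dev, size, &dma_handle,
                             GFP_KERNEL | __GFP_ACCOUNT);

  /* Option B: bespoke accounting next to the allocation, independent
   * of what the dma backend does with the gfp flags. */
  vaddr = dma_alloc_coherent(dev, size, &dma_handle, GFP_KERNEL);
  if (vaddr)
          mem_cgroup_charge_gpu(memcg, size >> PAGE_SHIFT);  /* hypothetical */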
Dave.