drm/ttm/memcg/lru: enable memcg tracking for ttm and amdgpu driver
Balbir Singh
balbirs at nvidia.com
Tue Jul 1 23:26:07 UTC 2025
On 6/30/25 14:49, Dave Airlie wrote:
> Hi all,
>
> tl;dr: start using list_lru/numa/memcg in GPU driver core and amdgpu driver for now.
>
> This is a complete series of patches, some of which have been sent before and reviewed,
> but I want to get the complete picture for others, and try to figure out how best to land this.
>
> There are 3 pieces to this:
> 01->02: add support for global gpu stat counters (previously posted, patch 2 is newer)
> 03->07: port ttm pools to list_lru for numa awareness
> 08->14: add memcg stats + gpu apis, then port ttm pools to memcg aware list_lru and shrinker
> 15->17: enable amdgpu to use new functionality.
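[For context: a memcg/NUMA-aware list_lru paired with a memcg-aware shrinker, as described for patches 08->14, roughly follows the standard kernel pattern sketched below. The ttm_pool_* names and the isolate callback are illustrative stand-ins, not symbols from this series.]

```c
/* Hypothetical sketch of wiring a list_lru to a memcg/NUMA-aware
 * shrinker with the current (6.7+) shrinker API.  Names are made up
 * for illustration; only the core kernel APIs are real.
 */
static struct list_lru ttm_pool_lru;
static struct shrinker *ttm_pool_shrinker;

static unsigned long ttm_pool_count(struct shrinker *s,
				    struct shrink_control *sc)
{
	/* Counts only objects charged to the memcg/node under reclaim. */
	return list_lru_shrink_count(&ttm_pool_lru, sc);
}

static unsigned long ttm_pool_scan(struct shrinker *s,
				   struct shrink_control *sc)
{
	/* ttm_pool_isolate would be a driver-supplied list_lru_walk_cb
	 * that detaches and frees pool pages for this memcg/node. */
	return list_lru_shrink_walk(&ttm_pool_lru, sc,
				    ttm_pool_isolate, NULL);
}

static int ttm_pool_shrinker_init(void)
{
	ttm_pool_shrinker = shrinker_alloc(SHRINKER_MEMCG_AWARE |
					   SHRINKER_NUMA_AWARE, "ttm-pool");
	if (!ttm_pool_shrinker)
		return -ENOMEM;

	/* A memcg-aware list_lru must be tied to a memcg-aware
	 * shrinker so per-memcg LRU lists get reclaimed correctly. */
	list_lru_init_memcg(&ttm_pool_lru, ttm_pool_shrinker);

	ttm_pool_shrinker->count_objects = ttm_pool_count;
	ttm_pool_shrinker->scan_objects = ttm_pool_scan;
	shrinker_register(ttm_pool_shrinker);
	return 0;
}
```

(Kernel-only code, so this fragment is not standalone-buildable; it just shows the shape of the list_lru/shrinker coupling the series relies on.)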
>
> The biggest difference in the memcg code from previous versions is that I discovered what
> obj cgroups were designed for, and I'm reusing the page/objcg integration that
> already exists, to avoid reinventing that wheel right now.
>
> There are some igt-gpu-tools tests I've written at:
> https://gitlab.freedesktop.org/airlied/igt-gpu-tools/-/tree/amdgpu-cgroups?ref_type=heads
>
> One problem is that there is a lot of delayed action, which probably means the testing
> needs a bit more robustness, but the tests validate all the basic paths.
>
Hi Dave,
memcg is designed to treat memory (RSS and page cache) as a single entity, so that users
don't need to worry about the distinction between memory types and can instead reason about
their overall memory utilization, with the ability to overcommit via swap as needed.
How does dmem fit into the picture? Is the cgroup integration designed to overcommit dmem,
to limit it, or both? Is the programmer expected to know how much dmem the program will need?
Maybe this was answered, but I missed it.
Balbir Singh