[rfc] drm/ttm/memcg: simplest initial memcg/ttm integration

Simona Vetter simona.vetter at ffwll.ch
Mon Apr 28 16:00:24 UTC 2025


On Mon, Apr 28, 2025 at 12:43:30PM +0200, Christian König wrote:
> On 4/23/25 23:37, Dave Airlie wrote:
> > Hey,
> > 
> > I've been tasked to look into this, and I'm going to start from hopeless
> > naivety and see how far I can get. This is an initial attempt to hook
> > TTM system memory allocations into memcg and account for them.
> 
> Yeah, this looks mostly like what we had already discussed.
> 
> > 
> > It does:
> > 1. Adds a memcg GPU statistic.
> > 2. Adds a TTM memcg pointer for drivers to set on their user object
> > allocation paths.
> > 3. Adds a single path where we account for memory in TTM on cached,
> > non-pooled, non-DMA allocations. Cached memory allocations used to be
> > pooled, but we dropped that a while back, which makes them the best
> > target to start attacking this from.
> 
> I think that should go into the resource, like the existing dmem approach,
> instead. That allows drivers to control the accounting through the
> placement, which is far less error-prone than doing it through the context.
> 
> It would also completely avoid the pooled vs unpooled problem.
> 
> 
> > 4. It only accounts for memory that is allocated directly from a userspace
> > TTM operation (like page faults or validation). It *doesn't* account for
> > memory allocated in eviction paths due to device memory pressure.
> 
> Yeah, that's something I totally agree on.
> 
> But the major show stopper is still that accounting to memcg will break
> existing userspace. E.g. display servers can be hit with a denial of
> service that way.
> 
> The feature would need to be behind a module option, or skip accounting
> allocations for DRM masters, or something like that.

The trouble is that support is very uneven, and it will get even more
uneven going forward. Especially once we also add in SoC drivers, which
have all kinds of fun with system memory, CMA, carveout and userptr all
being accounted for differently.

Which means I think we need two pieces here:

1. opt-in enabling, or things break

2. some way to figure out whether what userspace expects in terms of
enforcement matches what the kernel actually does

Without the second we'll never manage to get this beyond the initial demo
stage, I fear, and we'll have a really hard time rolling out the various
pieces to the various drivers.

But unfortunately I have no idea what this should look like. The best I
can come up with is a set of flags describing what kind of enforcement the
kernel does, where every time we add something new we add a new flag. And
if the flags that userspace or the module option opt-in sets don't match
what the kernel supports, you get a fallback to no enforcement.
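
Very rough sketch of what I have in mind, with entirely made-up names
(none of this exists in the kernel today), one capability bit per
enforcement feature:

#include <linux/bits.h>

/* One bit per enforcement feature the kernel implements. */
#define DRM_CG_ENFORCE_SYSTEM	BIT(0)	/* cached, non-pooled ttm pages */
#define DRM_CG_ENFORCE_POOLED	BIT(1)	/* pooled wc/uncached pages */
#define DRM_CG_ENFORCE_EVICTION	BIT(2)	/* charging on eviction paths */

static unsigned long drm_cg_supported;	/* grows as features land */
static unsigned long drm_cg_requested;	/* the opt-in */

static bool drm_cg_enforcement_active(void)
{
	/*
	 * If the opt-in asks for anything the kernel can't honour, fall
	 * back to no enforcement at all instead of silently enforcing
	 * only half of what userspace expects.
	 */
	return drm_cg_requested &&
	       !(drm_cg_requested & ~drm_cg_supported);
}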

But a module option flag approach doesn't cover per-driver or per-device
differences at all. I think for that we need the kernel to provide enough
information to userspace in sysfs, which userspace then needs to use to
set/update cgroup limits to fit whatever the use-case is. Or maybe a
per-device opt-in flag set.
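
Equally hypothetical, but e.g. a per-device sysfs attribute exposing the
supported bits (enforcement_flags is an invented name), which userspace
reads at startup to configure its cgroup limits:

#include <linux/device.h>
#include <linux/sysfs.h>

static ssize_t enforcement_flags_show(struct device *dev,
				      struct device_attribute *attr,
				      char *buf)
{
	return sysfs_emit(buf, "%#lx\n", drm_cg_supported);
}
static DEVICE_ATTR_RO(enforcement_flags);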

Also I think the only fallback we can realistically provide is "no
enforcement", and that would technically be a regression every time we add
a new enforcement feature and hence another opt-in flag. And as the
eviction example below shows, I think there are plenty of really tricky
areas where we will just never get to the end state in one step, because
it's too much work and too hard to get right on the first attempt.

I think once we have a decent opt-in/forward-compatible strategy for
cgroup gpu features, adding not-entirely-complete solutions to get this
moving is the right thing to do.

> > This seems to work for me here on my hacked-up test systems at least; I
> > can see the GPU stats moving and they look sane.
> > 
> > Future work:
> > Account for pooled non-cached allocations.
> > Account for pooled DMA allocations (no idea what that looks like).
> > Figure out if accounting for eviction is possible, and what it might
> > look like.
> 
> T.J. suggested accounting for, but not limiting, the evictions, and I
> think that should work.

I think this will need a ladder of implementations, where we slowly get to
a fully featured place. Maybe something like:

1. Don't account for evicted buffers. Pretty obvious gap if you're on a
dgpu, but entirely fine with an igpu without stolen memory.

2. Account, but don't enforce any limits on evictions (first sketch after
this list). This could already get funny if system memory allocations then
start failing for seemingly random reasons due to memory pressure from
other processes.

3. Probably at this point we need a memcg-aware shrinker in ttm drivers
that want to go further (second sketch after this list).

4. Start enforcing limits even on eviction.
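
For step 2, a minimal sketch of what "account but don't limit" could look
like, assuming the MEMCG_GPU stat item this RFC adds (not upstream);
ttm_memcg_account_evict is an invented helper:

#include <linux/memcontrol.h>

static void ttm_memcg_account_evict(struct mem_cgroup *memcg, int nr_pages)
{
	if (!memcg)
		return;
	/*
	 * Statistics only: this shows up in memory.stat but is never
	 * charged against memory.max, so the eviction itself cannot
	 * fail on a cgroup limit.
	 */
	mod_memcg_state(memcg, MEMCG_GPU, nr_pages);
}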
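And for step 3, wiring up a memcg-aware shrinker with the current shrinker
API is the easy part; ttm_memcg_count/scan are placeholders for the actual
per-driver logic:

#include <linux/shrinker.h>

static unsigned long ttm_memcg_count(struct shrinker *shrink,
				     struct shrink_control *sc)
{
	/* sc->memcg tells us which cgroup is under pressure. */
	return 0;	/* placeholder: count reclaimable pages in sc->memcg */
}

static unsigned long ttm_memcg_scan(struct shrinker *shrink,
				    struct shrink_control *sc)
{
	return SHRINK_STOP;	/* placeholder: actually free pages */
}

static int ttm_memcg_shrinker_init(void)
{
	struct shrinker *s = shrinker_alloc(SHRINKER_MEMCG_AWARE, "ttm-memcg");

	if (!s)
		return -ENOMEM;

	s->count_objects = ttm_memcg_count;
	s->scan_objects = ttm_memcg_scan;
	shrinker_register(s);
	return 0;
}

The hard part is of course tracking which buffers belong to which memcg,
so that ->scan_objects can evict only the offending cgroup's pages.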

I probably missed a few steps, like enforcing dmem limits. And memory pin
limits also tie into all of this in interesting ways (both for system and
device memory).

Cheers, Sima
-- 
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

