[rfc] drm/ttm/memcg: simplest initial memcg/ttm integration

Mon Apr 28 19:31:46 UTC 2025

On Mon, 28 Apr 2025 at 20:43, Christian König <christian.koenig at amd.com> wrote:
>
> On 4/23/25 23:37, Dave Airlie wrote:
> > Hey,
> >
> > I've been tasked to look into this, and I'm going to start from hopeless
> > naivety and see how far I can get. This is an initial attempt to hook
> > TTM system memory allocations into memcg and account for them.
>
> Yeah, this looks mostly like what we had already discussed.
>
> >
> > It does:
> > 1. Adds memcg GPU statistic,
> > 2. Adds TTM memcg pointer for drivers to set on their user object
> > allocation paths
> > 3. Adds a singular path where we account for memory in TTM on cached
> > non-pooled non-dma allocations. Cached memory allocations used to be
> > pooled but we dropped that a while back which makes them the best target
> > to start attacking this from.
>
> I think that should go into the resource like the existing dmem approach instead. That allows drivers to control the accounting through the placement which is far less error prone than the context.

I'll reconsider this, but I'm not sure it'll work at that level. We
have to handle the pool: when a page gets put back into the pool it
has to be removed from the cgroup kmem accounting, and when a page is
taken from the pool it has to be added to the cgroup kmem accounting.
Otherwise we just use __GFP_ACCOUNT on allocations. I added cached
pool support yesterday, which just leaves the dma paths, which
probably aren't too insane, but I'll re-evaluate this and see if a
higher level makes sense.
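
To make the pool problem concrete, here is a minimal userspace sketch
of that charge/uncharge flow. It is a model only, not TTM or memcg
code: struct model_memcg, struct model_pool and all model_* helpers
are hypothetical names, and the "pool" just counts idle pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a memcg: a page counter with a hard limit. */
struct model_memcg {
	size_t charged;
	size_t limit;
};

/* Hypothetical stand-in for a page pool: only counts idle pages. */
struct model_pool {
	size_t idle;
};

/* Charge one page to the cgroup; fails once the limit is reached. */
static bool model_charge(struct model_memcg *cg)
{
	if (cg->charged >= cg->limit)
		return false;
	cg->charged++;
	return true;
}

/* Uncharge one page from the cgroup. */
static void model_uncharge(struct model_memcg *cg)
{
	assert(cg->charged > 0);
	cg->charged--;
}

/*
 * Allocate one page on behalf of cg. A page taken from the pool is
 * currently uncharged, so it must be charged explicitly here. For a
 * fresh page, the kernel would do the charge via __GFP_ACCOUNT; the
 * model folds both cases into the same model_charge() call.
 */
static bool model_alloc_page(struct model_pool *pool, struct model_memcg *cg)
{
	if (!model_charge(cg))
		return false;
	if (pool->idle > 0)
		pool->idle--;	/* reuse a pooled page */
	return true;
}

/* Return a page to the pool: it stops counting against the cgroup. */
static void model_free_page(struct model_pool *pool, struct model_memcg *cg)
{
	model_uncharge(cg);
	pool->idle++;
}
```

The point of the model is that pages sitting idle in the pool are
charged to nobody, so the charge has to follow the page in and out of
the pool rather than being set once at allocation time.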

> > 4. It only accounts for memory that is allocated directly from a userspace
> > TTM operation (like page faults or validation). It *doesn't* account for
> > memory allocated in eviction paths due to device memory pressure.
>
> Yeah, that's something I totally agree on.
>
> But the major show stopper is still that accounting to memcg will break existing userspace. E.g. display servers can get attacked with a denial of service with that.

With modern userspace, I'm not sure this is a major problem out of the
box. We usually run the display server and the user processes in the
same cgroup, so they share limits. Most modern distros don't run X.org
servers as root in a separate cgroup; even when running X, it is
usually in the same cgroup as its users. Android might have different
opinions of course, but I'd probably suggest we Kconfig this stuff and
let distros turn it on once we agree on a baseline.

> >
> > This seems to work for me here on my hacked up test systems at least, I
> > can see the GPU stats moving and they look sane.
> >
> > Future work:
> > Account for pooled non-cached
> > Account for pooled dma allocations (no idea how that looks)
> > Figure out if accounting for eviction is possible, and what it might look
> > like.
>
> T.J. suggested to account but don't limit the evictions and I think that should work.
>

I was going to introduce a GPU eviction stat counter as a start. I
also got the idea, which might be a bit hard to pull off, that if a
process needs to evict from VRAM but the original process has no
space in its cgroup, we just fail the VRAM allocation for the current
process. That didn't sound insane, but I haven't considered how
implementing that in TTM might look.
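
As a rough illustration of that idea, here is a userspace model, not
TTM code. It assumes "the original process" means the owner of the
buffer being evicted, and that an evicted buffer is charged to its
owner's cgroup as system memory. All ev_* names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cgroup model: system pages charged, limit, eviction stat. */
struct ev_memcg {
	size_t charged;
	size_t limit;
	size_t evictions;
};

/* Hypothetical buffer object: owner cgroup, size, current placement. */
struct ev_bo {
	struct ev_memcg *owner;
	size_t pages;
	bool in_vram;
};

/* Hypothetical VRAM manager: only tracks free pages. */
struct ev_vram {
	size_t free_pages;
};

/*
 * Move victim from VRAM to system memory, charging the victim's owner.
 * Fails if the owner's cgroup has no room for the evicted pages.
 */
static bool ev_evict(struct ev_vram *vram, struct ev_bo *victim)
{
	struct ev_memcg *cg = victim->owner;

	assert(cg != NULL);
	if (!victim->in_vram)
		return false;
	if (cg->charged + victim->pages > cg->limit)
		return false;

	cg->charged += victim->pages;
	cg->evictions++;	/* the eviction stat counter */
	victim->in_vram = false;
	vram->free_pages += victim->pages;
	return true;
}

/*
 * Place bo in VRAM, evicting victim if needed. If the eviction cannot
 * be charged to the victim's owner, the current allocation fails.
 */
static int ev_vram_alloc(struct ev_vram *vram, struct ev_bo *bo,
			 struct ev_bo *victim)
{
	if (vram->free_pages < bo->pages) {
		if (!victim || !ev_evict(vram, victim))
			return -ENOMEM;
		if (vram->free_pages < bo->pages)
			return -ENOMEM;
	}
	vram->free_pages -= bo->pages;
	bo->in_vram = true;
	return 0;
}
```

In this model the allocating process pays for the victim owner's full
cgroup by getting -ENOMEM, and the victim stays in VRAM untouched.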

Dave.