[PATCH 12/17] ttm: add objcg pointer to bo and tt

Wed Jul 2 07:57:35 UTC 2025

> >
> > It makes it easier now, but when we have to solve swapping, step one
> > will be moving all this code around to what I have now, and starting
> > from there.
> >
> > This just raises the bar to solving the next problem.
> >
> > We need to find incremental approaches to getting all the pieces of
> > the puzzle solved, or else we will still be here in 10 years.
> >
> > The steps I've formulated (none of them are perfect, but they all seem
> > better than status quo)
> >
> > 1. add global counters for pages - now we can at least see things in
> > vmstat and per-node
> > 2. add numa to the pool lru - we can remove our own numa code and
> > align with core kernel - probably doesn't help anything
>
> So far no objections from my side to that.
>
> > 3. add memcg awareness to the pool and pool shrinker.
> >     if you are on an APU with no swap configured - you have a lot better time.
> >     if you are on a dGPU or APU with swap - you have a moderately
> > better time, but I can't see you having a worse time.
>
> Well that's what I'm strongly disagreeing on.
>
> Adding memcg to the pool has no value at all and complicates things massively when moving forward.
>
> What exactly should be the benefit of that?

I'm already showing the benefit of moving the pool to memcg, and we've
even talked about it multiple times on the list. It's not an OMG
change-the-world benefit, but it definitely provides better alignment
between the pool and memcg allocations.

We expose a userspace API to allocate write-combined memory, and we do
this for all currently supported CPUs/GPUs. We might decide in the
future that we don't want to continue doing this, but we do it now. My
Fedora 42 desktop uses it, even if you say there is no need.

If I allocate 100% of my memcg budget to WC memory, free it, then
allocate 100% of my budget to non-WC memory, we break container
containment, as we can force other cgroups to run out of memory budget
and have to call the global shrinker. With this in place, the
container that allocates the WC memory also pays the price to switch
it back. Again, this is just correctness. It's not going to fix any
major workloads, but I also don't think it should cause any
regressions, since it won't be worse than the current worst-case
expectation for most workloads.
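
The allocate/free/allocate sequence above can be sketched as a toy
accounting model. This is only an illustration under stated assumptions,
not TTM or memcg code: run_scenario, in_use, pooled and owner_reclaimed
are hypothetical names, and the model assumes that freed WC pages stay
cached in the pool and that a memcg-aware pool keeps them charged to the
container that allocated them.

```python
def run_scenario(memcg_aware: bool, budget: int = 100) -> dict:
    """Toy page accounting for one container; not real TTM or memcg code."""
    in_use = 0           # pages the container is actively using
    pooled = 0           # freed WC pages still cached in the pool
    owner_reclaimed = 0  # pooled pages the container had to give back itself

    def charged() -> int:
        # Assumption: a memcg-aware pool keeps cached pages charged to their owner.
        return in_use + (pooled if memcg_aware else 0)

    # 1. Allocate the whole budget as WC memory.
    in_use += budget

    # 2. Free it: the pages move into the pool instead of going back to the system.
    in_use -= budget
    pooled += budget

    # 3. Allocate the whole budget again, this time as non-WC memory.
    request = budget
    over = max(0, charged() + request - budget)
    if over:
        # Over the limit: reclaim from the container's own pooled pages first.
        take = min(over, pooled)
        pooled -= take
        owner_reclaimed += take
    in_use += request

    return {
        "charged": charged(),
        "footprint": in_use + pooled,  # pages actually held on the container's behalf
        "owner_reclaimed": owner_reclaimed,
    }


if __name__ == "__main__":
    for aware in (False, True):
        print(aware, run_scenario(aware))
```

In this model the non-aware pool leaves the container holding twice its
budget in pages while being charged for only half of them, and the
aware pool makes the container reclaim its own pooled pages before the
second allocation succeeds.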

I'm not just adding memcg awareness to the pool, though; that is just
completeness. I'm adding memcg awareness to all GPU system memory
allocations and making sure swapout works (which it does). Swapin
probably needs more work.

The future work is integrating the TTM swap mechanisms with memcg to
get it right.

> >
> > Accounting at the resource level makes stuff better, but I don't
> > believe after implementing it that it is consistent with solving the
> > overall problem.
>
> Exactly that's my point. See accounting is no problem at all, that can be done on any possible level.
>
> What is tricky is shrinking, e.g. either core MM or memcg asking to reduce the usage of memory and moving things into swap.
>
> And that can only be done either on the resource level or the tt object, but not the pool level.

I understand we have to add more code at the tt level, and that's fine.
I just don't see why you think starting at the bottom level is wrong.
It clearly has a use, and it's just cleaning up and preparing the
levels so we can move up and solve the next problem.

> The whole TTM pool is to aid a 28 year old HW design which has no practical relevance on modern systems and we should really not touch that in any way possible.

Modern systems are still using it. I'm still seeing WC allocations, and
they still seem to have some cost associated with them on x86-64; they
certainly aren't free. I don't care if they aren't practical, but if
they are a way to route around container containment, they need to be
fixed.

Dave.