[rfc] drm/ttm/memcg: simplest initial memcg/ttm integration (v2)
Dave Airlie
airlied at gmail.com
Wed May 7 22:03:49 UTC 2025
On Thu, 8 May 2025 at 03:52, Johannes Weiner <hannes at cmpxchg.org> wrote:
>
> Hello Dave,
>
> On Fri, May 02, 2025 at 01:35:59PM +1000, Dave Airlie wrote:
> > Hey all,
> >
> > This is my second attempt at adding the initial simple memcg/ttm
> > integration.
> >
> > This varies from the first attempt in two major ways:
> >
> > 1. Instead of using __GFP_ACCOUNT and direct calling kmem charges
> > for pool memory, and directly hitting the GPU statistic, Waiman
> > suggested I just do what the network socket stuff did, which looks
> > simpler. So this adds two new memcg apis that wrap accounting.
> > The pages no longer get assigned the memcg, it's owned by the
> > larger BO object which makes more sense.
>
> Unfortunately, this was bad advice :(
>
> Naked-charging like this is quite awkward from the memcg side. It
> requires consumer-specific charge paths in the memcg code, adds stat
> counters that are memcg-only with no system-wide equivalent, and it's
> difficult for the memcg maintainers to keep track of the link between
> what's in the counter and the actual physical memory that is supposed
> to be tracked.
>
> The network and a few others like it are rather begrudging exceptions
> because they do not have a suitable page context or otherwise didn't
> fit the charging scheme. They're not good examples to follow if it can
> at all be avoided.
I unfortunately feel the GPU might fit into this category as well: we
aren't allocating pages in the traditional manner, so we have a number
of problems with doing the accounting at the page level.
I think the biggest barrier to doing page-level tracking is the page
pools we have to keep. When we need CPU-uncached pages, we allocate a
bunch of pages and do the work to change their CPU mappings to
uncached. That work is costly, so when we no longer need the uncached
pages in an object we return them to a pool rather than to the central
allocator, and we use a shrinker to empty the pool and undo the CPU
mapping change. We keep equivalent pools for DMA-able and 32-bit
DMA-able pages if they are used.
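
Roughly, the pool flow looks something like this (a simplified sketch
with made-up names, not the actual ttm_pool code):

/* Simplified sketch of the uncached-page pool flow; gpu_page_pool,
 * gpu_pool_alloc() and gpu_pool_free() are made-up names for
 * illustration, not the real ttm_pool interfaces.
 */
#include <linux/mm.h>
#include <linux/list.h>
#include <linux/spinlock.h>
#include <asm/set_memory.h>

struct gpu_page_pool {
    struct list_head pages;     /* pages kept with uncached CPU mappings */
    spinlock_t lock;
};

static struct page *gpu_pool_alloc(struct gpu_page_pool *pool, gfp_t gfp)
{
    struct page *p;

    spin_lock(&pool->lock);
    p = list_first_entry_or_null(&pool->pages, struct page, lru);
    if (p)
        list_del(&p->lru);
    spin_unlock(&pool->lock);
    if (p)
        return p;               /* reuse: CPU mapping is already uncached */

    p = alloc_page(gfp);
    if (p)
        set_pages_uc(p, 1);     /* the costly CPU mapping change */
    return p;
}

static void gpu_pool_free(struct gpu_page_pool *pool, struct page *p)
{
    /*
     * Keep the uncached mapping and stash the page for reuse; a
     * shrinker later calls set_pages_wb() and __free_page() to hand
     * the memory back to the central allocator.
     */
    spin_lock(&pool->lock);
    list_add(&p->lru, &pool->pages);
    spin_unlock(&pool->lock);
}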
This means that if userspace allocates a large uncached object and we
account it to the current memcg with __GFP_ACCOUNT, then when the
memory gets handed back to the pool we have to remove it from the
memcg accounting. That is where I used the memcg kmem
charge/uncharge calls: uncharge when adding pages to the pool, charge
again on reuse. However, kmem seems like the wrong place to charge,
even though it's where the initial __GFP_ACCOUNT will put those pages.
This is why the suggestion to follow the network solution worked
better for me: I can do all the work outside the pool code at a
slightly higher level (the same level where we charge/uncharge dmem),
and I don't have to worry about handling the different pool types etc.
I also don't think we need page->memcg to be set for these pages,
since every page is part of a larger object and it's the object that
gets swapped or freed, not parts of it.
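
The shape I have in mind at the BO level is roughly the following (the
function and struct names here are illustrative placeholders modelled
on the network's mem_cgroup_charge_skmem(), not what the series
necessarily adds):

/* Hypothetical wrapper API modelled on mem_cgroup_charge_skmem() /
 * mem_cgroup_uncharge_skmem(); the names below and struct my_bo are
 * placeholders for illustration only.
 */
#include <linux/memcontrol.h>

bool mem_cgroup_charge_gpu_pages(struct mem_cgroup *memcg,
                                 unsigned int nr_pages, gfp_t gfp);
void mem_cgroup_uncharge_gpu_pages(struct mem_cgroup *memcg,
                                   unsigned int nr_pages);

struct my_bo {
    struct mem_cgroup *memcg;   /* owner, taken at BO creation */
    unsigned int nr_pages;
};

/* Charge once per BO when its pages are pulled from the pool... */
static int bo_populate(struct my_bo *bo)
{
    if (!mem_cgroup_charge_gpu_pages(bo->memcg, bo->nr_pages, GFP_KERNEL))
        return -ENOMEM;
    /* ...then fill the BO from the page pool; no per-page memcg is set */
    return 0;
}

/* ...and uncharge when the pages go back to the pool. */
static void bo_unpopulate(struct my_bo *bo)
{
    mem_cgroup_uncharge_gpu_pages(bo->memcg, bo->nr_pages);
}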
>
> __GFP_ACCOUNT and an enum node_stat_item is the much preferred way. I
> have no objections to exports if you need to charge and account memory
> from a module.
Now maybe it makes sense to add a node_stat_item for this as well as
the GPU memcg accounting, so we can have it accounted in both places.
Right now mm has no statistical insight into this memory at all, other
than it being allocated pages.
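
Something along these lines is what I mean for the vmstat side
(NR_GPU_PAGES is a made-up name for the example):

/* Assuming a new node_stat_item entry (NR_GPU_PAGES here is a made-up
 * name), the node-level side of the accounting would be roughly:
 */
#include <linux/mm.h>
#include <linux/vmstat.h>

static void bo_account_system_pages(struct page *page, long nr)
{
    /*
     * System/node-wide view for mm; the memcg side would still be
     * charged separately at the BO level, since page->memcg is not
     * set for these pages.
     */
    mod_node_page_state(page_pgdat(page), NR_GPU_PAGES, nr);
}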
Dave.