[PATCH 1/2] mm: add gpu active/reclaim per-node stat counters (v2)
David Airlie
airlied at redhat.com
Wed Jun 25 19:16:04 UTC 2025
On Wed, Jun 25, 2025 at 9:55 PM Christian König
<christian.koenig at amd.com> wrote:
>
> On 24.06.25 03:12, David Airlie wrote:
> > On Mon, Jun 23, 2025 at 6:54 PM Christian König
> > <christian.koenig at amd.com> wrote:
> >>
> >> On 6/19/25 09:20, Dave Airlie wrote:
> >>> From: Dave Airlie <airlied at redhat.com>
> >>>
> >>> While discussing memcg integration with GPU memory allocations,
> >>> it was pointed out that there were no NUMA/system counters for
> >>> GPU memory allocations.
> >>>
> >>> With more integrated-memory GPU server systems turning up, and
> >>> more requirements for memory tracking, it seems we should start
> >>> closing the gap.
> >>>
> >>> Add two counters to track GPU per-node system memory allocations.
> >>>
> >>> The first tracks memory currently allocated to GPU objects, and the
> >>> second tracks memory held in GPU page pools that can be reclaimed
> >>> by the shrinker.
> >>>
> >>> Cc: Christian Koenig <christian.koenig at amd.com>
> >>> Cc: Matthew Brost <matthew.brost at intel.com>
> >>> Cc: Johannes Weiner <hannes at cmpxchg.org>
> >>> Cc: linux-mm at kvack.org
> >>> Cc: Andrew Morton <akpm at linux-foundation.org>
> >>> Signed-off-by: Dave Airlie <airlied at redhat.com>
> >>>
> >>> ---
> >>>
> >>> v2: add more info to the documentation on this memory.
> >>>
> >>> I'd like to get acks to merge this via the drm tree, if possible,
> >>>
> >>> Dave.
> >>> ---
> >>> Documentation/filesystems/proc.rst | 8 ++++++++
> >>> drivers/base/node.c | 5 +++++
> >>> fs/proc/meminfo.c | 6 ++++++
> >>> include/linux/mmzone.h | 2 ++
> >>> mm/show_mem.c | 9 +++++++--
> >>> mm/vmstat.c | 2 ++
> >>> 6 files changed, 30 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
> >>> index 5236cb52e357..7cc5a9185190 100644
> >>> --- a/Documentation/filesystems/proc.rst
> >>> +++ b/Documentation/filesystems/proc.rst
> >>> @@ -1095,6 +1095,8 @@ Example output. You may not have all of these fields.
> >>> CmaFree: 0 kB
> >>> Unaccepted: 0 kB
> >>> Balloon: 0 kB
> >>> + GPUActive: 0 kB
> >>> + GPUReclaim: 0 kB
> >>
> >> Active certainly makes sense, but I think we should rather disable the pool on newer CPUs than add reclaimable memory here.
> >
> > I'm not just concerned about newer platforms, though. Even on Fedora 42
> > on my test ryzen1+7900xt machine, with a desktop session running, I see:
> >
> > nr_gpu_active 7473
> > nr_gpu_reclaim 6656
> >
> > It's not an insignificant amount of memory (roughly 29 MB active and
> > 26 MB reclaimable, assuming 4 KiB pages).
>
> That was not what I meant; that you have quite a bit of memory allocated to the GPU is correct.
>
> But the problem is more that we use the pool for far too many things, which is actually not necessary.
>
> But granted, this is orthogonal to this patch.
At least here this is all WC allocations, probably from userspace, so
it feels like we are using the pool correctly, since we stopped pooling
cached pages.
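
To make the intent of the two counters concrete, this is roughly how I
picture them moving as pages change state (just a sketch, not code from
the patch; I'm assuming NR_GPU_ACTIVE / NR_GPU_RECLAIM as the
node_stat_item names behind the nr_gpu_active / nr_gpu_reclaim entries
above):

#include <linux/mm.h>
#include <linux/vmstat.h>

/* page comes from the system allocator and is handed to a GPU object */
static void gpu_account_alloc(struct page *page)
{
        mod_node_page_state(page_pgdat(page), NR_GPU_ACTIVE, 1);
}

/* object is destroyed, page is parked in the WC pool instead of freed */
static void gpu_account_pool_insert(struct page *page)
{
        mod_node_page_state(page_pgdat(page), NR_GPU_ACTIVE, -1);
        mod_node_page_state(page_pgdat(page), NR_GPU_RECLAIM, 1);
}

/* shrinker hands the pooled page back to the page allocator */
static void gpu_account_pool_shrink(struct page *page)
{
        mod_node_page_state(page_pgdat(page), NR_GPU_RECLAIM, -1);
}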
>
> > I also think if we get to
> > some sort of discardable GTT objects with a shrinker they should
> > probably be accounted in reclaim.
>
> The problem is that this is extremely driver specific.
>
> On amdgpu we have some temporary buffers which can be reclaimed immediately, but the really big chunk is, for example, what XE does with its shrinker.
>
> See Thomas' TTM patches from a few months ago. Whether memory is active or reclaimable does not depend on how it is allocated, but on how it is used.
>
> So the accounting needs to be at the driver level if you really want to distinguish between the two states.
How the counters are used is fine to handle at the driver level on top
of this. For discardable objects I think there are grounds for ttm_tt
growing a discardable flag once we see a couple of drivers using it,
and then maybe the counters could be moved there. It's also fine to use
these counters in drivers outside TTM, as long as they are updated
appropriately, just so we can see GPU memory allocations as part of the
big picture; a rough sketch of what I mean is below.
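
For the hypothetical ttm_tt discardable flag, the accounting flip could
look something like this (hand-wavy sketch only, the flag doesn't exist
today and I'm going from memory on the ttm_tt field names):

#include <drm/ttm/ttm_tt.h>
#include <linux/mm.h>
#include <linux/vmstat.h>

static void ttm_tt_account_discardable(struct ttm_tt *tt)
{
        pgoff_t i;

        for (i = 0; i < tt->num_pages; i++) {
                struct pglist_data *pgdat = page_pgdat(tt->pages[i]);

                /* object can now be thrown away under memory pressure,
                 * so move its pages from the active to the reclaim bucket */
                mod_node_page_state(pgdat, NR_GPU_ACTIVE, -1);
                mod_node_page_state(pgdat, NR_GPU_RECLAIM, 1);
        }
}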
Dave.