[PATCH 1/2] mm: add gpu active/reclaim per-node stat counters (v2)
David Airlie
airlied at redhat.com
Tue Jun 24 01:12:56 UTC 2025
On Mon, Jun 23, 2025 at 6:54 PM Christian König
<christian.koenig at amd.com> wrote:
>
> On 6/19/25 09:20, Dave Airlie wrote:
> > From: Dave Airlie <airlied at redhat.com>
> >
> > While discussing memcg integration with GPU memory allocations,
> > it was pointed out that there were no NUMA/system counters for
> > GPU memory allocations.
> >
> > With more integrated memory GPU server systems turning up, and
> > more requirements for memory tracking it seems we should start
> > closing the gap.
> >
> > Add two counters to track GPU per-node system memory allocations.
> >
> > The first tracks memory currently allocated to GPU objects, and the
> > second tracks memory held in GPU page pools that the shrinker can
> > reclaim.
> >
> > Cc: Christian Koenig <christian.koenig at amd.com>
> > Cc: Matthew Brost <matthew.brost at intel.com>
> > Cc: Johannes Weiner <hannes at cmpxchg.org>
> > Cc: linux-mm at kvack.org
> > Cc: Andrew Morton <akpm at linux-foundation.org>
> > Signed-off-by: Dave Airlie <airlied at redhat.com>
> >
> > ---
> >
> > v2: add more info to the documentation on this memory.
> >
> > I'd like to get acks to merge this via the drm tree, if possible,
> >
> > Dave.
> > ---
> > Documentation/filesystems/proc.rst | 8 ++++++++
> > drivers/base/node.c | 5 +++++
> > fs/proc/meminfo.c | 6 ++++++
> > include/linux/mmzone.h | 2 ++
> > mm/show_mem.c | 9 +++++++--
> > mm/vmstat.c | 2 ++
> > 6 files changed, 30 insertions(+), 2 deletions(-)
> >
> > diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
> > index 5236cb52e357..7cc5a9185190 100644
> > --- a/Documentation/filesystems/proc.rst
> > +++ b/Documentation/filesystems/proc.rst
> > @@ -1095,6 +1095,8 @@ Example output. You may not have all of these fields.
> > CmaFree: 0 kB
> > Unaccepted: 0 kB
> > Balloon: 0 kB
> > + GPUActive: 0 kB
> > + GPUReclaim: 0 kB
>
> Active certainly makes sense, but I think we should rather disable the pool on newer CPUs than add reclaimable memory here.
I'm not just concerned about newer platforms, though. Even on my test
ryzen1+7900xt machine running Fedora 42 with a desktop session, I see:
nr_gpu_active 7473
nr_gpu_reclaim 6656
It's not an insignificant amount of memory. I also think if we get to
some sort of discardable GTT objects with a shrinker they should
probably be accounted in reclaim.
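
To put those two numbers in perspective, here is a rough sketch that
converts them to sizes. It assumes the nr_gpu_* values are page counts
(as vmstat-style counters usually are) and that the base page size is
4 KiB; neither is stated above. The helper names are made up for
illustration and are not part of the patch.

```python
# Hypothetical helpers: convert the nr_gpu_* values quoted above to sizes.
# Assumption: the counters are page counts, as is usual for vmstat items.
PAGE_SIZE_KIB = 4  # assumption: 4 KiB base pages (typical on x86-64)


def parse_vmstat(text):
    """Parse vmstat-style 'name value' lines into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def pages_to_kib(pages, page_size_kib=PAGE_SIZE_KIB):
    """Convert a page count to KiB under the assumed page size."""
    return pages * page_size_kib


# The two values reported in the mail, used here as sample input.
sample = """\
nr_gpu_active 7473
nr_gpu_reclaim 6656
"""

stats = parse_vmstat(sample)
for name in ("nr_gpu_active", "nr_gpu_reclaim"):
    kib = pages_to_kib(stats[name])
    print(f"{name}: {stats[name]} pages = {kib} kB ({kib / 1024:.1f} MiB)")
```

Under those assumptions that works out to roughly 29 MiB active and
26 MiB sitting in the pools, so the reclaimable part is nearly as large
as the active part on this machine.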
Dave.