[PATCH v3 1/3] drm/gpusvm, drm/pagemap: Move migration functionality to drm_pagemap
Matthew Brost
matthew.brost at intel.com
Tue Jun 17 20:44:28 UTC 2025
On Tue, Jun 17, 2025 at 09:37:28PM +0200, Thomas Hellström wrote:
> On Tue, 2025-06-17 at 10:04 -0700, Matthew Brost wrote:
> > On Tue, Jun 17, 2025 at 04:55:26PM +0200, Thomas Hellström wrote:
> > > On Tue, 2025-06-17 at 20:17 +0530, Ghimiray, Himal Prasad wrote:
> > > >
> > > >
> > > > On 17-06-2025 18:41, Thomas Hellström wrote:
> > > > > On Tue, 2025-06-17 at 18:25 +0530, Ghimiray, Himal Prasad
> > > > > wrote:
> > > > > >
> > > > > >
> > > > > > On 13-06-2025 19:32, Thomas Hellström wrote:
> > > > > > > From: Matthew Brost <matthew.brost at intel.com>
> > > > > > >
> > > > > > > The migration functionality and track-keeping of per-pagemap VRAM
> > > > > > > mapped to the CPU mm is not per GPU_vm, but rather per pagemap.
> > > > > > > This is also reflected by the functions not needing the drm_gpusvm
> > > > > > > structures. So move to drm_pagemap.
> > > > > > >
> > > > > > > With this, drm_gpusvm shouldn't really access the page
> > > > > > > zone-device-data since its meaning is internal to drm_pagemap.
> > > > > > > Currently it's used to reject mapping ranges backed by multiple
> > > > > > > drm_pagemap allocations. For now, make the zone-device-data a void
> > > > > > > pointer.
> > > > > > >
> > > > > > > Alter the interface of drm_gpusvm_migrate_to_devmem() to ensure we
> > > > > > > don't pass a gpusvm pointer.
> > > > > > >
> > > > > > > Rename CONFIG_DRM_XE_DEVMEM_MIRROR to
> > > > > > > CONFIG_DRM_XE_PAGEMAP.
> > > > > > >
> > > > > > > Matt is listed as author of this commit since he wrote most of the
> > > > > > > code, and it makes sense to retain his git authorship.
> > > > > > > Thomas mostly moved the code around.
> > > > > >
> > > > > > >
> > > > > > > v3:
> > > > > > > - Kerneldoc fixes (CI)
> > > > > > > - Don't update documentation about how the drm_pagemap migration
> > > > > > >   should be interpreted until upcoming patches where the
> > > > > > >   functionality is implemented. (Matt Brost)
> > > > > > > Co-developed-by: Thomas Hellström <thomas.hellstrom at linux.intel.com>
> > > > > > > Signed-off-by: Thomas Hellström <thomas.hellstrom at linux.intel.com>
> > > > > > > ---
> > > > > > >  Documentation/gpu/rfc/gpusvm.rst     |  12 +-
> > > > > > >  drivers/gpu/drm/Makefile             |   6 +-
> > > > > > >  drivers/gpu/drm/drm_gpusvm.c         | 759 +----------------------
> > > > > > >  drivers/gpu/drm/drm_pagemap.c        | 788 ++++++++++++++++++++++++
> > > > > > >  drivers/gpu/drm/xe/Kconfig           |  10 +-
> > > > > > >  drivers/gpu/drm/xe/xe_bo_types.h     |   2 +-
> > > > > > >  drivers/gpu/drm/xe/xe_device_types.h |   2 +-
> > > > > > >  drivers/gpu/drm/xe/xe_svm.c          |  47 +-
> > > > > > >  include/drm/drm_gpusvm.h             |  96 ----
> > > > > > >  include/drm/drm_pagemap.h            | 101 ++++
> > > > > > >  10 files changed, 950 insertions(+), 873 deletions(-)
> > > > > > >  create mode 100644 drivers/gpu/drm/drm_pagemap.c
> > > > > > >
> > > > > > > diff --git a/Documentation/gpu/rfc/gpusvm.rst b/Documentation/gpu/rfc/gpusvm.rst
> > > > > > > index bcf66a8137a6..469db1372f16 100644
> > > > > > > --- a/Documentation/gpu/rfc/gpusvm.rst
> > > > > > > +++ b/Documentation/gpu/rfc/gpusvm.rst
> > > > > > > @@ -73,15 +73,21 @@ Overview of baseline design
> > > > > > > .. kernel-doc:: drivers/gpu/drm/drm_gpusvm.c
> > > > > > > :doc: Locking
> > > > > > >
> > > > > > > -.. kernel-doc:: drivers/gpu/drm/drm_gpusvm.c
> > > > > > > - :doc: Migration
> > > > > > > -
> > > > > > > .. kernel-doc:: drivers/gpu/drm/drm_gpusvm.c
> > > > > > > :doc: Partial Unmapping of Ranges
> > > > > > >
> > > > > > > .. kernel-doc:: drivers/gpu/drm/drm_gpusvm.c
> > > > > > > :doc: Examples
> > > > > > >
> > > > > > > +Overview of drm_pagemap design
> > > > > > > +==============================
> > > > > > > +
> > > > > > > +.. kernel-doc:: drivers/gpu/drm/drm_pagemap.c
> > > > > > > + :doc: Overview
> > > > > > > +
> > > > > > > +.. kernel-doc:: drivers/gpu/drm/drm_pagemap.c
> > > > > > > + :doc: Migration
> > > > > > > +
> > > > > > > Possible future design features
> > > > > > > ===============================
> > > > > > >
> > > > > > > diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
> > > > > > > index 5050ac32bba2..4dafbdc8f86a 100644
> > > > > > > --- a/drivers/gpu/drm/Makefile
> > > > > > > +++ b/drivers/gpu/drm/Makefile
> > > > > > > @@ -104,7 +104,11 @@ obj-$(CONFIG_DRM_PANEL_BACKLIGHT_QUIRKS) += drm_panel_backlight_quirks.o
> > > > > > > #
> > > > > > > obj-$(CONFIG_DRM_EXEC) += drm_exec.o
> > > > > > > obj-$(CONFIG_DRM_GPUVM) += drm_gpuvm.o
> > > > > > > -obj-$(CONFIG_DRM_GPUSVM) += drm_gpusvm.o
> > > > > > > +
> > > > > > > +drm_gpusvm_helper-y := \
> > > > > > > + drm_gpusvm.o\
> > > > > > > + drm_pagemap.o
> > > > > > > +obj-$(CONFIG_DRM_GPUSVM) += drm_gpusvm_helper.o
> > > > > > >
> > > > > > > obj-$(CONFIG_DRM_BUDDY) += drm_buddy.o
> > > > > > >
> > > > > > > diff --git a/drivers/gpu/drm/drm_gpusvm.c b/drivers/gpu/drm/drm_gpusvm.c
> > > > > > > index 7ff81aa0a1ca..ef81381609de 100644
> > > > > > > --- a/drivers/gpu/drm/drm_gpusvm.c
> > > > > > > +++ b/drivers/gpu/drm/drm_gpusvm.c
> > > > > > > @@ -8,10 +8,9 @@
> > > > > > >
> > > > > > > #include <linux/dma-mapping.h>
> > > > > > > #include <linux/hmm.h>
> > > > > > > +#include <linux/hugetlb_inline.h>
> > > > > > > #include <linux/memremap.h>
> > > > > > > -#include <linux/migrate.h>
> > > > > > > #include <linux/mm_types.h>
> > > > > > > -#include <linux/pagemap.h>
> > > > > > > #include <linux/slab.h>
> > > > > > >
> > > > > > > #include <drm/drm_device.h>
> > > > > > > @@ -107,21 +106,6 @@
> > > > > > > * to add annotations to GPU SVM.
> > > > > > > */
> > > > > > >
> > > > > > > -/**
> > > > > > > - * DOC: Migration
> > > > > > > - *
> > > > > > > - * The migration support is quite simple, allowing migration between RAM and
> > > > > > > - * device memory at the range granularity. For example, GPU SVM currently does
> > > > > > > - * not support mixing RAM and device memory pages within a range. This means
> > > > > > > - * that upon GPU fault, the entire range can be migrated to device memory, and
> > > > > > > - * upon CPU fault, the entire range is migrated to RAM. Mixed RAM and device
> > > > > > > - * memory storage within a range could be added in the future if required.
> > > > > > > - *
> > > > > > > - * The reasoning for only supporting range granularity is as follows: it
> > > > > > > - * simplifies the implementation, and range sizes are driver-defined and should
> > > > > > > - * be relatively small.
> > > > > > > - */
> > > > > > > -
> > > > > > > /**
> > > > > > > * DOC: Partial Unmapping of Ranges
> > > > > > > *
> > > > > > > @@ -193,10 +177,9 @@
> > > > > > >  *		if (driver_migration_policy(range)) {
> > > > > > >  *			mmap_read_lock(mm);
> > > > > > >  *			devmem = driver_alloc_devmem();
> > > > > > > -*			err = drm_gpusvm_migrate_to_devmem(gpusvm, range,
> > > > > > > -*							   devmem_allocation,
> > > > > > > -*							   &ctx);
> > > > > > > -*			mmap_read_unlock(mm);
> > > > > > > +*			err = drm_pagemap_migrate_to_devmem(devmem, gpusvm->mm, gpuva_start,
> > > > > > > +*							    gpuva_end, driver_pgmap_owner());
> > > > > >
> > > > > >
> > > > > >
> > > > > > Fix the doc: the example should also pass the timeslice as a parameter.
> > > > >
> > > > > Will fix.
> > > > >
> > > > > >
> > > > >
> > > > > 8<---------------------------------------------------------------------
> > > > > > > +/**
> > > > > > > + * drm_pagemap_migrate_to_devmem() - Migrate a struct mm_struct range to device memory
> > > > > > > + * @devmem_allocation: The device memory allocation to migrate to.
> > > > > > > + * The caller should hold a reference to the device memory allocation,
> > > > > > > + * and the reference is consumed by this function unless it returns with
> > > > > > > + * an error.
> > > > > > > + * @mm: Pointer to the struct mm_struct.
> > > > > > > + * @start: Start of the virtual address range to migrate.
> > > > > > > + * @end: End of the virtual address range to migrate.
> > > > > > > + * @timeslice_ms: The time requested for the migrated pages to
> > > > > > > + * be present in the cpu memory map before migrated back.
> >
> > As Himal suggests, and see below...
> >
> > s/cpu/gpu
> >
> > > > > >
> > > > > > Shouldn't this be "present in the gpu or cpu memory map"? We are
> > > > > > using this to ensure the pagefault can be handled effectively, by
> > > > > > ensuring the pages remain in vram here for the prescribed time too.
> > > > >
> > > > > So with this split, drm_pagemap is responsible for migrating memory
> > > > > and updating the CPU memory map only, whereas drm_gpusvm is
> > > > > responsible for setting up the GPU memory maps. So if it remains in
> > > > > the CPU memory map, then nothing will force the gpu vms to
> > > > > invalidate, unless the gpu driver decides to invalidate itself.
> > > >
> > > > That's true.
> > > >
> > > > >
> > > > > But looking at this I should probably rephrase "before migrated back"
> > > > > to "before being allowed to be migrated back".
> > > >
> > > > The confusion for me is that timeslice_ms does not represent the time
> > > > pages are required to stay in the CPU memory map, but rather the time
> > > > they must remain in the GPU memory map. We defer migrate_to_smem until
> > > > this timeslice has expired.
> >
> > Ideally, we'd want the timeslice to ensure CPU residency as well. The
> > issue arises because the timeslice is attached to the physical memory
> > (drm_gpusvm_devmem), which is not readily available on the GPU fault
> > handling side. UMD testing showed that ensuring CPU residency is not
> > required for atomics to work, so this part of the implementation was
> > left out. We can revisit and find a solution here if it turns out to
> > be required.
> >
> > For now, I'd fix the kernel doc above (s/cpu/gpu) with my suggestion
> > and perhaps put in a comment why we don't wait on the GPU fault side.
>
> >
> > Matt
>
> Hi Matt, please see my second response to Himal.
>
> It might be that there is a confusion between "pages present in the cpu
> memory map" and "memory present in system memory"?
>
> What I mean by "present in the cpu memory map" is that the pagemap
> pages (device private memory) are present in the struct mm_struct for
> at least xx ms. The struct mm is the authoritative information on which
> pages currently hold the data.
>
> So "migrated pages are present in the cpu memory map" is therefore more
> or less equivalent to "data is present in device private memory".
>
Ah, ok.
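
For context on Himal's point above: the deferral is keyed off an
expiration timestamp stashed on the devmem allocation when the pages are
migrated in, and the migrate_to_ram path bails until it has passed.
Roughly something like this (a sketch from memory, exact field and
helper names may differ):

	/* Defer migration back to system memory until the timeslice,
	 * set when the pages were migrated to devmem, has expired.
	 */
	if (time_before64(get_jiffies_64(),
			  devmem_allocation->timeslice_expiration))
		return 0;	/* CPU fault retries; devmem stays authoritative */

So agreed, the guarantee really is about how long the device private
pages stay authoritative in the mm.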
> So maybe I should rephrase this as
> "The time requested for the pagemap pages to be present in @mm"? And
Yea, that seems better.
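
Maybe spell it out along these lines (a rough sketch of the wording
only, folding in your earlier "before being allowed to be migrated
back" rephrase):

 * @timeslice_ms: The time requested for the pagemap pages to be
 * present in @mm (i.e., for the data to stay in device private
 * memory) before being allowed to be migrated back.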
Matt
> add a discussion like in Himal's reply on the typical use-case?
>
> The pagemap itself has no control over how, and if, the gpu vms choose
> to expose this. It can only guarantee that the authoritative location
> of the data is the device private memory for timeslice_ms.
>
> /Thomas
>
>
>
> >
> > >
> > > Yeah, although drm_pagemap is not aware of any gpu memory map, so it
> > > would be incorrect to bring that up in the api docs. We could add
> > > some discussion, though, that "this could be used to..." and give
> > > the typical gpu use-case?
> > >
> > > Thanks,
> > > Thomas
> > >
>
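PS: For the doc example Himal flagged above, the call would also need to
grow the timeslice argument, e.g. (a sketch assuming the v3 parameter
order, with timeslice_ms between the range end and the pgmap owner):

	err = drm_pagemap_migrate_to_devmem(devmem, gpusvm->mm,
					    gpuva_start, gpuva_end,
					    timeslice_ms,
					    driver_pgmap_owner());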