[PATCH v5 0/3] drm/gpusvm, drm/pagemap, drm/xe: Restructure migration in preparation for multi-device
Thomas Hellström
thomas.hellstrom at linux.intel.com
Mon Jun 23 09:54:36 UTC 2025
Hi, Alistair,
On Thu, 2025-06-19 at 12:07 +0200, Thomas Hellström wrote:
>
> Hi, Alistair, Thanks for having a look!
That patch series is ready to be merged. Do you have any outstanding
concerns?
Thanks,
Thomas
>
> On Thu, 2025-06-19 at 14:52 +1000, Alistair Popple wrote:
> > On Wed, Jun 18, 2025 at 10:16:14PM +0200, Thomas Hellström wrote:
> > > This patchset modifies the migration part of drm_gpusvm to
> > > drm_pagemap and
> > > adds a populate_mm() op to drm_pagemap.
> > >
> > > The idea is that the device that receives a pagefault determines
> > > if
> > > it wants to
> > > migrate content and to where. It then calls the populate_mm()
> > > method of relevant
> > > drm_pagemap.
> > >
> > > This functionality was mostly already in place, but hard-coded
> > > for
> > > xe only without
> > > going through a pagemap op. Since we might be dealing with
> > > separate
> > > devices moving
> > > forward, it also now becomes the responsibilit of the
> > > populate_mm()
> > > op to
> > > grab any necessary local device runtime pm references and keep
> > > them
> > > held while
> > > its pages are present in an mm (struct mm_struct).
> > >
> > > On thing to decide here is whether the populate_mm() callback
> > > should sit on a
> > > struct drm_pagemap for now while we sort multi-device usability
> > > out
> > > or whether
> > > we should add it (or something equivalent) to struct dev_pagemap.
> >
> > I'm still looking at this series (sorry it took until v5 for me to
> > notice
> > it!) but my immediate reaction here is why do/would you need to add
> > anything
> > to struct dev_pagemap? The common approach here has been to embed
> > struct
> > dev_pagemap in some driver defined struct and use container_of to
> > go
> > from the
> > page to the driver (or in this case DRM) specific pagemap.
> >
> > See for example dmirror_page_to_chunk() in the HMM self test or
> > nouveau_page_to_chunk(). Is there some reason something like that
> > would work
> > here?
>
> Future patches will, as they are currently written, do something like
> this for embedding:
>
> struct xe_pagemap {
> struct dev_pagemap pagemap;
> struct drm_pagemap dpagemap;
> };
>
> So that the drm_pagemap can be obtained that way. The reason for that
> is to avoid diamond inheritance if the driver wanted to embed a
> pcie_p2p pagemap instead of a struct dev_pagemap. But if that becomes
> unlikely we could of course embed the dev_pagemap directly into the
> drm_pagemap.
>
>
> >
> > Actually I notice the Xe driver currently does use this to point to
> > a
> > struct
> > xe_vram_region which contains drm_pagemap pointer. If I understand
> > correctly
> > we're trying to move a lot of the SVM functionality into a generic
> > DRM layer,
> > so would it make more sense to have dev_pgmap embeded in drm_pgmap
> > and have that
> > contain the pointer to any required driver-specific data?
> >
> > Also FWIW I don't think zone_device_data is strictly required. It's
> > convenient,
> > but I suspect it only exists because it could be easily provided
> > within the
> > footprint of the existing struct page due to not using all the
> > fields
> > for
> > ZONE_DEVICE pages. I can imagine we might eventually remove it,
> > once
> > we no
> > longer need struct pages and move to folios/memdescs.
>
> It looks like, correct me if I'm wrong, like the nouveau version has
> one dev_pagemap per bo. The code at hand here use multiple bos for
> the
> memory of a single pagemap so in essence the zone_device_data
> provides
> a pointer to the bo used for that particular page. If we lose the
> zone_device_data there is probably other ways we can do that
> backwards
> lookup, but it may become nasty.
>
> So while I fully agree we should use some form of embedding of the
> dev_pagemap, and that could be the authoritative way to go from a
> page
> to a drm_pagemap, that is not really sufficient to go from a page to
> a
> bo.
>
> But more on this in upcoming multi-device patches. This series is
> mostly about separating the drm_gpusvm (GPU mapping) and
> drm_pagemap(migration) functionality that is already i place.
>
> But please let me know if there are any concerns with this.
>
> Thanks,
> Thomas
>
>
> >
> > > v2:
> > > - Rebase.
> > > v3:
> > > - Documentation updates (CI, Matt Brost)
> > > - Don't change TTM buffer object type for VRAM allocations (Matt
> > > Brost)
> > > v4:
> > > - Documentation Updates (Himal Ghimiray, Matt Brost)
> > > - Add an assert (Matt Brost)
> > > v5:
> > > - Rebase
> > > - Add R-Bs and SOBs.
> > >
> > > Matthew Brost (1):
> > > drm/gpusvm, drm/pagemap: Move migration functionality to
> > > drm_pagemap
> > >
> > > Thomas Hellström (2):
> > > drm/pagemap: Add a populate_mm op
> > > drm/xe: Implement and use the drm_pagemap populate_mm op
> > >
> > > Documentation/gpu/rfc/gpusvm.rst | 12 +-
> > > drivers/gpu/drm/Makefile | 6 +-
> > > drivers/gpu/drm/drm_gpusvm.c | 761 +--------------------
> > > --
> > > -
> > > drivers/gpu/drm/drm_pagemap.c | 838
> > > +++++++++++++++++++++++++++
> > > drivers/gpu/drm/xe/Kconfig | 10 +-
> > > drivers/gpu/drm/xe/xe_bo_types.h | 2 +-
> > > drivers/gpu/drm/xe/xe_device_types.h | 2 +-
> > > drivers/gpu/drm/xe/xe_svm.c | 125 ++--
> > > drivers/gpu/drm/xe/xe_svm.h | 10 +-
> > > drivers/gpu/drm/xe/xe_tile.h | 11 +
> > > drivers/gpu/drm/xe/xe_vm.c | 2 +-
> > > include/drm/drm_gpusvm.h | 96 ---
> > > include/drm/drm_pagemap.h | 135 +++++
> > > 13 files changed, 1098 insertions(+), 912 deletions(-)
> > > create mode 100644 drivers/gpu/drm/drm_pagemap.c
> > >
> > > --
> > > 2.49.0
> > >
>
More information about the Intel-xe
mailing list