[PATCH 1/5] drm/xe/svm: Remap and provide memmap backing for GPU vram

Hellstrom, Thomas thomas.hellstrom at intel.com
Mon Mar 18 15:46:42 UTC 2024


On Mon, 2024-03-18 at 15:02 +0000, Zeng, Oak wrote:
> 
> 
> > -----Original Message-----
> > From: Hellstrom, Thomas <thomas.hellstrom at intel.com>
> > Sent: Monday, March 18, 2024 6:16 AM
> > To: Brost, Matthew <matthew.brost at intel.com>; Zeng, Oak
> > <oak.zeng at intel.com>
> > Cc: intel-xe at lists.freedesktop.org; Welty, Brian
> > <brian.welty at intel.com>;
> > airlied at gmail.com; Ghimiray, Himal Prasad
> > <himal.prasad.ghimiray at intel.com>
> > Subject: Re: [PATCH 1/5] drm/xe/svm: Remap and provide memmap
> > backing for
> > GPU vram
> > 
> > On Sat, 2024-03-16 at 01:25 +0000, Matthew Brost wrote:
> > > On Fri, Mar 15, 2024 at 03:31:24PM -0600, Zeng, Oak wrote:
> > > > 
> > > > 
> > > > > -----Original Message-----
> > > > > From: Brost, Matthew <matthew.brost at intel.com>
> > > > > Sent: Friday, March 15, 2024 4:40 PM
> > > > > To: Zeng, Oak <oak.zeng at intel.com>
> > > > > Cc: intel-xe at lists.freedesktop.org; Hellstrom, Thomas
> > > > > <thomas.hellstrom at intel.com>; airlied at gmail.com; Welty, Brian
> > > > > <brian.welty at intel.com>; Ghimiray, Himal Prasad
> > > > > <himal.prasad.ghimiray at intel.com>
> > > > > Subject: Re: [PATCH 1/5] drm/xe/svm: Remap and provide memmap
> > > > > backing for
> > > > > GPU vram
> > > > > 
> > > > > On Fri, Mar 15, 2024 at 10:00:06AM -0600, Zeng, Oak wrote:
> > > > > > 
> > > > > > 
> > > > > > > -----Original Message-----
> > > > > > > From: Brost, Matthew <matthew.brost at intel.com>
> > > > > > > Sent: Thursday, March 14, 2024 4:49 PM
> > > > > > > To: Zeng, Oak <oak.zeng at intel.com>
> > > > > > > Cc: intel-xe at lists.freedesktop.org; Hellstrom, Thomas
> > > > > > > <thomas.hellstrom at intel.com>; airlied at gmail.com; Welty,
> > > > > > > Brian
> > > > > > > <brian.welty at intel.com>; Ghimiray, Himal Prasad
> > > > > > > <himal.prasad.ghimiray at intel.com>
> > > > > > > Subject: Re: [PATCH 1/5] drm/xe/svm: Remap and provide
> > > > > > > memmap
> > > > > > > backing
> > > > > for
> > > > > > > GPU vram
> > > > > > > 
> > > > > > > On Thu, Mar 14, 2024 at 12:32:36PM -0600, Zeng, Oak
> > > > > > > wrote:
> > > > > > > > Hi Matt,
> > > > > > > > 
> > > > > > > > > -----Original Message-----
> > > > > > > > > From: Brost, Matthew <matthew.brost at intel.com>
> > > > > > > > > Sent: Thursday, March 14, 2024 1:18 PM
> > > > > > > > > To: Zeng, Oak <oak.zeng at intel.com>
> > > > > > > > > Cc: intel-xe at lists.freedesktop.org; Hellstrom, Thomas
> > > > > > > > > <thomas.hellstrom at intel.com>; airlied at gmail.com;
> > > > > > > > > Welty,
> > > > > > > > > Brian
> > > > > > > > > <brian.welty at intel.com>; Ghimiray, Himal Prasad
> > > > > > > > > <himal.prasad.ghimiray at intel.com>
> > > > > > > > > Subject: Re: [PATCH 1/5] drm/xe/svm: Remap and
> > > > > > > > > provide
> > > > > > > > > memmap
> > > > > backing
> > > > > > > for
> > > > > > > > > GPU vram
> > > > > > > > > 
> > > > > > > > > On Wed, Mar 13, 2024 at 11:35:49PM -0400, Oak Zeng
> > > > > > > > > wrote:
> > > > > > > > > > Memory remap GPU vram using devm_memremap_pages, so
> > > > > > > > > > each GPU vram page is backed by a struct page.
> > > > > > > > > > 
> > > > > > > > > > Those struct pages are created to allow hmm to migrate
> > > > > > > > > > buffers between GPU vram and CPU system memory using the
> > > > > > > > > > existing Linux migration mechanism (i.e., migrating
> > > > > > > > > > between CPU system memory and hard disk).
> > > > > > > > > > 
> > > > > > > > > > This is preparatory work to enable svm (shared virtual
> > > > > > > > > > memory) through the Linux kernel hmm framework. The
> > > > > > > > > > memory remap's page map type is set to
> > > > > > > > > > MEMORY_DEVICE_PRIVATE for now. This means that even
> > > > > > > > > > though each GPU vram page gets a struct page and can be
> > > > > > > > > > mapped in the CPU page table, such pages are treated as
> > > > > > > > > > the GPU's private resource, so the CPU can't access them.
> > > > > > > > > > If the CPU accesses such a page, a page fault is
> > > > > > > > > > triggered and the page will be migrated to system memory.
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > Is this really true? We can map VRAM BOs to the CPU
> > > > > > > > > without having to migrate back and forth. Admittedly I
> > > > > > > > > don't know the inner workings of how this works, but in
> > > > > > > > > IGTs we do this all the time.
> > > > > > > > > 
> > > > > > > > >   54         batch_bo = xe_bo_create(fd, vm, batch_size,
> > > > > > > > >   55                                 vram_if_possible(fd, 0),
> > > > > > > > >   56                                 DRM_XE_GEM_CREATE_FLAG_NEEDS_VISIBLE_VRAM);
> > > > > > > > >   57         batch_map = xe_bo_map(fd, batch_bo, batch_size);
> > > > > > > > > 
> > > > > > > > > The BO is created in VRAM and then mapped to the CPU.
> > > > > > > > > 
> > > > > > > > > I don't think there is an expectation of coherence, but
> > > > > > > > > rather of caching mode and exclusive access to the memory
> > > > > > > > > based on synchronization.
> > > > > > > > > 
> > > > > > > > > e.g.
> > > > > > > > > User write BB/data via CPU to GPU memory
> > > > > > > > > User calls exec
> > > > > > > > > GPU read / write memory
> > > > > > > > > User wait on sync indicating exec done
> > > > > > > > > User reads result
> > > > > > > > > 
> > > > > > > > > All of this works without migration. Are we not planning
> > > > > > > > > to support this flow with SVM?
> > > > > > > > > 
> > > > > > > > > Afaik this migration dance really only needs to be
> > > > > > > > > done
> > > > > > > > > if the CPU and
> > > > > > > > > GPU are using atomics on a shared memory region and
> > > > > > > > > the
> > > > > > > > > GPU device
> > > > > > > > > doesn't support a coherent memory protocol (e.g.
> > > > > > > > > PVC).
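
(As a side note, the no-migration flow described above, written in terms
of the IGT helpers quoted earlier, looks roughly like the below. The
exec and sync steps are left as comments since their exact helper
signatures aren't part of this thread.)

	batch_bo = xe_bo_create(fd, vm, batch_size, vram_if_possible(fd, 0),
				DRM_XE_GEM_CREATE_FLAG_NEEDS_VISIBLE_VRAM);
	batch_map = xe_bo_map(fd, batch_bo, batch_size);

	/* CPU writes the batch buffer / data directly into VRAM */
	memcpy(batch_map, batch_commands, batch_size);
	/* submit with the exec ioctl; GPU reads / writes the BO */
	/* wait on the out-sync, then read the results back via batch_map */
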
> > > > > > > > 
> > > > > > > > All you said is true. On many of our HW, the CPU can
> > > > > > > > actually access device memory, cache coherently or not.
> > > > > > > > 
> > > > > > > > The problem is, this is not true for all GPU vendors. For
> > > > > > > > example, on some HW from some vendors, the CPU can only
> > > > > > > > access part of the device memory. This is the so-called
> > > > > > > > small BAR concept.
> > > > > > > > 
> > > > > > > > So when HMM was defined, such factors were considered, and
> > > > > > > > MEMORY_DEVICE_PRIVATE was defined. With this definition, the
> > > > > > > > CPU can't access device memory.
> > > > > > > > 
> > > > > > > > So you can think of it as a limitation of HMM.
> > > > > > > > 
> > > > > > > 
> > > > > > > Is it though? I see other types: MEMORY_DEVICE_FS_DAX,
> > > > > > > MEMORY_DEVICE_GENERIC, and MEMORY_DEVICE_PCI_P2PDMA. From my
> > > > > > > limited understanding it looks to me like one of those modes
> > > > > > > would support my example.
> > > > > > 
> > > > > > 
> > > > > > No, the above are for other purposes. HMM only supports
> > > > > > DEVICE_PRIVATE and DEVICE_COHERENT.
> > > > > > 
> > > > > > > 
> > > > > > > > Note this is only part 1 of our system allocator work. We
> > > > > > > > do plan to support DEVICE_COHERENT for our newer devices,
> > > > > > > > see below. With this option, we don't have unnecessary
> > > > > > > > migration back and forth.
> > > > > > > > 
> > > > > > > > You can think of this as just working out all the code
> > > > > > > > paths. 90% of the driver code for DEVICE_PRIVATE and
> > > > > > > > DEVICE_COHERENT will be the same. Our real use of the system
> > > > > > > > allocator will be DEVICE_COHERENT mode, while DEVICE_PRIVATE
> > > > > > > > mode allows us to exercise the code on old HW.
> > > > > > > > 
> > > > > > > > Make sense?
> > > > > > > > 
> > > > > > > 
> > > > > > > I guess if we want the system allocator to always be
> > > > > > > coherent, then yes, you need this dynamic migration with
> > > > > > > faulting on either side.
> > > > > > > 
> > > > > > > I was thinking the system allocator would behave like my
> > > > > > > example above, with madvise dictating the coherence rules.
> > > > > > > 
> > > > > > > Maybe I missed this in system allocator design but my
> > > > > > > feeling
> > > > > > > is we
> > > > > > > shouldn't arbitrarily enforce coherence as that could
> > > > > > > lead to
> > > > > > > poor
> > > > > > > performance due to constant migration.
> > > > > > 
> > > > > > The system allocator itself doesn't enforce coherence.
> > > > > > Coherence is built into the user programming model. The system
> > > > > > allocator allows both GPU and CPU to access system-allocated
> > > > > > pointers, but it doesn't necessarily guarantee that the data
> > > > > > accessed from CPU/GPU is coherent. It is the user program's
> > > > > > responsibility to maintain data coherence.
> > > > > > 
> > > > > > Data migration in the driver is optional, depending on platform
> > > > > > capability, user preference, and correctness and performance
> > > > > > considerations. Driver-internal data migration of course
> > > > > > shouldn't break data coherence.
> > > > > > 
> > > > > > Of course different vendors can have different data coherence
> > > > > > schemes. For example, it is completely the designer's choice to
> > > > > > build a model with HW-automatic data coherence or with software
> > > > > > explicit data coherence. On our platform, we allow the user
> > > > > > program to select different coherence modes by setting the
> > > > > > pat_index for the GPU and the cpu_caching mode for the CPU. So
> > > > > > we have completely given the flexibility to the user program.
> > > > > > Nothing of this contract is changed in the system allocator
> > > > > > design.
> > > > > > 
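
(For reference, the pat_index / cpu_caching split above maps onto the xe
uAPI roughly as sketched below - a userspace illustration only; the
placement mask, pat_index value and handles are hypothetical and would
come from querying the device:)

	struct drm_xe_gem_create create = {
		.size = bo_size,
		.placement = vram_region_mask,	/* hypothetical VRAM placement mask */
		.cpu_caching = DRM_XE_GEM_CPU_CACHING_WC, /* CPU-side caching mode */
	};
	ioctl(fd, DRM_IOCTL_XE_GEM_CREATE, &create);

	struct drm_xe_vm_bind bind = {
		.vm_id = vm_id,
		.num_binds = 1,
		.bind = {
			.obj = create.handle,
			.addr = gpu_va,
			.range = bo_size,
			.op = DRM_XE_VM_BIND_OP_MAP,
			.pat_index = pat_index,	/* GPU-side caching/coherency */
		},
	};
	ioctl(fd, DRM_IOCTL_XE_VM_BIND, &bind);
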
> > > > > > Going back to the question of what memory type we should use
> > > > > > to register our vram with core mm: HMM currently supports two
> > > > > > types, PRIVATE and COHERENT. The coherent type requires some HW
> > > > > > and BIOS support which we don't have right now, so the only
> > > > > > available type is PRIVATE. We have no other option right now.
> > > > > > As said, we plan to support the coherent type, where we can
> > > > > > avoid unnecessary data migration. But that is stage 2.
> > > > > > 
> > > > > 
> > > > > Thanks for the explanation. After reading your replies, the HMM
> > > > > doc, and looking at the code, this all makes sense.
> > > > > 
> > > > > > > 
> > > > > > > > 
> > > > > > > > > 
> > > > > > > > > > For GPU devices which support a coherent memory
> > > > > > > > > > protocol b/t CPU and GPU (such as the CXL and CAPI
> > > > > > > > > > protocols), we can remap device memory as
> > > > > > > > > > MEMORY_DEVICE_COHERENT. This is TBD.
> > > > > > > > > > 
> > > > > > > > > > Signed-off-by: Oak Zeng <oak.zeng at intel.com>
> > > > > > > > > > Co-developed-by: Niranjana Vishwanathapura
> > > > > > > > > <niranjana.vishwanathapura at intel.com>
> > > > > > > > > > Signed-off-by: Niranjana Vishwanathapura
> > > > > > > > > <niranjana.vishwanathapura at intel.com>
> > > > > > > > > > Cc: Matthew Brost <matthew.brost at intel.com>
> > > > > > > > > > Cc: Thomas Hellström <thomas.hellstrom at intel.com>
> > > > > > > > > > Cc: Brian Welty <brian.welty at intel.com>
> > > > > > > > > > ---
> > > > > > > > > >  drivers/gpu/drm/xe/Makefile          |  3 +-
> > > > > > > > > >  drivers/gpu/drm/xe/xe_device_types.h |  9 +++
> > > > > > > > > >  drivers/gpu/drm/xe/xe_mmio.c         |  8 +++
> > > > > > > > > >  drivers/gpu/drm/xe/xe_svm.h          | 14 +++++
> > > > > > > > > >  drivers/gpu/drm/xe/xe_svm_devmem.c   | 91 ++++++++++++++++++++++++++++
> > > > > > > > > >  5 files changed, 124 insertions(+), 1 deletion(-)
> > > > > > > > > >  create mode 100644 drivers/gpu/drm/xe/xe_svm.h
> > > > > > > > > >  create mode 100644 drivers/gpu/drm/xe/xe_svm_devmem.c
> > > > > > > > > > 
> > > > > > > > > > diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
> > > > > > > > > > index c531210695db..840467080e59 100644
> > > > > > > > > > --- a/drivers/gpu/drm/xe/Makefile
> > > > > > > > > > +++ b/drivers/gpu/drm/xe/Makefile
> > > > > > > > > > @@ -142,7 +142,8 @@ xe-y += xe_bb.o \
> > > > > > > > > >  	xe_vram_freq.o \
> > > > > > > > > >  	xe_wait_user_fence.o \
> > > > > > > > > >  	xe_wa.o \
> > > > > > > > > > -	xe_wopcm.o
> > > > > > > > > > +	xe_wopcm.o \
> > > > > > > > > > +	xe_svm_devmem.o
> > > > > > > > > 
> > > > > > > > > These should be in alphabetical order.
> > > > > > > > 
> > > > > > > > Will fix
> > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > >  # graphics hardware monitoring (HWMON) support
> > > > > > > > > >  xe-$(CONFIG_HWMON) += xe_hwmon.o
> > > > > > > > > > diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
> > > > > > > > > > index 9785eef2e5a4..f27c3bee8ce7 100644
> > > > > > > > > > --- a/drivers/gpu/drm/xe/xe_device_types.h
> > > > > > > > > > +++ b/drivers/gpu/drm/xe/xe_device_types.h
> > > > > > > > > > @@ -99,6 +99,15 @@ struct xe_mem_region {
> > > > > > > > > >  	resource_size_t actual_physical_size;
> > > > > > > > > >  	/** @mapping: pointer to VRAM mappable space */
> > > > > > > > > >  	void __iomem *mapping;
> > > > > > > > > > +	/** @pagemap: Used to remap device memory as ZONE_DEVICE */
> > > > > > > > > > +    struct dev_pagemap pagemap;
> > > > > > > > > > +    /**
> > > > > > > > > > +     * @hpa_base: base host physical address
> > > > > > > > > > +     *
> > > > > > > > > > +     * This is generated when remap device memory as ZONE_DEVICE
> > > > > > > > > > +     */
> > > > > > > > > > +    resource_size_t hpa_base;
> > > > > > > > > 
> > > > > > > > > Weird indentation. This goes for the entire series,
> > > > > > > > > look
> > > > > > > > > at checkpatch.
> > > > > > > > 
> > > > > > > > Will fix
> > > > > > > > > 
> > > > > > > > > > +
> > > > > > > > > >  };
> > > > > > > > > > 
> > > > > > > > > >  /**
> > > > > > > > > > diff --git a/drivers/gpu/drm/xe/xe_mmio.c b/drivers/gpu/drm/xe/xe_mmio.c
> > > > > > > > > > index e3db3a178760..0d795394bc4c 100644
> > > > > > > > > > --- a/drivers/gpu/drm/xe/xe_mmio.c
> > > > > > > > > > +++ b/drivers/gpu/drm/xe/xe_mmio.c
> > > > > > > > > > @@ -22,6 +22,7 @@
> > > > > > > > > >  #include "xe_module.h"
> > > > > > > > > >  #include "xe_sriov.h"
> > > > > > > > > >  #include "xe_tile.h"
> > > > > > > > > > +#include "xe_svm.h"
> > > > > > > > > > 
> > > > > > > > > >  #define XEHP_MTCFG_ADDR		XE_REG(0x101800)
> > > > > > > > > >  #define TILE_COUNT		REG_GENMASK(15, 8)
> > > > > > > > > > @@ -286,6 +287,7 @@ int xe_mmio_probe_vram(struct xe_device *xe)
> > > > > > > > > >  		}
> > > > > > > > > > 
> > > > > > > > > >  		io_size -= min_t(u64, tile_size, io_size);
> > > > > > > > > > +		xe_svm_devm_add(tile, &tile->mem.vram);
> > > > > > > > > 
> > > > > > > > > Do we want to do this probe for all devices with VRAM
> > > > > > > > > or
> > > > > > > > > only a subset?
> > > > > > > > 
> > > > > > > > All
> > > > > > > 
> > > > > > > Can you explain why?
> > > > > > 
> > > > > > It is natural for me to add all device memory to hmm. In the
> > > > > > hmm design, device memory is used as a special swap-out target
> > > > > > for system memory. I would ask why we would only want to add a
> > > > > > subset of vram? By a subset, do you mean only the vram of one
> > > > > > tile instead of all tiles?
> > > > > > 
> > > > > 
> > > > > I think we are talking about different things, my bad on the
> > > > > wording in the original question.
> > > > > 
> > > > > Let me ask again - should we be calling xe_svm_devm_add on all
> > > > > *platforms* that have VRAM, i.e. should we do this on PVC but not
> > > > > DG2?
> > > > 
> > > > 
> > > > Oh, I see. Good question. On i915, this feature was only tested
> > > > on PVC. We don't have a plan to enable it on platforms older than
> > > > PVC.
> > > > 
> > > > Let me add a check here to only enable it on platforms newer than
> > > > PVC.
> > > > 
> > > 
> > > Probably actually check 'xe->info.has_usm'.
> > > 
> > > We might want to rename the field too and drop the 'usm'
> > > nomenclature, but that can be done later.
> > 
> > Perhaps "has_recoverable_pagefaults" or some form of abbreviation.
> 
> USM has two flavors: a driver allocator and a system allocator. 
> 
> Both of those flavors depend on recoverable page faults, and for our
> current implementation they only depend on the recoverable page fault
> HW feature. In the future, we might have other implementations that
> depend on more HW features, such as ATS (address translation
> service).
> 
> So at least for now, "has_usm" and "has_recoverable_pagefaults" are
> pretty much the same thing. I will keep has_usm for now, open to a
> change in the future as well.
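
For reference, a minimal sketch of what that gating could look like in
xe_mmio_probe_vram() - illustration only, not the final patch:

		io_size -= min_t(u64, tile_size, io_size);

		/* Only create the struct page backing on platforms with
		 * recoverable pagefault support, per the discussion above.
		 */
		if (xe->info.has_usm)
			xe_svm_devm_add(tile, &tile->mem.vram);
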
> 
> > 
> > Another question w.r.t. this is whether we should do this
> > unconditionally even on platforms that support it. Adding a struct
> > page per VRAM page would potentially consume a significant amount of
> > system memory.
> 
> Yes, I was thinking of adding a kernel configuration option so this
> feature can be enabled/disabled at compile time. Do you want me to
> add one?

Yes I think that is a good idea.

We also might want to add per-device enabling in sysfs, or set it up on
first use and disable it if unused at shrink time.

But IMO let's start with a kernel configuration option for now.
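
Something along these lines would do as a starting point (the config
symbol name and help text below are just placeholders):

	config DRM_XE_PAGEMAP
		bool "Back GPU VRAM with struct pages for SVM"
		depends on DRM_XE
		default n
		help
		  Remap GPU VRAM with devm_memremap_pages() so each VRAM page
		  is backed by a struct page, as required for HMM-based SVM.
		  This consumes extra system memory per VRAM page.

plus a Makefile guard,

	xe-$(CONFIG_DRM_XE_PAGEMAP) += xe_svm_devmem.o

and empty inline stubs for xe_svm_devm_add()/xe_svm_devm_remove() in
xe_svm.h when the option is disabled.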

/Thomas


> 
> Oak 
> > 
> > /Thomas
> > 
> > 
> > 
> > > 
> > > Matt
> > > 
> > > > Oak
> > > > 
> > > > > 
> > > > > Matt
> > > > > 
> > > > > > Oak
> > > > > > 
> > > > > > 
> > > > > > > 
> > > > > > > > > 
> > > > > > > > > >  	}
> > > > > > > > > > 
> > > > > > > > > >  	xe->mem.vram.actual_physical_size = total_size;
> > > > > > > > > > @@ -354,10 +356,16 @@ void xe_mmio_probe_tiles(struct xe_device *xe)
> > > > > > > > > >  static void mmio_fini(struct drm_device *drm, void *arg)
> > > > > > > > > >  {
> > > > > > > > > >  	struct xe_device *xe = arg;
> > > > > > > > > > +    struct xe_tile *tile;
> > > > > > > > > > +    u8 id;
> > > > > > > > > > 
> > > > > > > > > >  	pci_iounmap(to_pci_dev(xe->drm.dev), xe->mmio.regs);
> > > > > > > > > >  	if (xe->mem.vram.mapping)
> > > > > > > > > >  		iounmap(xe->mem.vram.mapping);
> > > > > > > > > > +
> > > > > > > > > > +	for_each_tile(tile, xe, id)
> > > > > > > > > > +		xe_svm_devm_remove(xe, &tile->mem.vram);
> > > > > > > > > 
> > > > > > > > > This should probably be above the existing code. It is
> > > > > > > > > typical on fini to do things in reverse order from init.
> > > > > > > > 
> > > > > > > > Will fix
> > > > > > > > > 
> > > > > > > > > > +
> > > > > > > > > >  }
> > > > > > > > > > 
> > > > > > > > > >  static int xe_verify_lmem_ready(struct xe_device *xe)
> > > > > > > > > > diff --git a/drivers/gpu/drm/xe/xe_svm.h b/drivers/gpu/drm/xe/xe_svm.h
> > > > > > > > > > new file mode 100644
> > > > > > > > > > index 000000000000..09f9afb0e7d4
> > > > > > > > > > --- /dev/null
> > > > > > > > > > +++ b/drivers/gpu/drm/xe/xe_svm.h
> > > > > > > > > > @@ -0,0 +1,14 @@
> > > > > > > > > > +// SPDX-License-Identifier: MIT
> > > > > > > > > > +/*
> > > > > > > > > > + * Copyright © 2023 Intel Corporation
> > > > > > > > > 
> > > > > > > > > 2024?
> > > > > > > > 
> > > > > > > > This patch was actually written 2023
> > > > > > > > > 
> > > > > > > > > > + */
> > > > > > > > > > +
> > > > > > > > > > +#ifndef __XE_SVM_H
> > > > > > > > > > +#define __XE_SVM_H
> > > > > > > > > > +
> > > > > > > > > > +#include "xe_device_types.h"
> > > > > > > > > 
> > > > > > > > > I don't think you need to include this. Rather just
> > > > > > > > > forward decl structs
> > > > > > > > > used here.
> > > > > > > > 
> > > > > > > > Will fix
> > > > > > > > > 
> > > > > > > > > e.g.
> > > > > > > > > 
> > > > > > > > > struct xe_device;
> > > > > > > > > struct xe_mem_region;
> > > > > > > > > struct xe_tile;
> > > > > > > > > 
> > > > > > > > > > +
> > > > > > > > > > +int xe_svm_devm_add(struct xe_tile *tile, struct xe_mem_region *mem);
> > > > > > > > > > +void xe_svm_devm_remove(struct xe_device *xe, struct xe_mem_region *mem);
> > > > > > > > > 
> > > > > > > > > The arguments are incongruent here. Typically we want
> > > > > > > > > these to match.
> > > > > > > > 
> > > > > > > > Will fix
> > > > > > > > > 
> > > > > > > > > > +
> > > > > > > > > > +#endif
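
For reference, a sketch of what xe_svm.h could look like after the
comments above (forward declarations instead of pulling in
xe_device_types.h, and parameter names matching the .c file; whether
both functions should take a tile or a device is a separate question):

	#ifndef __XE_SVM_H
	#define __XE_SVM_H

	struct xe_device;
	struct xe_mem_region;
	struct xe_tile;

	int xe_svm_devm_add(struct xe_tile *tile, struct xe_mem_region *mr);
	void xe_svm_devm_remove(struct xe_device *xe, struct xe_mem_region *mr);

	#endif
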
> > > > > > > > > > diff --git a/drivers/gpu/drm/xe/xe_svm_devmem.c b/drivers/gpu/drm/xe/xe_svm_devmem.c
> > > > > > > > > 
> > > > > > > > > Incongruent between xe_svm.h and xe_svm_devmem.c.
> > > > > > > > 
> > > > > > > > Did you mean mem vs mr? If yes, will fix.
> > > > > > > > 
> > > > > > > > > Again these two should match.
> > > > > > > > > 
> > > > > > > > > > new file mode 100644
> > > > > > > > > > index 000000000000..63b7a1961cc6
> > > > > > > > > > --- /dev/null
> > > > > > > > > > +++ b/drivers/gpu/drm/xe/xe_svm_devmem.c
> > > > > > > > > > @@ -0,0 +1,91 @@
> > > > > > > > > > +// SPDX-License-Identifier: MIT
> > > > > > > > > > +/*
> > > > > > > > > > + * Copyright © 2023 Intel Corporation
> > > > > > > > > 
> > > > > > > > > 2024?
> > > > > > > > It is from 2023
> > > > > > > > > 
> > > > > > > > > > + */
> > > > > > > > > > +
> > > > > > > > > > +#include <linux/mm_types.h>
> > > > > > > > > > +#include <linux/sched/mm.h>
> > > > > > > > > > +
> > > > > > > > > > +#include "xe_device_types.h"
> > > > > > > > > > +#include "xe_trace.h"
> > > > > > > > > 
> > > > > > > > > xe_trace.h appears to be unused.
> > > > > > > > 
> > > > > > > > Will fix
> > > > > > > > > 
> > > > > > > > > > +#include "xe_svm.h"
> > > > > > > > > > +
> > > > > > > > > > +
> > > > > > > > > > +static vm_fault_t xe_devm_migrate_to_ram(struct
> > > > > > > > > > vm_fault *vmf)
> > > > > > > > > > +{
> > > > > > > > > > +	return 0;
> > > > > > > > > > +}
> > > > > > > > > > +
> > > > > > > > > > +static void xe_devm_page_free(struct page *page)
> > > > > > > > > > +{
> > > > > > > > > > +}
> > > > > > > > > > +
> > > > > > > > > > +static const struct dev_pagemap_ops
> > > > > > > > > > xe_devm_pagemap_ops = {
> > > > > > > > > > +	.page_free = xe_devm_page_free,
> > > > > > > > > > +	.migrate_to_ram = xe_devm_migrate_to_ram,
> > > > > > > > > > +};
> > > > > > > > > > +
> > > > > > > > > 
> > > > > > > > > Assume these are placeholders that will be populated
> > > > > > > > > later?
> > > > > > > > 
> > > > > > > > 
> > > > > > > > Correct.
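
For what it's worth, a rough sketch of what the migrate_to_ram
placeholder typically grows into for a DEVICE_PRIVATE pagemap - single
page only, error unwinding simplified, and the actual VRAM-to-system
copy (blitter or memcpy over the mappable BAR) left as a comment:

	/* assumes <linux/migrate.h> and <linux/mm.h> are included */
	static vm_fault_t xe_devm_migrate_to_ram(struct vm_fault *vmf)
	{
		unsigned long src = 0, dst = 0;
		struct migrate_vma migrate = {
			.vma		= vmf->vma,
			.start		= vmf->address,
			.end		= vmf->address + PAGE_SIZE,
			.src		= &src,
			.dst		= &dst,
			.pgmap_owner	= vmf->page->pgmap->owner,
			.flags		= MIGRATE_VMA_SELECT_DEVICE_PRIVATE,
			.fault_page	= vmf->page,
		};
		struct page *dpage;

		if (migrate_vma_setup(&migrate) < 0)
			return VM_FAULT_SIGBUS;
		/* Nothing collected, e.g. raced with another migration. */
		if (!migrate.cpages)
			return 0;

		/* Allocate and lock a system memory page as destination. */
		dpage = alloc_page(GFP_HIGHUSER);
		if (!dpage)
			return VM_FAULT_SIGBUS;
		lock_page(dpage);
		dst = migrate_pfn(page_to_pfn(dpage));

		/* Copy the VRAM page into dpage here. */

		migrate_vma_pages(&migrate);
		migrate_vma_finalize(&migrate);
		return 0;
	}
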
> > > > > > > > > 
> > > > > > > > > > +/**
> > > > > > > > > > + * xe_svm_devm_add: Remap and provide memmap backing for device memory
> > > > > > > > > > + * @tile: tile that the memory region blongs to
> > > > > > > > > > + * @mr: memory region to remap
> > > > > > > > > > + *
> > > > > > > > > > + * This remap device memory to host physical address space and create
> > > > > > > > > > + * struct page to back device memory
> > > > > > > > > > + *
> > > > > > > > > > + * Return: 0 on success standard error code otherwise
> > > > > > > > > > + */
> > > > > > > > > > +int xe_svm_devm_add(struct xe_tile *tile, struct xe_mem_region *mr)
> > > > > > > > > 
> > > > > > > > > Here I see the xe_mem_region is from tile->mem.vram. I'm
> > > > > > > > > wondering whether, rather than using tile->mem.vram, we
> > > > > > > > > should use xe->mem.vram when enabling svm? Isn't the idea
> > > > > > > > > behind svm that the entire memory is 1 view?
> > > > > > > > 
> > > > > > > > Still need to use tile. The reason is that memory of
> > > > > > > > different tiles can have different characteristics, such as
> > > > > > > > latency. So we want to differentiate memory b/t tiles also
> > > > > > > > in svm. I need to change the below "mr->pagemap.owner =
> > > > > > > > tile->xe->drm.dev": the owner should also be the tile. This
> > > > > > > > is the way hmm differentiates memory of different tiles.
> > > > > > > > 
> > > > > > > > With svm it is 1 view, from the virtual address space
> > > > > > > > perspective and from the physical struct page perspective.
> > > > > > > > You can think of all the tiles' vram as stacked together to
> > > > > > > > form a unified view together with system memory. This
> > > > > > > > doesn't prohibit us from differentiating memory from
> > > > > > > > different tiles. This differentiation allows us to optimize
> > > > > > > > performance, i.e., we can wisely place memory in a specific
> > > > > > > > tile. If we don't differentiate, this is not possible.
> > > > > > > > 
> > > > > > > 
> > > > > > > Ok makes sense.
> > > > > > > 
> > > > > > > Matt
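
To illustrate the owner point, a small sketch of how a per-tile owner
would thread through - the hmm_range below is illustrative only, the
real range/notifier setup lives in the fault handler:

	/* at remap time, in xe_svm_devm_add() */
	mr->pagemap.owner = tile;	/* rather than tile->xe->drm.dev */

	/* at fault time: device-private pages whose pagemap owner matches
	 * dev_private_owner are returned as-is by hmm_range_fault()
	 * instead of being treated as foreign/faulting pages.
	 */
	struct hmm_range range = {
		.notifier		= &notifier,	/* hypothetical */
		.start			= start,
		.end			= end,
		.hmm_pfns		= pfns,
		.default_flags		= HMM_PFN_REQ_FAULT,
		.dev_private_owner	= tile,
	};

	err = hmm_range_fault(&range);
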
> > > > > > > 
> > > > > > > > > 
> > > > > > > > > I suppose if we do that we also only use 1 TTM VRAM
> > > > > > > > > manager / buddy
> > > > > > > > > allocator too. I thought I saw some patches flying
> > > > > > > > > around
> > > > > > > > > for that too.
> > > > > > > > 
> > > > > > > > The TTM vram manager is not in the picture. We deliberately
> > > > > > > > avoided it per previous discussion.
> > > > > > > > 
> > > > > > > > Yes, same buddy allocator. It is in my previous POC:
> > > > > > > > https://lore.kernel.org/dri-devel/20240117221223.18540-12-oak.zeng at intel.com/.
> > > > > > > > I didn't put those patches in this series because I want to
> > > > > > > > merge these small patches separately.
> > > > > > > > > 
> > > > > > > > > > +{
> > > > > > > > > > +	struct device *dev = &to_pci_dev(tile->xe->drm.dev)->dev;
> > > > > > > > > > +	struct resource *res;
> > > > > > > > > > +	void *addr;
> > > > > > > > > > +	int ret;
> > > > > > > > > > +
> > > > > > > > > > +	res = devm_request_free_mem_region(dev, &iomem_resource,
> > > > > > > > > > +					   mr->usable_size);
> > > > > > > > > > +	if (IS_ERR(res)) {
> > > > > > > > > > +		ret = PTR_ERR(res);
> > > > > > > > > > +		return ret;
> > > > > > > > > > +	}
> > > > > > > > > > +
> > > > > > > > > > +	mr->pagemap.type = MEMORY_DEVICE_PRIVATE;
> > > > > > > > > > +	mr->pagemap.range.start = res->start;
> > > > > > > > > > +	mr->pagemap.range.end = res->end;
> > > > > > > > > > +	mr->pagemap.nr_range = 1;
> > > > > > > > > > +	mr->pagemap.ops = &xe_devm_pagemap_ops;
> > > > > > > > > > +	mr->pagemap.owner = tile->xe->drm.dev;
> > > > > > > > > > +	addr = devm_memremap_pages(dev, &mr->pagemap);
> > > > > > > > > > +	if (IS_ERR(addr)) {
> > > > > > > > > > +		devm_release_mem_region(dev, res->start, resource_size(res));
> > > > > > > > > > +		ret = PTR_ERR(addr);
> > > > > > > > > > +		drm_err(&tile->xe->drm, "Failed to remap tile %d memory, errno %d\n",
> > > > > > > > > > +				tile->id, ret);
> > > > > > > > > > +		return ret;
> > > > > > > > > > +	}
> > > > > > > > > > +	mr->hpa_base = res->start;
> > > > > > > > > > +
> > > > > > > > > > +	drm_info(&tile->xe->drm, "Added tile %d memory [%llx-%llx] to devm, remapped to %pr\n",
> > > > > > > > > > +			tile->id, mr->io_start, mr->io_start + mr->usable_size, res);
> > > > > > > > > > +	return 0;
> > > > > > > > > > +}
> > > > > > > > > > +
> > > > > > > > > > +/**
> > > > > > > > > > + * xe_svm_devm_remove: Unmap device memory and free resources
> > > > > > > > > > + * @xe: xe device
> > > > > > > > > > + * @mr: memory region to remove
> > > > > > > > > > + */
> > > > > > > > > > +void xe_svm_devm_remove(struct xe_device *xe, struct xe_mem_region *mr)
> > > > > > > > > > +{
> > > > > > > > > > +	/*FIXME: below cause a kernel hange during moduel remove*/
> > > > > > > > > > +#if 0
> > > > > > > > > > +	struct device *dev = &to_pci_dev(xe->drm.dev)->dev;
> > > > > > > > > > +
> > > > > > > > > > +	if (mr->hpa_base) {
> > > > > > > > > > +		devm_memunmap_pages(dev, &mr->pagemap);
> > > > > > > > > > +		devm_release_mem_region(dev, mr->pagemap.range.start,
> > > > > > > > > > +			mr->pagemap.range.end - mr->pagemap.range.start +1);
> > > > > > > > > > +	}
> > > > > > > > > > +	}
> > > > > > > > > > +#endif
> > > > > > > > > 
> > > > > > > > > This would need to be fixed too.
> > > > > > > > 
> > > > > > > > 
> > > > > > > > Yes...
> > > > > > > > 
> > > > > > > > Oak
> > > > > > > > > 
> > > > > > > > > Matt
> > > > > > > > > 
> > > > > > > > > > +}
> > > > > > > > > > +
> > > > > > > > > > --
> > > > > > > > > > 2.26.3
> > > > > > > > > > 
> 


