[PATCH 1/5] drm/xe/svm: Remap and provide memmap backing for GPU vram

Zeng, Oak oak.zeng at intel.com
Fri Mar 15 03:16:10 UTC 2024



> -----Original Message-----
> From: Intel-xe <intel-xe-bounces at lists.freedesktop.org> On Behalf Of Zeng, Oak
> Sent: Thursday, March 14, 2024 11:10 PM
> To: Welty, Brian <brian.welty at intel.com>; intel-xe at lists.freedesktop.org
> Cc: Hellstrom, Thomas <thomas.hellstrom at intel.com>; Brost, Matthew
> <matthew.brost at intel.com>; airlied at gmail.com; Ghimiray, Himal Prasad
> <himal.prasad.ghimiray at intel.com>
> Subject: RE: [PATCH 1/5] drm/xe/svm: Remap and provide memmap backing for
> GPU vram
> 
> 
> 
> > -----Original Message-----
> > From: Welty, Brian <brian.welty at intel.com>
> > Sent: Thursday, March 14, 2024 9:46 PM
> > To: Zeng, Oak <oak.zeng at intel.com>; intel-xe at lists.freedesktop.org
> > Cc: Hellstrom, Thomas <thomas.hellstrom at intel.com>; Brost, Matthew
> > <matthew.brost at intel.com>; airlied at gmail.com; Ghimiray, Himal Prasad
> > <himal.prasad.ghimiray at intel.com>
> > Subject: Re: [PATCH 1/5] drm/xe/svm: Remap and provide memmap backing for
> > GPU vram
> >
> > Hi Oak,
> >
> > On 3/13/2024 8:35 PM, Oak Zeng wrote:
> > > Remap GPU vram using devm_memremap_pages, so that each GPU vram
> > > page is backed by a struct page.
> > >
> > > Those struct pages are created to allow hmm to migrate buffers
> > > between GPU vram and CPU system memory using the existing Linux
> > > migration mechanism (i.e., the one used to migrate between CPU
> > > system memory and hard disk).
> > >
> > > This is preparatory work to enable SVM (shared virtual memory)
> > > through the Linux kernel hmm framework. The memory remap's page map
> > > type is set to MEMORY_DEVICE_PRIVATE for now. This means that even
> > > though each GPU vram page gets a struct page and can be mapped in
> > > the CPU page table, such pages are treated as the GPU's private
> > > resource, so the CPU cannot access them. If the CPU accesses such a
> > > page, a page fault is triggered and the page will be migrated to
> > > system memory.
> > >
> > > For GPU devices that support a coherent memory protocol between CPU
> > > and GPU (such as CXL and CAPI), we can instead remap device memory
> > > as MEMORY_DEVICE_COHERENT. This is TBD.
> > >
> > > Signed-off-by: Oak Zeng <oak.zeng at intel.com>
> > > Co-developed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura at intel.com>
> > > Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura at intel.com>
> > > Cc: Matthew Brost <matthew.brost at intel.com>
> > > Cc: Thomas Hellström <thomas.hellstrom at intel.com>
> > > Cc: Brian Welty <brian.welty at intel.com>
> > > ---
> > >   drivers/gpu/drm/xe/Makefile          |  3 +-
> > >   drivers/gpu/drm/xe/xe_device_types.h |  9 +++
> > >   drivers/gpu/drm/xe/xe_mmio.c         |  8 +++
> > >   drivers/gpu/drm/xe/xe_svm.h          | 14 +++++
> > >   drivers/gpu/drm/xe/xe_svm_devmem.c   | 91 ++++++++++++++++++++++++++++
> > >   5 files changed, 124 insertions(+), 1 deletion(-)
> > >   create mode 100644 drivers/gpu/drm/xe/xe_svm.h
> > >   create mode 100644 drivers/gpu/drm/xe/xe_svm_devmem.c
> > >
> > > diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
> > > index c531210695db..840467080e59 100644
> > > --- a/drivers/gpu/drm/xe/Makefile
> > > +++ b/drivers/gpu/drm/xe/Makefile
> > > @@ -142,7 +142,8 @@ xe-y += xe_bb.o \
> > >   	xe_vram_freq.o \
> > >   	xe_wait_user_fence.o \
> > >   	xe_wa.o \
> > > -	xe_wopcm.o
> > > +	xe_wopcm.o \
> > > +	xe_svm_devmem.o
> >
> > Minor, but the maintainers want the list above sorted alphabetically.
> 
> Correct. Matt pointed out the same. Will fix
> >
> > >
> > >   # graphics hardware monitoring (HWMON) support
> > >   xe-$(CONFIG_HWMON) += xe_hwmon.o
> > > diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
> > > index 9785eef2e5a4..f27c3bee8ce7 100644
> > > --- a/drivers/gpu/drm/xe/xe_device_types.h
> > > +++ b/drivers/gpu/drm/xe/xe_device_types.h
> > > @@ -99,6 +99,15 @@ struct xe_mem_region {
> > >   	resource_size_t actual_physical_size;
> > >   	/** @mapping: pointer to VRAM mappable space */
> > >   	void __iomem *mapping;
> > > +	/** @pagemap: Used to remap device memory as ZONE_DEVICE */
> > > +	struct dev_pagemap pagemap;
> > > +	/**
> > > +	 * @hpa_base: base host physical address
> > > +	 *
> > > +	 * This is generated when remapping device memory as ZONE_DEVICE
> > > +	 */
> > > +	resource_size_t hpa_base;
> > > +
> > >   };
> > >
> > >   /**
> > > diff --git a/drivers/gpu/drm/xe/xe_mmio.c b/drivers/gpu/drm/xe/xe_mmio.c
> > > index e3db3a178760..0d795394bc4c 100644
> > > --- a/drivers/gpu/drm/xe/xe_mmio.c
> > > +++ b/drivers/gpu/drm/xe/xe_mmio.c
> > > @@ -22,6 +22,7 @@
> > >   #include "xe_module.h"
> > >   #include "xe_sriov.h"
> > >   #include "xe_tile.h"
> > > +#include "xe_svm.h"
> > >
> > >   #define XEHP_MTCFG_ADDR		XE_REG(0x101800)
> > >   #define TILE_COUNT		REG_GENMASK(15, 8)
> > > @@ -286,6 +287,7 @@ int xe_mmio_probe_vram(struct xe_device *xe)
> > >   		}
> > >
> > >   		io_size -= min_t(u64, tile_size, io_size);
> > > +		xe_svm_devm_add(tile, &tile->mem.vram);
> >
> > I think a slightly more appropriate call site for this might be
> > xe_tile_init_noalloc(), as that function states it is preparing the tile
> > for VRAM allocations.
> > Also, I mention this because we might like the flexibility in the future
> > to call this once for xe_device.mem.vram, instead of calling it for each
> > tile, and that is easier to handle in xe_tile.c than in xe_mmio.c.
> 
> Good point. Will follow.

Sorry, given my comment below, do you still want to call it in xe_tile_init_noalloc()?

For UMA, we only need to call it once. If you do it in xe_tile_init_noalloc(), it would be called multiple times, right?
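
Something like the sketch below is roughly what I have in mind (untested, just
to illustrate; xe->mem.vram_is_uma is a made-up flag standing in for whatever
real UMA/NUMA check we end up with):

static int xe_devm_add_all(struct xe_device *xe)
{
	struct xe_tile *tile;
	u8 id;
	int err;

	if (xe->mem.vram_is_uma) {
		/*
		 * UMA: a single vram region shared by all tiles, so register
		 * the device-level region once; any tile pointer works here.
		 */
		return xe_svm_devm_add(xe_device_get_root_tile(xe),
				       &xe->mem.vram);
	}

	/* NUMA: one vram region per tile, register each of them */
	for_each_tile(tile, xe, id) {
		err = xe_svm_devm_add(tile, &tile->mem.vram);
		if (err)
			return err;
	}

	return 0;
}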

Oak

> >
> > Related comment below.
> >
> >
> > >   	}
> > >
> > >   	xe->mem.vram.actual_physical_size = total_size;
> > > @@ -354,10 +356,16 @@ void xe_mmio_probe_tiles(struct xe_device *xe)
> > >   static void mmio_fini(struct drm_device *drm, void *arg)
> > >   {
> > >   	struct xe_device *xe = arg;
> > > +	struct xe_tile *tile;
> > > +	u8 id;
> > >
> > >   	pci_iounmap(to_pci_dev(xe->drm.dev), xe->mmio.regs);
> > >   	if (xe->mem.vram.mapping)
> > >   		iounmap(xe->mem.vram.mapping);
> > > +
> > > +	for_each_tile(tile, xe, id)
> > > +		xe_svm_devm_remove(xe, &tile->mem.vram);
> > > +
> > >   }
> > >
> > >   static int xe_verify_lmem_ready(struct xe_device *xe)
> > > diff --git a/drivers/gpu/drm/xe/xe_svm.h b/drivers/gpu/drm/xe/xe_svm.h
> > > new file mode 100644
> > > index 000000000000..09f9afb0e7d4
> > > --- /dev/null
> > > +++ b/drivers/gpu/drm/xe/xe_svm.h
> > > @@ -0,0 +1,14 @@
> > > +// SPDX-License-Identifier: MIT
> > > +/*
> > > + * Copyright © 2023 Intel Corporation
> > > + */
> > > +
> > > +#ifndef __XE_SVM_H
> > > +#define __XE_SVM_H
> > > +
> > > +#include "xe_device_types.h"
> > > +
> > > +int xe_svm_devm_add(struct xe_tile *tile, struct xe_mem_region *mem);
> > > +void xe_svm_devm_remove(struct xe_device *xe, struct xe_mem_region *mem);
> > > +
> > > +#endif
> > > diff --git a/drivers/gpu/drm/xe/xe_svm_devmem.c b/drivers/gpu/drm/xe/xe_svm_devmem.c
> > > new file mode 100644
> > > index 000000000000..63b7a1961cc6
> > > --- /dev/null
> > > +++ b/drivers/gpu/drm/xe/xe_svm_devmem.c
> > > @@ -0,0 +1,91 @@
> > > +// SPDX-License-Identifier: MIT
> > > +/*
> > > + * Copyright © 2023 Intel Corporation
> > > + */
> > > +
> > > +#include <linux/mm_types.h>
> > > +#include <linux/sched/mm.h>
> > > +
> > > +#include "xe_device_types.h"
> > > +#include "xe_trace.h"
> > > +#include "xe_svm.h"
> > > +
> > > +
> > > +static vm_fault_t xe_devm_migrate_to_ram(struct vm_fault *vmf)
> > > +{
> > > +	return 0;
> > > +}
> > > +
> > > +static void xe_devm_page_free(struct page *page)
> > > +{
> > > +}
> > > +
> > > +static const struct dev_pagemap_ops xe_devm_pagemap_ops = {
> > > +	.page_free = xe_devm_page_free,
> > > +	.migrate_to_ram = xe_devm_migrate_to_ram,
> > > +};
> > > +
> > > +/**
> > > + * xe_svm_devm_add: Remap and provide memmap backing for device memory
> >
> > Do we really need 'svm' in function name?
> 
> Good point. I will remove svm.
> >
> > > + * @tile: tile that the memory region belongs to
> >
> > We might like the flexibility in the future to call this once for
> > xe_device.mem.vram, instead of calling it for each tile.
> > So can we remove the tile argument and just pass the xe_device pointer
> > and tile->id?
> 
> This is interesting.
> 
> First of all, I got this wrong below: mr->pagemap.owner = tile->xe->drm.dev;
> 
> For a NUMA vram system, this should be: mr->pagemap.owner = tile;
> 
> For UMA vram, this should be: mr->pagemap.owner = tile_to_xe(tile);
> 
> This owner is important. It is used later by hmm to decide migration. We
> need to set the owner so hmm can identify the different vram regions.
> 
> Based on the above, I think the tile parameter is better. For UMA, the
> caller needs to make sure to call it only once; any tile pointer should
> work. This does sound a little weird, but I don't have a better idea.
> 
> Oak
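
To make that concrete, inside xe_svm_devm_add() I picture something along the
lines of the snippet below (illustrative only; xe_vram_is_uma() is a
placeholder for whatever real check we add):

	/*
	 * The pagemap owner is what hmm_range_fault() (dev_private_owner)
	 * and migrate_vma_setup() (pgmap_owner) later compare against the
	 * page's pgmap->owner, so it has to uniquely identify the vram
	 * region that pages can be migrated to/from.
	 */
	if (xe_vram_is_uma(tile_to_xe(tile)))
		mr->pagemap.owner = tile_to_xe(tile);	/* one region per device */
	else
		mr->pagemap.owner = tile;		/* one region per tile */
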
> >
> >
> > > + * @mr: memory region to remap
> > > + *
> > > + * This remaps device memory to the host physical address space and creates
> > > + * struct pages to back the device memory.
> > > + *
> > > + * Return: 0 on success, standard error code otherwise
> > > + */
> > > +int xe_svm_devm_add(struct xe_tile *tile, struct xe_mem_region *mr)
> > > +{
> > > +	struct device *dev = &to_pci_dev(tile->xe->drm.dev)->dev;
> > > +	struct resource *res;
> > > +	void *addr;
> > > +	int ret;
> > > +
> > > +	res = devm_request_free_mem_region(dev, &iomem_resource,
> > > +					   mr->usable_size);
> > > +	if (IS_ERR(res)) {
> > > +		ret = PTR_ERR(res);
> > > +		return ret;
> > > +	}
> > > +
> > > +	mr->pagemap.type = MEMORY_DEVICE_PRIVATE;
> > > +	mr->pagemap.range.start = res->start;
> > > +	mr->pagemap.range.end = res->end;
> > > +	mr->pagemap.nr_range = 1;
> > > +	mr->pagemap.ops = &xe_devm_pagemap_ops;
> > > +	mr->pagemap.owner = tile->xe->drm.dev;
> > > +	addr = devm_memremap_pages(dev, &mr->pagemap);
> > > +	if (IS_ERR(addr)) {
> > > +		devm_release_mem_region(dev, res->start, resource_size(res));
> > > +		ret = PTR_ERR(addr);
> > > +		drm_err(&tile->xe->drm, "Failed to remap tile %d memory, errno %d\n",
> > > +				tile->id, ret);
> > > +		return ret;
> > > +	}
> > > +	mr->hpa_base = res->start;
> > > +
> > > +	drm_info(&tile->xe->drm, "Added tile %d memory [%llx-%llx] to devm, remapped to %pr\n",
> > > +			tile->id, mr->io_start, mr->io_start + mr->usable_size, res);
> > > +	return 0;
> > > +}
> > > +
> > > +/**
> > > + * xe_svm_devm_remove: Unmap device memory and free resources
> > > + * @xe: xe device
> > > + * @mr: memory region to remove
> > > + */
> > > +void xe_svm_devm_remove(struct xe_device *xe, struct xe_mem_region *mr)
> > > +{
> > > +	/* FIXME: the code below causes a kernel hang during module remove */
> > > +#if 0
> > > +	struct device *dev = &to_pci_dev(xe->drm.dev)->dev;
> > > +
> > > +	if (mr->hpa_base) {
> > > +		devm_memunmap_pages(dev, &mr->pagemap);
> > > +		devm_release_mem_region(dev, mr->pagemap.range.start,
> > > +			mr->pagemap.range.end - mr->pagemap.range.start + 1);
> > > +	}
> > > +#endif
> > > +}
> > > +
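
One more note on the xe_devm_migrate_to_ram() stub above, since the commit
message describes the CPU fault path: with MEMORY_DEVICE_PRIVATE, a CPU access
faults into this callback, which is then expected to migrate the page back to
system memory. The real implementation presumably comes later in the series;
the sketch below (needs <linux/migrate.h>; xe_migrate_vram_to_sram_page() is a
made-up placeholder for the actual blitter copy) just shows the usual shape of
such a handler:

static vm_fault_t xe_devm_migrate_to_ram(struct vm_fault *vmf)
{
	unsigned long src = 0, dst = 0;
	struct page *dpage;
	struct migrate_vma args = {
		.vma		= vmf->vma,
		.start		= vmf->address,
		.end		= vmf->address + PAGE_SIZE,
		.src		= &src,
		.dst		= &dst,
		.pgmap_owner	= vmf->page->pgmap->owner,
		.flags		= MIGRATE_VMA_SELECT_DEVICE_PRIVATE,
		.fault_page	= vmf->page,
	};

	/* Collect the faulting device-private page for migration */
	if (migrate_vma_setup(&args))
		return VM_FAULT_SIGBUS;

	if (src & MIGRATE_PFN_MIGRATE) {
		/* Back the CPU mapping with a newly allocated system page */
		dpage = alloc_page(GFP_HIGHUSER);
		if (dpage) {
			lock_page(dpage);
			dst = migrate_pfn(page_to_pfn(dpage));
			/* Copy the vram page into dpage, e.g. via the blitter */
			xe_migrate_vram_to_sram_page(vmf->page, dpage);
		}
	}

	/* If dst stayed 0 (allocation failed), nothing is migrated */
	migrate_vma_pages(&args);
	migrate_vma_finalize(&args);
	return 0;
}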

