[RFC PATCH] drm/panfrost: Add support for mapping BOs on GPU page faults

Steven Price steven.price at arm.com
Thu Jun 27 11:19:45 UTC 2019


On 10/06/2019 18:04, Rob Herring wrote:
> The midgard/bifrost GPUs need to allocate GPU memory which is allocated
> on GPU page faults and not pinned in memory. The vendor driver calls
> this functionality GROW_ON_GPF.
> 
> This implementation assumes that BOs allocated with the
> PANFROST_BO_NOMAP flag are never mmapped or exported. Both of those may
> actually work, but I'm unsure if there's some interaction there. It
> would cause the whole object to be pinned in memory which would defeat
> the point of this.
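
For reference, allocating with the new flag from user space would look
roughly like the below (illustrative only, based on the uAPI added by
this patch; the helper name is made up):

#include <errno.h>
#include <xf86drm.h>
#include "drm/panfrost_drm.h"

/* Illustrative only: create a heap-style BO whose backing pages are
 * populated on GPU page faults rather than pinned up front. Such a BO
 * must not be mmapped or exported. */
static int create_heap_bo(int fd, __u32 *handle, __u64 *gpu_va)
{
	struct drm_panfrost_create_bo create = {
		.size  = 64 * 1024 * 1024,	/* grown on demand */
		.flags = PANFROST_BO_NOMAP,
	};

	if (drmIoctl(fd, DRM_IOCTL_PANFROST_CREATE_BO, &create))
		return -errno;

	*handle = create.handle;
	*gpu_va = create.offset;	/* GPU virtual address */
	return 0;
}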
> 
> Issues/questions/thoughts:
> 
> What's the difference between i_mapping and f_mapping?
> 
> What kind of clean-up on close is needed? Based on vgem faults, there
> doesn't seem to be any refcounting. Assume userspace is responsible for
> not freeing the BO while a page fault can occur?
> 
> What about evictions? Need to call mapping_set_unevictable()? Maybe we
> want these pages to be swappable, but then we need some notification to
> unmap them.
> 
> No need for runtime PM, because faults should only occur during job
> submit?

Although faults should normally only occur while a job is running, I
believe in some situations it is possible for the job to have completed
while a page fault is still outstanding (I think this can happen when
the job fails at the same time as the page fault occurs). In that
situation the GPU might be considered idle when the MMU irq is being
handled.
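
So the handler probably wants its own runtime PM reference rather than
relying on the job holding one. A sketch (untested; linux/pm_runtime.h
is already included by panfrost_mmu.c):

	/* Sketch only: the fault can outlive the job, so take a
	 * reference before touching the MMU registers. The sync call
	 * may sleep, which is fine once the handler is threaded. */
	if (pm_runtime_get_sync(pfdev->dev) < 0) {
		pm_runtime_put_noidle(pfdev->dev);
		return IRQ_NONE;
	}

	/* ... existing fault handling ... */

	pm_runtime_mark_last_busy(pfdev->dev);
	pm_runtime_put_autosuspend(pfdev->dev);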

> Need to fault in 2MB sections when possible. It seems this has to be
> done by getting an array of struct pages and then converting to a
> scatterlist. Not sure if there is a better way to do that. There's also
> some errata on T604 needing to fault in at least 8 pages at a time which
> isn't handled.

kbase simply doesn't support a page-by-page mapping of memory: a region
at the beginning of the buffer is mapped, and everything after it is
unmapped. It is then quite simple to round the mapped region up to the
next 2MB-section/8-page boundary. That is also a good performance
improvement, because page faults are relatively expensive and the
buffer may need to grow by a large amount for the job to complete (e.g.
on the first few frames). There aren't many use cases for a buffer with
multiple holes in it - although Vulkan does have a "sparse" mechanism
which might require something similar (but is unlikely to require
faulting in).
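
As a rough illustration of that policy (hypothetical helper, not kbase
code): pages below a per-BO watermark are mapped, and a fault grows the
watermark to the next 2MB-section boundary, which also covers the T604
"at least 8 pages" erratum since SZ_2M / PAGE_SIZE >= 8:

static pgoff_t grow_target(pgoff_t fault_pgoff, size_t bo_pages)
{
	/* Round up to the next 2MB section, clamped to the BO size. */
	pgoff_t target = ALIGN(fault_pgoff + 1, SZ_2M >> PAGE_SHIFT);

	return min_t(pgoff_t, target, bo_pages);
}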

> Cc: Robin Murphy <robin.murphy at arm.com>
> Cc: Steven Price <steven.price at arm.com>
> Cc: Alyssa Rosenzweig <alyssa at rosenzweig.io>
> Cc: Tomeu Vizoso <tomeu at tomeuvizoso.net>
> Signed-off-by: Rob Herring <robh at kernel.org>
> ---
> 
>  drivers/gpu/drm/panfrost/TODO           |  2 -
>  drivers/gpu/drm/panfrost/panfrost_drv.c | 22 +++++--
>  drivers/gpu/drm/panfrost/panfrost_gem.c |  3 +-
>  drivers/gpu/drm/panfrost/panfrost_gem.h |  8 +++
>  drivers/gpu/drm/panfrost/panfrost_mmu.c | 76 +++++++++++++++++++++++--
>  include/uapi/drm/panfrost_drm.h         |  2 +
>  6 files changed, 100 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/gpu/drm/panfrost/TODO b/drivers/gpu/drm/panfrost/TODO
> index c2e44add37d8..64129bf73933 100644
> --- a/drivers/gpu/drm/panfrost/TODO
> +++ b/drivers/gpu/drm/panfrost/TODO
> @@ -14,8 +14,6 @@
>    The hard part is handling when more address spaces are needed than what
>    the h/w provides.
>  
> -- Support pinning pages on demand (GPU page faults).
> -
>  - Support userspace controlled GPU virtual addresses. Needed for Vulkan. (Tomeu)
>  
>  - Support for madvise and a shrinker.
> diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
> index d11e2281dde6..f08d41d5186a 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_drv.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
> @@ -44,9 +44,11 @@ static int panfrost_ioctl_create_bo(struct drm_device *dev, void *data,
>  {
>  	int ret;
>  	struct drm_gem_shmem_object *shmem;
> +	struct panfrost_gem_object *bo;
>  	struct drm_panfrost_create_bo *args = data;
>  
> -	if (!args->size || args->flags || args->pad)
> +	if (!args->size || args->pad ||
> +	    (args->flags & ~PANFROST_BO_NOMAP))
>  		return -EINVAL;
>  
>  	shmem = drm_gem_shmem_create_with_handle(file, dev, args->size,
> @@ -54,11 +56,16 @@ static int panfrost_ioctl_create_bo(struct drm_device *dev, void *data,
>  	if (IS_ERR(shmem))
>  		return PTR_ERR(shmem);
>  
> -	ret = panfrost_mmu_map(to_panfrost_bo(&shmem->base));
> -	if (ret)
> -		goto err_free;
> +	bo = to_panfrost_bo(&shmem->base);
>  
> -	args->offset = to_panfrost_bo(&shmem->base)->node.start << PAGE_SHIFT;
> +	if (!(args->flags & PANFROST_BO_NOMAP)) {
> +		ret = panfrost_mmu_map(bo);
> +		if (ret)
> +			goto err_free;
> +	} else
> +		bo->no_map = 1;
> +
> +	args->offset = bo->node.start << PAGE_SHIFT;
>  
>  	return 0;
>  
> @@ -269,6 +276,11 @@ static int panfrost_ioctl_mmap_bo(struct drm_device *dev, void *data,
>  		return -ENOENT;
>  	}
>  
> +	if (to_panfrost_bo(gem_obj)->no_map) {
> +		DRM_DEBUG("Can't mmap 'no_map' GEM BO %d\n", args->handle);
> +		return -EINVAL;
> +	}
> +
>  	ret = drm_gem_create_mmap_offset(gem_obj);
>  	if (ret == 0)
>  		args->offset = drm_vma_node_offset_addr(&gem_obj->vma_node);
> diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c
> index 886875ae31d3..ed0b4f79610a 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_gem.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_gem.c
> @@ -19,7 +19,8 @@ static void panfrost_gem_free_object(struct drm_gem_object *obj)
>  	struct panfrost_gem_object *bo = to_panfrost_bo(obj);
>  	struct panfrost_device *pfdev = obj->dev->dev_private;
>  
> -	panfrost_mmu_unmap(bo);
> +	if (!bo->no_map)
> +		panfrost_mmu_unmap(bo);
>  
>  	spin_lock(&pfdev->mm_lock);
>  	drm_mm_remove_node(&bo->node);
> diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.h b/drivers/gpu/drm/panfrost/panfrost_gem.h
> index 045000eb5fcf..aa210d7cbf3f 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_gem.h
> +++ b/drivers/gpu/drm/panfrost/panfrost_gem.h
> @@ -11,6 +11,7 @@ struct panfrost_gem_object {
>  	struct drm_gem_shmem_object base;
>  
>  	struct drm_mm_node node;
> +	unsigned no_map: 1;
>  };
>  
>  static inline
> @@ -19,6 +20,13 @@ struct  panfrost_gem_object *to_panfrost_bo(struct drm_gem_object *obj)
>  	return container_of(to_drm_gem_shmem_obj(obj), struct panfrost_gem_object, base);
>  }
>  
> +static inline
> +struct  panfrost_gem_object *drm_mm_node_to_panfrost_bo(struct drm_mm_node *node)
> +{
> +	return container_of(node, struct panfrost_gem_object, node);
> +}
> +
> +
>  struct drm_gem_object *panfrost_gem_create_object(struct drm_device *dev, size_t size);
>  
>  struct drm_gem_object *
> diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
> index 41d184fab07f..5a36495a38f3 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
> @@ -3,6 +3,7 @@
>  /* Copyright (C) 2019 Arm Ltd. */
>  #include <linux/bitfield.h>
>  #include <linux/delay.h>
> +#include <linux/dma-mapping.h>
>  #include <linux/interrupt.h>
>  #include <linux/io.h>
>  #include <linux/iopoll.h>
> @@ -10,6 +11,7 @@
>  #include <linux/iommu.h>
>  #include <linux/platform_device.h>
>  #include <linux/pm_runtime.h>
> +#include <linux/shmem_fs.h>
>  #include <linux/sizes.h>
>  
>  #include "panfrost_device.h"
> @@ -277,6 +279,60 @@ static const struct iommu_gather_ops mmu_tlb_ops = {
>  	.tlb_sync	= mmu_tlb_sync_context,
>  };
>  
> +static struct drm_mm_node *addr_to_drm_mm_node(struct panfrost_device *pfdev, int as, u64 addr)
> +{
> +	struct drm_mm_node *node;
> +	u64 offset = addr >> PAGE_SHIFT;
> +
> +	drm_mm_for_each_node(node, &pfdev->mm) {
> +		if (offset >= node->start && offset < (node->start + node->size))
> +			return node;
> +	}
> +	return NULL;
> +}
> +
> +int panfrost_mmu_map_fault_addr(struct panfrost_device *pfdev, int as, u64 addr)
> +{
> +	struct page *page;
> +	dma_addr_t dma_addr;
> +	struct drm_mm_node *node;
> +	struct panfrost_gem_object *bo;
> +	struct io_pgtable_ops *ops = pfdev->mmu->pgtbl_ops;
> +	pgoff_t page_offset = addr >> PAGE_SHIFT;
> +
> +	addr &= PAGE_MASK;
> +
> +	node = addr_to_drm_mm_node(pfdev, as, addr);
> +	if (!node)
> +		return -ENOENT;
> +
> +	page_offset -= node->start;
> +	bo = drm_mm_node_to_panfrost_bo(node);
> +	if (WARN_ON(!bo->no_map))
> +		return -EINVAL;
> +
> +	dev_info(pfdev->dev, "mapping page fault @ %llx", addr);
> +
> +	page = shmem_read_mapping_page(bo->base.base.filp->f_mapping,
> +				       page_offset);
> +	if (IS_ERR(page))
> +		return PTR_ERR(page);
> +
> +	dma_addr = dma_map_page(pfdev->dev, page, 0, PAGE_SIZE, DMA_BIDIRECTIONAL);
> +
> +	mutex_lock(&pfdev->mmu->lock);
> +
> +	ops->map(ops, addr, dma_addr, PAGE_SIZE, IOMMU_WRITE | IOMMU_READ);
> +
> +	mmu_write(pfdev, MMU_INT_CLEAR, BIT(as));
> +
> +	mmu_hw_do_operation(pfdev, as, addr, PAGE_SIZE, AS_COMMAND_FLUSH_PT);
> +
> +	mutex_unlock(&pfdev->mmu->lock);
> +
> +	return 0;
> +}
> +
>  static const char *access_type_name(struct panfrost_device *pfdev,
>  		u32 fault_status)
>  {
> @@ -302,13 +358,11 @@ static irqreturn_t panfrost_mmu_irq_handler(int irq, void *data)
>  {
>  	struct panfrost_device *pfdev = data;
>  	u32 status = mmu_read(pfdev, MMU_INT_STAT);
> -	int i;
> +	int i, ret;
>  
>  	if (!status)
>  		return IRQ_NONE;
>  
> -	dev_err(pfdev->dev, "mmu irq status=%x\n", status);
> -
>  	for (i = 0; status; i++) {
>  		u32 mask = BIT(i) | BIT(i + 16);
>  		u64 addr;
> @@ -329,6 +383,17 @@ static irqreturn_t panfrost_mmu_irq_handler(int irq, void *data)
>  		access_type = (fault_status >> 8) & 0x3;
>  		source_id = (fault_status >> 16);
>  
> +		/* Page fault only */
> +		if ((status & mask) == BIT(i)) {
> +			WARN_ON(exception_type < 0xC1 || exception_type > 0xC4);
> +
> +			ret = panfrost_mmu_map_fault_addr(pfdev, i, addr);
> +			if (!ret) {
> +				status &= ~mask;
> +				continue;
> +			}
> +		}
> +
>  		/* terminal fault, print info about the fault */
>  		dev_err(pfdev->dev,
>  			"Unhandled Page fault in AS%d at VA 0x%016llX\n"
> @@ -370,8 +435,9 @@ int panfrost_mmu_init(struct panfrost_device *pfdev)
>  	if (irq <= 0)
>  		return -ENODEV;
>  
> -	err = devm_request_irq(pfdev->dev, irq, panfrost_mmu_irq_handler,
> -			       IRQF_SHARED, "mmu", pfdev);
> +	err = devm_request_threaded_irq(pfdev->dev, irq, NULL,
> +					panfrost_mmu_irq_handler,
> +					IRQF_ONESHOT, "mmu", pfdev);

Is it intentional to drop the SHARED flag? I have to admit I can't think
of a (real) platform which does share the IRQ, but kbase has always
supported sharing the GPU interrupts.
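
If we want to keep that working it should just be a matter of the below
(note the core only allows sharing when all handlers on the line agree
on IRQF_ONESHOT):

	err = devm_request_threaded_irq(pfdev->dev, irq, NULL,
					panfrost_mmu_irq_handler,
					IRQF_SHARED | IRQF_ONESHOT,
					"mmu", pfdev);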

Steve

>  
>  	if (err) {
>  		dev_err(pfdev->dev, "failed to request mmu irq");
> diff --git a/include/uapi/drm/panfrost_drm.h b/include/uapi/drm/panfrost_drm.h
> index a52e0283b90d..6d83baa0e4c1 100644
> --- a/include/uapi/drm/panfrost_drm.h
> +++ b/include/uapi/drm/panfrost_drm.h
> @@ -71,6 +71,8 @@ struct drm_panfrost_wait_bo {
>  	__s64 timeout_ns;	/* absolute */
>  };
>  
> +#define PANFROST_BO_NOMAP	1
> +
>  /**
>   * struct drm_panfrost_create_bo - ioctl argument for creating Panfrost BOs.
>   *
> 


