[v2 28/31] drm/xe/svm: Introduce helper to migrate vma to vram

Matthew Brost matthew.brost at intel.com
Thu Apr 11 02:49:21 UTC 2024


On Tue, Apr 09, 2024 at 04:17:39PM -0400, Oak Zeng wrote:
> Introduce a helper function xe_svm_migrate_vma_to_vram.
> 
> Since the source pages of the svm range can be physically not
> contiguous, and the destination vram pages can also be not
> contiguous, there is no easy way to migrate multiple pages per
> blitter command. We do page by page migration for now.
> 
> Migration is best effort. Even if we fail to migrate some pages,
> we will try to migrate the rest pages.
> 
> FIXME: Use one blitter command to copy when both src and dst are
> physically contiguous
> 

Yep, touched on this throughout the series. Only vram needs to be
contiguous though, as we dynamically create PT mappings for the sram
pages in the migrate code. Getting this in is a must and should be done
immediately IMO, as this is a very, very basic performance thing we know
needs to be done. We will likely have to tune this code quite a bit for
performance, so getting the known things done now would be helpful.
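
For illustration only (just a sketch, name made up), figuring out how
many pages a single blit could cover might look something like:

/* Sketch: length (in pages) of a physically contiguous vram pfn run
 * starting at index i; one blit could then cover the whole run.
 */
static unsigned long xe_contiguous_dst_run(const unsigned long *dst_pfns,
					   unsigned long i,
					   unsigned long npages)
{
	unsigned long j;

	for (j = i + 1; j < npages; j++)
		if ((dst_pfns[j] >> MIGRATE_PFN_SHIFT) !=
		    (dst_pfns[i] >> MIGRATE_PFN_SHIFT) + (j - i))
			break;

	return j - i;
}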

> FIXME: when a vma is partially migrated, split vma as we assume
> no mixture vma placement.
> 

Agree we do not want to support partial migrations. We likely want to
return -EAGAIN or something and fall back to a smaller xe_vma chunk
size, which I discussed in [1] and commented on in [2].

Migration should be best effort too; if we fail to migrate we can always
leave the backing store in sram.
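
Roughly what I have in mind in the GPU fault handler (a sketch only, the
rebuild helper is made up):

	/* Sketch, hypothetical helper: retry the fault with progressively
	 * smaller chunk sizes if the full range cannot be migrated.
	 */
	static const unsigned long chunk_sizes[] = { SZ_2M, SZ_64K, SZ_4K };

	for (i = 0; i < ARRAY_SIZE(chunk_sizes); i++) {
		vma = xe_vma_rebuild_for_fault(vm, addr, chunk_sizes[i]); /* hypothetical */
		err = xe_svm_migrate_vma_to_vram(vm, vma, tile);
		if (err != -EAGAIN)
			break;
	}
	if (err == -EAGAIN)
		err = 0;	/* give up on vram, leave the range in sram */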

I do have a question though: when do we get partial migrations? A user
having called mlock on some of the pages? I just want to make sure I
fully understand that case.

[1] https://patchwork.freedesktop.org/patch/588526/?series=132229&rev=1
[2] https://patchwork.freedesktop.org/patch/588528/?series=132229&rev=1

> Signed-off-by: Oak Zeng <oak.zeng at intel.com>
> Co-developed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura at intel.com>
> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura at intel.com>
> Cc: Matthew Brost <matthew.brost at intel.com>
> Cc: Thomas Hellström <thomas.hellstrom at intel.com>
> Cc: Brian Welty <brian.welty at intel.com>
> ---
>  drivers/gpu/drm/xe/xe_svm.h         |   2 +
>  drivers/gpu/drm/xe/xe_svm_migrate.c | 115 ++++++++++++++++++++++++++++

The same comments on file structure made throughout the series apply here too.

>  2 files changed, 117 insertions(+)
> 
> diff --git a/drivers/gpu/drm/xe/xe_svm.h b/drivers/gpu/drm/xe/xe_svm.h
> index c9e4239c44b4..18ce2e3757c5 100644
> --- a/drivers/gpu/drm/xe/xe_svm.h
> +++ b/drivers/gpu/drm/xe/xe_svm.h
> @@ -83,4 +83,6 @@ int xe_devm_alloc_pages(struct xe_tile *tile,
>  void xe_devm_free_blocks(struct list_head *blocks);
>  void xe_devm_page_free(struct page *page);
>  vm_fault_t xe_svm_migrate_to_sram(struct vm_fault *vmf);
> +int xe_svm_migrate_vma_to_vram(struct xe_vm *vm, struct xe_vma *vma,
> +							struct xe_tile *tile);
>  #endif
> diff --git a/drivers/gpu/drm/xe/xe_svm_migrate.c b/drivers/gpu/drm/xe/xe_svm_migrate.c
> index 0db831af098e..ab8dd1f58aa4 100644
> --- a/drivers/gpu/drm/xe/xe_svm_migrate.c
> +++ b/drivers/gpu/drm/xe/xe_svm_migrate.c
> @@ -220,3 +220,118 @@ vm_fault_t xe_svm_migrate_to_sram(struct vm_fault *vmf)
>  	kvfree(buf);
>  	return 0;
>  }
> +
> +/**
> + * xe_svm_migrate_vma_to_vram() - migrate backing store of a vma to vram
> + * Must be called with mmap_read_lock held.
> + * @vm: the vm that the vma belongs to
> + * @vma: the vma to migrate.
> + * @tile: the destination tile which holds the new backing store of the range
> + *
> + * Returns: negative errno on faiure, 0 on success
> + */
> +int xe_svm_migrate_vma_to_vram(struct xe_vm *vm,
> +							struct xe_vma *vma,
> +							struct xe_tile *tile)
> +{
> +	struct mm_struct *mm = vm->mm;
> +	unsigned long start = xe_vma_start(vma);
> +	unsigned long end = xe_vma_end(vma);
> +	unsigned long npages = (end - start) >> PAGE_SHIFT;
> +	struct xe_mem_region *mr = &tile->mem.vram;
> +	struct vm_area_struct *vas;
> +
> +	struct migrate_vma migrate = {
> +		.start		= start,
> +		.end		= end,
> +		.pgmap_owner	= tile->xe,

Again, a helper to assign the owner would be good here.
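
e.g. something like (sketch, name made up):

/* Hypothetical helper so every migrate_vma user agrees on the owner. */
static inline void *xe_svm_pgmap_owner(struct xe_device *xe)
{
	return xe;
}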

> +		.flags          = MIGRATE_VMA_SELECT_SYSTEM,
> +	};
> +	struct device *dev = tile->xe->drm.dev;
> +	dma_addr_t *src_dma_addr;
> +	struct dma_fence *fence;
> +	struct page *src_page;
> +	LIST_HEAD(blocks);
> +	int ret = 0, i;
> +	u64 dst_dpa;
> +	void *buf;
> +
> +	mmap_assert_locked(mm);

This mmap_assert_locked is ambiguous; we should make it clear whether
this is read or write locked. Doesn't it have to be write locked to
migrate the pages?
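
If the requirement really is the write lock (I'm assuming here), being
explicit would be:

	mmap_assert_write_locked(mm);	/* or mmap_assert_locked(mm) if read is enough */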

A larger question about the locking... The CPU fault handler holds the
mmap lock in write mode, right? 

I'm asking because I think, at least initially, we want to hold the mmap
lock in a way that the GPU fault handler and the CPU fault handler do
not race. i.e. from the fault userptr create in the GPU fault handler to
issuing the bind, we prevent the CPU fault handler from running.

I'm having issues figuring out how to prevent races between initial
binds of userptrs and userptr invalidates on faulting VMs. This race is
seen in xe_exec_fault_mode for example... So preventing races between
the CPU / GPU fault handlers with the mmap lock is probably a good idea
initially. We can likely make the locking finer grained once this is all
working and I figure out how to handle this race better.

> +
> +	vas = find_vma_intersection(mm, start, start + 4);

find_vma should work fine here.
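
i.e. something like:

	vas = find_vma(mm, start);
	if (!vas || start < vas->vm_start)
		return -ENOENT;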

> +	if (!vas)
> +		return -ENOENT;
> +
> +	migrate.vma = vas;
> +	buf = kvcalloc(npages, 2* sizeof(*migrate.src) + sizeof(*src_dma_addr),
> +					GFP_KERNEL);
> +	if(!buf)
> +		return -ENOMEM;
> +	migrate.src = buf;
> +	migrate.dst = migrate.src + npages;
> +	src_dma_addr = (dma_addr_t *) (migrate.dst + npages);
> +	ret = xe_devm_alloc_pages(tile, npages, &blocks, migrate.dst);

Again as I discussed in [3] I think this should be broken out into a
different step with the blocks allocated before this, and here just
populate migrate.dst from the existing blocks.

[3] https://patchwork.freedesktop.org/patch/588523/?series=132229&rev=1
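
Something roughly like this, assuming the blocks are already allocated
in an earlier step and that a dpa -> pfn conversion helper exists (both
of those are assumptions / hypothetical names here):

/* Sketch: fill migrate.dst from pre-allocated buddy blocks instead of
 * allocating here. xe_mem_region_dpa_to_pfn() is a hypothetical helper,
 * and mr->buddy assumes the region exposes its buddy allocator.
 */
static void xe_svm_populate_dst_pfns(struct xe_mem_region *mr,
				     struct list_head *blocks,
				     unsigned long *dst_pfns,
				     unsigned long npages)
{
	struct drm_buddy_block *block;
	unsigned long i = 0;

	list_for_each_entry(block, blocks, link) {
		u64 dpa = drm_buddy_block_offset(block);
		u64 size = drm_buddy_block_size(mr->buddy, block);
		unsigned long j;

		for (j = 0; j < size >> PAGE_SHIFT && i < npages; j++, i++)
			dst_pfns[i] = migrate_pfn(xe_mem_region_dpa_to_pfn(mr, dpa) + j);
	}
}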

> +	if (ret)
> +		goto kfree_buf;
> +
> +	ret = migrate_vma_setup(&migrate);
> +	if (ret) {
> +		drm_err(&tile->xe->drm, "vma setup returned %d for range [%lx - %lx]\n",
> +				ret, start, end);
> +		goto free_dst_pages;
> +	}
> +
> +	/**FIXME: partial migration of a range print a warning for now.
> +	 * If this message is printed, we need to split xe_vma as we
> +	 * don't support a mixture placement of one vma
> +	 */
> +	if (migrate.cpages != npages)
> +		drm_warn(&tile->xe->drm, "Partial migration for range [%lx - %lx], range is %ld pages, migrate only %ld pages\n",
> +				start, end, npages, migrate.cpages);

As discussed above, we shouldn't support this. We should fall back to a
smaller xe_vma chunk size until we find one that works, or simply leave
the pages in sram and map those pages to the GPU.

> +
> +	/**Migrate page by page for now.
> +	 * Both source pages and destination pages can physically not contiguous,
> +	 * there is no good way to migrate multiple pages per blitter command.
> +	 */

Touched on this a bunch throughout the series; let's do better than
page-at-a-time migration.

The algorithm should be very similar to what I discussed in [4] but with
a few key differences:

- I think the sram pages can be unpopulated (page == NULL) if the user
  has not yet touched the page
- Also I think the MIGRATE_PFN_MIGRATE bit being clear is valid

In either of these cases we have to flush a copy for the pages we have
accumulated so far with contiguous vram addresses, then skip that page
and start a new run. Rough sketch after [4] below.

[4] https://patchwork.freedesktop.org/patch/588526/?series=132229&rev=1
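
Very roughly (a sketch only; xe_migrate_sram_to_vram() is a hypothetical
multi-page copy helper that would dma-map the sram pages of the run and
issue one copy job to the contiguous vram area, and this assumes
migrate.dst holds migrate_pfn()-encoded entries):

	unsigned long i, run_start = 0, run_len = 0;

	for (i = 0; i < npages; i++) {
		struct page *spage = migrate_pfn_to_page(migrate.src[i]);
		bool can_migrate = spage && (migrate.src[i] & MIGRATE_PFN_MIGRATE);
		bool contig = run_len &&
			(migrate.dst[i] >> MIGRATE_PFN_SHIFT) ==
			(migrate.dst[i - 1] >> MIGRATE_PFN_SHIFT) + 1;

		if (run_len && (!can_migrate || !contig)) {
			/* flush the accumulated run with a single blit */
			xe_migrate_sram_to_vram(tile, &migrate, run_start, run_len);
			run_len = 0;
		}

		if (!can_migrate) {
			migrate.dst[i] = 0;	/* leave this page in sram */
			continue;
		}

		if (!run_len)
			run_start = i;
		run_len++;
	}

	if (run_len)
		xe_migrate_sram_to_vram(tile, &migrate, run_start, run_len);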

> +	for (i = 0; i < npages; i++) {
> +		src_page = migrate_pfn_to_page(migrate.src[i]);
> +		if (unlikely(!src_page || !(migrate.src[i] & MIGRATE_PFN_MIGRATE)))

Discussed this in the CPU fault patch: once we have called
migrate_vma_setup(), on subsequent errors we still need to call
migrate_vma_finalize() to revert the pages to their original state. At
least that is my reading of the documentation.

Here on error we just free the destination pages...
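
i.e. for a per-page copy failure, something like (sketch; copy_failed is
a placeholder and this is based on my reading of the migrate_vma
documentation, worth double checking against Documentation/mm/hmm.rst):

		if (copy_failed) {	/* dma map error or blit error */
			/* Clear dst so migrate_vma_pages() skips this entry
			 * and migrate_vma_finalize() restores the original
			 * sram page; the vram page reserved for this entry
			 * should be freed here too.
			 */
			migrate.dst[i] = 0;
			continue;
		}

and then always run migrate_vma_pages() + migrate_vma_finalize() after
the loop, even when some of the copies failed.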

Matt

> +			goto free_dst_page;
> +
> +		xe_assert(tile->xe, !is_zone_device_page(src_page));
> +		src_dma_addr[i] = dma_map_page(dev, src_page, 0, PAGE_SIZE, DMA_TO_DEVICE);
> +		if (unlikely(dma_mapping_error(dev, src_dma_addr[i]))) {
> +			drm_warn(&tile->xe->drm, "dma map error for host pfn %lx\n", migrate.src[i]);
> +			goto free_dst_page;
> +		}
> +		dst_dpa = xe_mem_region_pfn_to_dpa(mr, migrate.dst[i]);
> +		fence = xe_migrate_pa(tile->migrate, src_dma_addr[i], false,
> +				dst_dpa, true, PAGE_SIZE);
> +		if (IS_ERR(fence)) {
> +			drm_warn(&tile->xe->drm, "migrate host page (pfn: %lx) to vram failed\n",
> +					migrate.src[i]);
> +			/**Migration is best effort. Even we failed here, we continue*/
> +			goto free_dst_page;
> +		}
> +		/**FIXME: Use the first migration's out fence as the second migration's input fence,
> +		 * and so on. Only wait the out fence of last migration?
> +		 */
> +		dma_fence_wait(fence, false);
> +		dma_fence_put(fence);
> +free_dst_page:
> +		xe_devm_page_free(pfn_to_page(migrate.dst[i]));
> +	}
> +
> +	for (i = 0; i < npages; i++)
> +		if (!(dma_mapping_error(dev, src_dma_addr[i])))
> +			dma_unmap_page(dev, src_dma_addr[i], PAGE_SIZE, DMA_TO_DEVICE);
> +
> +	migrate_vma_pages(&migrate);
> +	migrate_vma_finalize(&migrate);
> +free_dst_pages:
> +	if (ret)
> +		xe_devm_free_blocks(&blocks);
> +kfree_buf:
> +	kfree(buf);
> +	return ret;
> +}
> -- 
> 2.26.3
> 

