[PATCH v2 26/32] drm/xe/madvise: Update migration policy based on preferred location

Wed May 14 22:04:44 UTC 2025

On Mon, Apr 07, 2025 at 03:47:13PM +0530, Himal Prasad Ghimiray wrote:
> When the user sets a valid devmem_fd as the preferred location, a GPU fault
> will trigger migration to the tile of the device associated with devmem_fd.
> 
> If the user sets an invalid devmem_fd, the preferred location is the current
> placement only.
> 
> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray at intel.com>
> ---
>  drivers/gpu/drm/xe/xe_svm.c        | 15 ++++++++++++++-
>  drivers/gpu/drm/xe/xe_vm.h         |  3 +++
>  drivers/gpu/drm/xe/xe_vm_madvise.c | 20 +++++++++++++++++++-
>  3 files changed, 36 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
> index d40111e29bfe..60dfb1bf12ca 100644
> --- a/drivers/gpu/drm/xe/xe_svm.c
> +++ b/drivers/gpu/drm/xe/xe_svm.c
> @@ -765,6 +765,12 @@ bool xe_svm_range_needs_migrate_to_vram(struct xe_svm_range *range, struct xe_vm
>  	return needs_migrate;
>  }
>  
> +static const u32 region_to_mem_type[] = {
> +	XE_PL_TT,
> +	XE_PL_VRAM0,
> +	XE_PL_VRAM1,
> +};
> +
>  /**
>   * xe_svm_handle_pagefault() - SVM handle page fault
>   * @vm: The VM.
> @@ -796,6 +802,7 @@ int xe_svm_handle_pagefault(struct xe_vm *vm, struct xe_vma *vma,
>  	struct xe_tile *tile = gt_to_tile(gt);
>  	int retry_count = 3;
>  	ktime_t end = 0;
> +	u32 region;
>  	int err;
>  
>  	lockdep_assert_held_write(&vm->lock);
> @@ -820,7 +827,13 @@ int xe_svm_handle_pagefault(struct xe_vm *vm, struct xe_vma *vma,
>  
>  	range_debug(range, "PAGE FAULT");
>  
> -	if (xe_svm_range_needs_migrate_to_vram(range, vma, IS_DGFX(vm->xe))) {
> +	region = vma->attr.preferred_loc.devmem_fd;

I mentioned this earlier in the series: you are assigning a devmem_fd to a
region, which is a bit confusing.
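
To illustrate keeping the two concepts apart, here is a minimal userspace
sketch. It assumes, as the patch appears to, that the stored value is really
a region index rather than a file descriptor. The model_* names and the
MODEL_PL_* enum are hypothetical stand-ins for region_to_mem_type[] and
XE_PL_*, not xe code. Since region_to_mem_type[] has only three entries, a
value above 2 would index past its end unless it is validated elsewhere in
the series, so the helper rejects that case before any lookup.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the patch's table; not the real XE_PL_* values. */
enum model_mem_type { MODEL_PL_TT, MODEL_PL_VRAM0, MODEL_PL_VRAM1 };

static const enum model_mem_type model_region_to_mem_type[] = {
	MODEL_PL_TT,
	MODEL_PL_VRAM0,
	MODEL_PL_VRAM1,
};

#define MODEL_NUM_REGIONS \
	(sizeof(model_region_to_mem_type) / sizeof(model_region_to_mem_type[0]))

/*
 * Translate the stored preferred-location value into a region index held
 * in its own, clearly named variable, refusing values the table cannot
 * index instead of using the raw value as an array index.
 */
static int model_preferred_loc_to_region(uint32_t preferred_loc,
					 uint32_t *region)
{
	if (preferred_loc >= MODEL_NUM_REGIONS)
		return -EINVAL;

	*region = preferred_loc;
	return 0;
}

/* Lookup is only safe after model_preferred_loc_to_region() succeeded. */
static enum model_mem_type model_region_mem_type(uint32_t region)
{
	assert(region < MODEL_NUM_REGIONS);
	return model_region_to_mem_type[region];
}
```

On failure the output region is left untouched, so a caller can keep its
previous placement.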

> +
> +	if (xe_svm_range_needs_migrate_to_vram(range, vma, region)) {
> +		region = region ? region : 1;

I think the default (region unset) should be the VRAM closest to the GT
of the fault.
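
A minimal sketch of that default, assuming each tile has its own VRAM and
that a preferred tile has already been range-checked at madvise time. The
handler already computes tile = gt_to_tile(gt) for the faulting GT, so the
unset case can simply keep it. struct model_fault and
model_pick_migration_tile() are hypothetical and only model the selection
logic, not the real xe structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the fault context; not the real xe structures. */
struct model_fault {
	uint32_t faulting_tile;  /* tile whose GT raised the fault */
	bool has_preferred;      /* user set a preferred location via madvise */
	uint32_t preferred_tile; /* meaningful only when has_preferred */
};

/*
 * Pick the tile whose VRAM the range should migrate to: an explicit
 * preference wins, otherwise fall back to the VRAM local to the faulting GT.
 */
static uint32_t model_pick_migration_tile(const struct model_fault *f,
					  uint32_t num_tiles)
{
	assert(f->faulting_tile < num_tiles);

	if (f->has_preferred) {
		/* Assumed to have been range-checked when madvise stored it. */
		assert(f->preferred_tile < num_tiles);
		return f->preferred_tile;
	}

	return f->faulting_tile;
}
```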

> +		/* Need rework for multigpu */
> +		tile = &vm->xe->tiles[region_to_mem_type[region] - XE_PL_VRAM0];
> +
>  		err = xe_svm_alloc_vram(vm, tile, range, &ctx);
>  		if (err) {
>  			if (retry_count) {
> diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
> index 4e45230b7205..377f62f859b7 100644
> --- a/drivers/gpu/drm/xe/xe_vm.h
> +++ b/drivers/gpu/drm/xe/xe_vm.h
> @@ -220,6 +220,9 @@ int __xe_vm_userptr_needs_repin(struct xe_vm *vm);
>  
>  int xe_vm_userptr_check_repin(struct xe_vm *vm);
>  
> +bool xe_vma_has_preferred_mem_loc(struct xe_vma *vma,
> +				  u32 *mem_region, u32 *devmem_fd);
> +
>  int xe_vm_rebind(struct xe_vm *vm, bool rebind_worker);
>  struct dma_fence *xe_vma_rebind(struct xe_vm *vm, struct xe_vma *vma,
>  				u8 tile_mask);
> diff --git a/drivers/gpu/drm/xe/xe_vm_madvise.c b/drivers/gpu/drm/xe/xe_vm_madvise.c
> index 7e1a95106cb9..f870e8642190 100644
> --- a/drivers/gpu/drm/xe/xe_vm_madvise.c
> +++ b/drivers/gpu/drm/xe/xe_vm_madvise.c
> @@ -61,7 +61,25 @@ static int madvise_preferred_mem_loc(struct xe_device *xe, struct xe_vm *vm,
>  				     struct xe_vma **vmas, int num_vmas,
>  				     struct drm_xe_madvise_ops ops)
>  {
> -	/* Implementation pending */
> +	s32 devmem_fd;
> +	u32 migration_policy;
> +	int i;
> +
> +	xe_assert(vm->xe, ops.type == DRM_XE_VMA_ATTR_PREFERRED_LOC);
> +	vm_dbg(&xe->drm, "migration policy = %d, devmem_fd = %d\n",
> +	       ops.preferred_mem_loc.migration_policy,
> +	       ops.preferred_mem_loc.devmem_fd);

As mentioned in patch #27, I'm not sure this debug info is all that
useful.

> +
> +	devmem_fd = (s32)ops.preferred_mem_loc.devmem_fd;
> +	devmem_fd = (devmem_fd < 0) ? 0 : devmem_fd;
> +

Why fold a negative devmem_fd into 0 with (devmem_fd < 0) ? 0? I'm not
following this.
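
For illustration, the clamp makes any negative value indistinguishable from
an explicit 0. A minimal userspace sketch contrasting the patch's behavior
with a hypothetical stricter variant that rejects negative values with
-EINVAL; the model_* names are made up, and whether rejecting is the right
uAPI is an assumption, not something the patch states.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Mirrors the patch: every negative value is folded into 0. */
static uint32_t model_clamp_like_patch(int32_t devmem_fd)
{
	return devmem_fd < 0 ? 0 : (uint32_t)devmem_fd;
}

/*
 * Hypothetical stricter variant: a negative value is reported as an error
 * instead of silently becoming the same value as an explicit 0.
 */
static int model_validate_devmem_fd(int32_t devmem_fd, uint32_t *out)
{
	if (devmem_fd < 0)
		return -EINVAL;

	*out = (uint32_t)devmem_fd;
	return 0;
}
```

With the stricter variant a userspace bug shows up as an ioctl error at
madvise time rather than as an unexpected placement at fault time.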

> +	migration_policy = ops.preferred_mem_loc.migration_policy;
> +

As mentioned earlier in the series, I'm confused by migration_policy, and it
also looks to be unused. Am I missing something?

Matt

> +	for (i = 0; i < num_vmas; i++) {
> +		vmas[i]->attr.preferred_loc.devmem_fd = devmem_fd;
> +		vmas[i]->attr.preferred_loc.migration_policy = migration_policy;
> +	}
> +
>  	return 0;
>  }
>  
> -- 
> 2.34.1
>