[PATCH 1/3] drm/xe/bo: Forward the decision to evict local objects during validation

Thomas Hellström thomas.hellstrom at linux.intel.com
Tue Feb 27 11:52:14 UTC 2024


On Mon, 2024-02-26 at 21:36 +0000, Matthew Brost wrote:
> On Mon, Feb 26, 2024 at 05:44:53PM +0100, Thomas Hellström wrote:
> > Currently we refuse evicting the VM's local objects. However that
> > is necessary for some objects. Most notably completely unbound
> > objects.
> > Forward this decision to be per-object based in the TTM
> > eviction_valuable() callback.
> > 
> > Fixes: 24f947d58fe5 ("drm/xe: Use DRM GPUVM helpers for external-
> > and evicted objects")
> > Cc: Thomas Hellström <thomas.hellstrom at linux.intel.com>
> > Cc: Rodrigo Vivi <rodrigo.vivi at intel.com>
> > Cc: Matthew Brost <matthew.brost at intel.com>
> > Cc: Lucas De Marchi <lucas.demarchi at intel.com>
> > Cc: Oded Gabbay <ogabbay at kernel.org>
> > Signed-off-by: Thomas Hellström <thomas.hellstrom at linux.intel.com>
> > ---
> >  drivers/gpu/drm/xe/xe_bo.c       | 17 ++++++++++++++++-
> >  drivers/gpu/drm/xe/xe_exec.c     |  2 ++
> >  drivers/gpu/drm/xe/xe_vm.c       |  8 ++++++--
> >  drivers/gpu/drm/xe/xe_vm_types.h | 10 ++++++++++
> >  4 files changed, 34 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_bo.c
> > b/drivers/gpu/drm/xe/xe_bo.c
> > index 76dfaf1cd200..143d93632253 100644
> > --- a/drivers/gpu/drm/xe/xe_bo.c
> > +++ b/drivers/gpu/drm/xe/xe_bo.c
> > @@ -1049,6 +1049,21 @@ static void
> > xe_ttm_bo_delete_mem_notify(struct ttm_buffer_object *ttm_bo)
> >  	}
> >  }
> >  
> > +static bool xe_bo_eviction_valuable(struct ttm_buffer_object
> > *ttm_bo,
> > +				    const struct ttm_place *place)
> > +{
> > +	if (xe_bo_is_xe_bo(ttm_bo)) {
> > +		struct xe_bo *xe_bo = ttm_to_xe_bo(ttm_bo);
> > +		struct xe_vm *vm = xe_bo->vm;
> > +
> > +		if (vm && !drm_gpuvm_is_extobj(&vm->gpuvm,
> > &ttm_bo->base) &&
> 
> Is this not redundant - if xe_bo->vm is set then it not an external
> BO?

It's to exclude the case where the resv is individualized as part of
destruction. Then drm_gpuvm_is_extobj() should return true.

> 
> > +		    vm->is_validating)
> > +			return false;
> > +	}
> > +
> > +	return ttm_bo_eviction_valuable(ttm_bo, place);
> > +}
> > +
> >  const struct ttm_device_funcs xe_ttm_funcs = {
> >  	.ttm_tt_create = xe_ttm_tt_create,
> >  	.ttm_tt_populate = xe_ttm_tt_populate,
> > @@ -1059,7 +1074,7 @@ const struct ttm_device_funcs xe_ttm_funcs =
> > {
> >  	.io_mem_reserve = xe_ttm_io_mem_reserve,
> >  	.io_mem_pfn = xe_ttm_io_mem_pfn,
> >  	.release_notify = xe_ttm_bo_release_notify,
> > -	.eviction_valuable = ttm_bo_eviction_valuable,
> > +	.eviction_valuable = xe_bo_eviction_valuable,
> >  	.delete_mem_notify = xe_ttm_bo_delete_mem_notify,
> >  };
> >  
> > diff --git a/drivers/gpu/drm/xe/xe_exec.c
> > b/drivers/gpu/drm/xe/xe_exec.c
> > index 952496c6260d..e8eb09993485 100644
> > --- a/drivers/gpu/drm/xe/xe_exec.c
> > +++ b/drivers/gpu/drm/xe/xe_exec.c
> > @@ -102,7 +102,9 @@ static int xe_exec_fn(struct drm_gpuvm_exec
> > *vm_exec)
> >  	int num_fences;
> >  	int ret;
> >  
> > +	vm->is_validating = true;
> >  	ret = drm_gpuvm_validate(vm_exec->vm, &vm_exec->exec);
> > +	vm->is_validating = false;
> 
> Can't say I love this coding pattern but you address this in the
> kernel
> doc for is_validating. Agree with updating TTM to avoid this.

Yeah.

> 
> Also this is kinda racey too - e.g. 2 VMs could set vm->is_validating
> at
> the same time. We really want to check if BO being evicted VM's
> matches
> the VM we are trying to validate in the current context, right?

It's not racy AFAICT. Since it's protected by the vm resv, and only
ever set under the vm->resv. Any reader holding the vm->resv can thus
rely on its value.

> 
> With that - maybe we try to update TTM in this series too?

I'd be good to have the completely unbound case fixed with 6.8 so
that's why this got a bit rushed.

> 
> >  	if (ret)
> >  		return ret;
> >  
> > diff --git a/drivers/gpu/drm/xe/xe_vm.c
> > b/drivers/gpu/drm/xe/xe_vm.c
> > index e3bde897f6e8..07b0cda12d52 100644
> > --- a/drivers/gpu/drm/xe/xe_vm.c
> > +++ b/drivers/gpu/drm/xe/xe_vm.c
> > @@ -474,7 +474,7 @@ static int xe_gpuvm_validate(struct
> > drm_gpuvm_bo *vm_bo, struct drm_exec *exec)
> >  		list_move_tail(&gpuva_to_vma(gpuva)-
> > >combined_links.rebind,
> >  			       &vm->rebind_list);
> >  
> > -	ret = xe_bo_validate(gem_to_xe_bo(vm_bo->obj), vm, false);
> > +	ret = xe_bo_validate(gem_to_xe_bo(vm_bo->obj), vm, true);
> 
> I think we likely can flip the last argument to true in a few other
> places too. Or maybe even drop the argument and either always set to
> true or true if not an ext obj?

Yeah we should indeed be able to set it elsewhere. The weird case is
when allocating page-table bos, because then we could accidently evict
the bo the page-tables are intended to point to, and we'd have to rerun
the validation again after binding. I'll see if I can get that going in
a not to complicated way.

/Thomas


> 
> Matt
> 
> >  	if (ret)
> >  		return ret;
> >  
> > @@ -515,7 +515,11 @@ static int xe_preempt_work_begin(struct
> > drm_exec *exec, struct xe_vm *vm,
> >  	if (err)
> >  		return err;
> >  
> > -	return drm_gpuvm_validate(&vm->gpuvm, exec);
> > +	vm->is_validating = true;
> > +	err = drm_gpuvm_validate(&vm->gpuvm, exec);
> > +	vm->is_validating = false;
> > +
> > +	return err;
> >  }
> >  
> >  static void preempt_rebind_work_func(struct work_struct *w)
> > diff --git a/drivers/gpu/drm/xe/xe_vm_types.h
> > b/drivers/gpu/drm/xe/xe_vm_types.h
> > index 7d4f810f9c04..76c49fba9cd3 100644
> > --- a/drivers/gpu/drm/xe/xe_vm_types.h
> > +++ b/drivers/gpu/drm/xe/xe_vm_types.h
> > @@ -286,6 +286,16 @@ struct xe_vm {
> >  
> >  	/** @batch_invalidate_tlb: Always invalidate TLB before
> > batch start */
> >  	bool batch_invalidate_tlb;
> > +
> > +	/**
> > +	 * @is_validaing: Whether we are validating the vm's local
> > objects.
> > +	 * This field is protected by the vm's resv. Note that
> > this
> > +	 * is needed only since TTM doesn't forward the
> > ttm_operation_ctx to the
> > +	 * eviction_valuable() callback, so if / when that is in
> > place, this
> > +	 * should be removed.
> > +	 */
> > +	bool is_validating;
> > +
> >  	/** @xef: XE file handle for tracking this VM's drm client
> > */
> >  	struct xe_file *xef;
> >  };
> > -- 
> > 2.43.0
> > 



More information about the Intel-xe mailing list