[PATCH 1/3] drm/xe/bo: Forward the decision to evict local objects during validation

Matthew Brost matthew.brost at intel.com
Tue Feb 27 17:35:51 UTC 2024


On Tue, Feb 27, 2024 at 12:52:14PM +0100, Thomas Hellström wrote:
> On Mon, 2024-02-26 at 21:36 +0000, Matthew Brost wrote:
> > On Mon, Feb 26, 2024 at 05:44:53PM +0100, Thomas Hellström wrote:
> > > Currently we refuse evicting the VM's local objects. However that
> > > is necessary for some objects. Most notably completely unbound
> > > objects.
> > > Forward this decision to be per-object based in the TTM
> > > eviction_valuable() callback.
> > > 
> > > Fixes: 24f947d58fe5 ("drm/xe: Use DRM GPUVM helpers for external-
> > > and evicted objects")
> > > Cc: Thomas Hellström <thomas.hellstrom at linux.intel.com>
> > > Cc: Rodrigo Vivi <rodrigo.vivi at intel.com>
> > > Cc: Matthew Brost <matthew.brost at intel.com>
> > > Cc: Lucas De Marchi <lucas.demarchi at intel.com>
> > > Cc: Oded Gabbay <ogabbay at kernel.org>
> > > Signed-off-by: Thomas Hellström <thomas.hellstrom at linux.intel.com>
> > > ---
> > >  drivers/gpu/drm/xe/xe_bo.c       | 17 ++++++++++++++++-
> > >  drivers/gpu/drm/xe/xe_exec.c     |  2 ++
> > >  drivers/gpu/drm/xe/xe_vm.c       |  8 ++++++--
> > >  drivers/gpu/drm/xe/xe_vm_types.h | 10 ++++++++++
> > >  4 files changed, 34 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/xe/xe_bo.c
> > > b/drivers/gpu/drm/xe/xe_bo.c
> > > index 76dfaf1cd200..143d93632253 100644
> > > --- a/drivers/gpu/drm/xe/xe_bo.c
> > > +++ b/drivers/gpu/drm/xe/xe_bo.c
> > > @@ -1049,6 +1049,21 @@ static void
> > > xe_ttm_bo_delete_mem_notify(struct ttm_buffer_object *ttm_bo)
> > >  	}
> > >  }
> > >  
> > > +static bool xe_bo_eviction_valuable(struct ttm_buffer_object
> > > *ttm_bo,
> > > +				    const struct ttm_place *place)
> > > +{
> > > +	if (xe_bo_is_xe_bo(ttm_bo)) {
> > > +		struct xe_bo *xe_bo = ttm_to_xe_bo(ttm_bo);
> > > +		struct xe_vm *vm = xe_bo->vm;
> > > +
> > > +		if (vm && !drm_gpuvm_is_extobj(&vm->gpuvm,
> > > &ttm_bo->base) &&
> > 
> > Is this not redundant - if xe_bo->vm is set then it not an external
> > BO?
> 
> It's to exclude the case where the resv is individualized as part of
> destruction. Then drm_gpuvm_is_extobj() should return true.
> 

Makes sense.

> > 
> > > +		    vm->is_validating)
> > > +			return false;
> > > +	}
> > > +
> > > +	return ttm_bo_eviction_valuable(ttm_bo, place);
> > > +}
> > > +
> > >  const struct ttm_device_funcs xe_ttm_funcs = {
> > >  	.ttm_tt_create = xe_ttm_tt_create,
> > >  	.ttm_tt_populate = xe_ttm_tt_populate,
> > > @@ -1059,7 +1074,7 @@ const struct ttm_device_funcs xe_ttm_funcs =
> > > {
> > >  	.io_mem_reserve = xe_ttm_io_mem_reserve,
> > >  	.io_mem_pfn = xe_ttm_io_mem_pfn,
> > >  	.release_notify = xe_ttm_bo_release_notify,
> > > -	.eviction_valuable = ttm_bo_eviction_valuable,
> > > +	.eviction_valuable = xe_bo_eviction_valuable,
> > >  	.delete_mem_notify = xe_ttm_bo_delete_mem_notify,
> > >  };
> > >  
> > > diff --git a/drivers/gpu/drm/xe/xe_exec.c
> > > b/drivers/gpu/drm/xe/xe_exec.c
> > > index 952496c6260d..e8eb09993485 100644
> > > --- a/drivers/gpu/drm/xe/xe_exec.c
> > > +++ b/drivers/gpu/drm/xe/xe_exec.c
> > > @@ -102,7 +102,9 @@ static int xe_exec_fn(struct drm_gpuvm_exec
> > > *vm_exec)
> > >  	int num_fences;
> > >  	int ret;
> > >  
> > > +	vm->is_validating = true;
> > >  	ret = drm_gpuvm_validate(vm_exec->vm, &vm_exec->exec);
> > > +	vm->is_validating = false;
> > 
> > Can't say I love this coding pattern but you address this in the
> > kernel
> > doc for is_validating. Agree with updating TTM to avoid this.
> 
> Yeah.
> 
> > 
> > Also this is kinda racey too - e.g. 2 VMs could set vm->is_validating
> > at
> > the same time. We really want to check if BO being evicted VM's
> > matches
> > the VM we are trying to validate in the current context, right?
> 
> It's not racy AFAICT. Since it's protected by the vm resv, and only
> ever set under the vm->resv. Any reader holding the vm->resv can thus
> rely on its value.
> 
> > 
> > With that - maybe we try to update TTM in this series too?
> 
> I'd be good to have the completely unbound case fixed with 6.8 so
> that's why this got a bit rushed.
>

Ok, don't want to hold up the patch and it is documented.
 
> > 
> > >  	if (ret)
> > >  		return ret;
> > >  
> > > diff --git a/drivers/gpu/drm/xe/xe_vm.c
> > > b/drivers/gpu/drm/xe/xe_vm.c
> > > index e3bde897f6e8..07b0cda12d52 100644
> > > --- a/drivers/gpu/drm/xe/xe_vm.c
> > > +++ b/drivers/gpu/drm/xe/xe_vm.c
> > > @@ -474,7 +474,7 @@ static int xe_gpuvm_validate(struct
> > > drm_gpuvm_bo *vm_bo, struct drm_exec *exec)
> > >  		list_move_tail(&gpuva_to_vma(gpuva)-
> > > >combined_links.rebind,
> > >  			       &vm->rebind_list);
> > >  
> > > -	ret = xe_bo_validate(gem_to_xe_bo(vm_bo->obj), vm, false);
> > > +	ret = xe_bo_validate(gem_to_xe_bo(vm_bo->obj), vm, true);
> > 
> > I think we likely can flip the last argument to true in a few other
> > places too. Or maybe even drop the argument and either always set to
> > true or true if not an ext obj?
> 
> Yeah we should indeed be able to set it elsewhere. The weird case is
> when allocating page-table bos, because then we could accidently evict
> the bo the page-tables are intended to point to, and we'd have to rerun
> the validation again after binding. I'll see if I can get that going in
> a not to complicated way.
>

Again don't want to hold up this patch and this can be done in a follow up.

With of this:
Reviewed-by: Matthew Brost <matthew.brost at intel.com>

 
> /Thomas
> 
> 
> > 
> > Matt
> > 
> > >  	if (ret)
> > >  		return ret;
> > >  
> > > @@ -515,7 +515,11 @@ static int xe_preempt_work_begin(struct
> > > drm_exec *exec, struct xe_vm *vm,
> > >  	if (err)
> > >  		return err;
> > >  
> > > -	return drm_gpuvm_validate(&vm->gpuvm, exec);
> > > +	vm->is_validating = true;
> > > +	err = drm_gpuvm_validate(&vm->gpuvm, exec);
> > > +	vm->is_validating = false;
> > > +
> > > +	return err;
> > >  }
> > >  
> > >  static void preempt_rebind_work_func(struct work_struct *w)
> > > diff --git a/drivers/gpu/drm/xe/xe_vm_types.h
> > > b/drivers/gpu/drm/xe/xe_vm_types.h
> > > index 7d4f810f9c04..76c49fba9cd3 100644
> > > --- a/drivers/gpu/drm/xe/xe_vm_types.h
> > > +++ b/drivers/gpu/drm/xe/xe_vm_types.h
> > > @@ -286,6 +286,16 @@ struct xe_vm {
> > >  
> > >  	/** @batch_invalidate_tlb: Always invalidate TLB before
> > > batch start */
> > >  	bool batch_invalidate_tlb;
> > > +
> > > +	/**
> > > +	 * @is_validaing: Whether we are validating the vm's local
> > > objects.
> > > +	 * This field is protected by the vm's resv. Note that
> > > this
> > > +	 * is needed only since TTM doesn't forward the
> > > ttm_operation_ctx to the
> > > +	 * eviction_valuable() callback, so if / when that is in
> > > place, this
> > > +	 * should be removed.
> > > +	 */
> > > +	bool is_validating;
> > > +
> > >  	/** @xef: XE file handle for tracking this VM's drm client
> > > */
> > >  	struct xe_file *xef;
> > >  };
> > > -- 
> > > 2.43.0
> > > 
> 


More information about the Intel-xe mailing list