[PATCH] drm/xe: Avoid evicting object of the same vm in none fault mode

Zeng, Oak oak.zeng at intel.com
Fri Dec 6 15:19:30 UTC 2024


Hi Rodrigo,

> -----Original Message-----
> From: Vivi, Rodrigo <rodrigo.vivi at intel.com>
> Sent: December 6, 2024 9:46 AM
> To: Zeng, Oak <oak.zeng at intel.com>
> Cc: intel-xe at lists.freedesktop.org;
> Thomas.Hellstrom at linux.intel.com
> Subject: Re: [PATCH] drm/xe: Avoid evicting object of the same vm in
> none fault mode
> 
> On Mon, Dec 02, 2024 at 09:19:29PM -0500, Oak Zeng wrote:
> > BO validation during vm_bind could trigger memory eviction when
> > system runs under memory pressure. Right now we blindly evict
> > BOs of all VMs. This scheme has a problem when system runs in
> > none recoverable page fault mode: even though the vm_bind could
> > be successful by evicting BOs, the later the rebinding of the
> > evicted BOs would fail. So it is better to report an out-of-
> > memory failure at vm_bind time than at time of rebinding where
> > xekmd currently doesn't have a good mechanism to report error
> > to user space.
> >
> > This patch implemented a scheme to only evict objects of other
> > VMs during vm_bind time. Object of the same VM will skip eviction.
> > If we failed to find enough memory for vm_bind, we report error
> > to user space at vm_bind time.
> >
> > This scheme is not needed for recoverable page fault mode under
> > what we can dynamically fault-in pages on demand.
> >
> > v1: Use xe_vm_in_preempt_fence_mode instead of stack variable
> (Thomas)
> >
> > Signed-off-by: Oak Zeng <oak.zeng at intel.com>
> > Suggested-by: Thomas Hellström
> <thomas.hellstrom at linux.intel.com>
> > Reviewed-by: Thomas Hellström
> <thomas.hellstrom at linux.intel.com>
> > ---
> >  drivers/gpu/drm/xe/xe_vm.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/xe/xe_vm.c
> b/drivers/gpu/drm/xe/xe_vm.c
> > index 2492750505d69..016fedae5d554 100644
> > --- a/drivers/gpu/drm/xe/xe_vm.c
> > +++ b/drivers/gpu/drm/xe/xe_vm.c
> > @@ -2359,13 +2359,15 @@ static int vma_lock_and_validate(struct
> drm_exec *exec, struct xe_vma *vma,
> >  				 bool validate)
> >  {
> >  	struct xe_bo *bo = xe_vma_bo(vma);
> > +	struct xe_vm *vm = xe_vma_vm(vma);
> 
> for single usage like below we should avoid variable declaration,
> specially in
> this case where you do have line space left there... then it gets even
> clear
> that the only change in the patch is
> 
> - true
> + !xe_vm_in_preempt_fence_mode(vm)

Are you suggesting to remove above line "+	struct xe_vm *vm = xe_vma_vm(vma);"?

Vm variable is used twice below, not single.

If I remove vm variable, then below I should write:

Xe_bo_validate(bo, xe_vma_vm(vma), !xe_vm_in_preempt_fence_mode(xe_vma_vm(vma)));

Is this what you are suggesting?

Oak

> 
> >  	int err = 0;
> >
> >  	if (bo) {
> >  		if (!bo->vm)
> >  			err = drm_exec_lock_obj(exec, &bo-
> >ttm.base);
> >  		if (!err && validate)
> > -			err = xe_bo_validate(bo, xe_vma_vm(vma),
> true);
> > +			err = xe_bo_validate(bo, vm,
> > +
> 	     !xe_vm_in_preempt_fence_mode(vm));
> >  	}
> >
> >  	return err;
> > --
> > 2.26.3
> >


More information about the Intel-xe mailing list