[PATCH] drm/xe: Avoid evicting object of the same vm in none fault mode

Thomas Hellström thomas.hellstrom at linux.intel.com
Fri Nov 29 08:45:18 UTC 2024


On Thu, 2024-11-28 at 16:01 -0500, Oak Zeng wrote:
> BO validation during vm_bind can trigger memory eviction when the
> system is under memory pressure. Right now we blindly evict BOs
> of all VMs. This is a problem when the system runs in
> non-recoverable page fault mode: even though the vm_bind can
> succeed by evicting BOs, the later rebinding of the evicted BOs
> would fail. So it is better to report an out-of-memory failure at
> vm_bind time than at rebind time, where xekmd currently doesn't
> have a good mechanism to report errors to user space.
> 
> This patch implements a scheme that only evicts objects of other
> VMs at vm_bind time. Objects of the same VM are skipped for
> eviction. If we fail to find enough memory for vm_bind, we report
> an error to user space at vm_bind time.
> 
> This scheme is not needed in recoverable page fault mode, in
> which we can dynamically fault in pages on demand.
> 
> Signed-off-by: Oak Zeng <oak.zeng at intel.com>
> Suggested-by: Thomas Hellström <thomas.hellstrom at linux.intel.com>
> ---
>  drivers/gpu/drm/xe/xe_vm.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index 2492750505d69..c005c96b88167 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -2359,13 +2359,15 @@ static int vma_lock_and_validate(struct drm_exec *exec, struct xe_vma *vma,
>  				 bool validate)
>  {
>  	struct xe_bo *bo = xe_vma_bo(vma);
> +	struct xe_vm *vm = xe_vma_vm(vma);
> +	bool preempt_mode = xe_vm_in_preempt_fence_mode(vm);

I'd skip the stack variable here and use the function call directly
below.
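
As a sketch of that suggestion: the fragment below is a stand-alone
approximation, not the driver code. The struct layouts, the
drm_exec_lock_obj_stub() helper and the bodies of the other helpers are
simplified stand-ins (assumptions) so the fragment compiles outside the
kernel; only the shape of vma_lock_and_validate() follows the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stub types: simplified stand-ins, NOT the real xe/drm definitions. */
struct xe_vm { bool preempt_fence_mode; };
struct xe_bo { struct xe_vm *vm; bool last_allow_res_evict; };
struct xe_vma { struct xe_vm *vm; struct xe_bo *bo; };
struct drm_exec { int locked; };

/* Stub helpers that reuse the names from the patch; bodies are assumptions. */
static struct xe_bo *xe_vma_bo(struct xe_vma *vma) { return vma->bo; }
static struct xe_vm *xe_vma_vm(struct xe_vma *vma) { return vma->vm; }

static bool xe_vm_in_preempt_fence_mode(struct xe_vm *vm)
{
	return vm->preempt_fence_mode;
}

/* Hypothetical stand-in for drm_exec_lock_obj(); only counts lock calls. */
static int drm_exec_lock_obj_stub(struct drm_exec *exec, struct xe_bo *bo)
{
	(void)bo;
	exec->locked++;
	return 0;
}

/* Stub: records the third argument instead of validating anything. */
static int xe_bo_validate(struct xe_bo *bo, struct xe_vm *vm,
			  bool allow_res_evict)
{
	(void)vm;
	bo->last_allow_res_evict = allow_res_evict;
	return 0;
}

static int vma_lock_and_validate(struct drm_exec *exec, struct xe_vma *vma,
				 bool validate)
{
	struct xe_bo *bo = xe_vma_bo(vma);
	struct xe_vm *vm = xe_vma_vm(vma);
	int err = 0;

	if (bo) {
		if (!bo->vm)
			err = drm_exec_lock_obj_stub(exec, bo);
		/* No preempt_mode local: call the helper at the use site. */
		if (!err && validate)
			err = xe_bo_validate(bo, vm,
					     !xe_vm_in_preempt_fence_mode(vm));
	}

	return err;
}
```

The behaviour is the same as in the patch: same-VM eviction is disallowed
only when the VM is in preempt fence mode.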

>  	int err = 0;
>  
>  	if (bo) {
>  		if (!bo->vm)
>  			err = drm_exec_lock_obj(exec, &bo->ttm.base);
>  		if (!err && validate)
> -			err = xe_bo_validate(bo, xe_vma_vm(vma), true);
> +			err = xe_bo_validate(bo, vm, !preempt_mode);
>  	}
>  
>  	return err;

Anyway,
Reviewed-by: Thomas Hellström <thomas.hellstrom at linux.intel.com>

Note that this is not fully sufficient to avoid OOM errors in the
rebind worker. Another process may have already evicted this vm's
memory and then pinned a lot of VRAM. But it should probably be
sufficient in single-client cases.

Thanks,
Thomas




