[PATCH 4/7] drm/xe: Relax runtime pm protection around VM

Thomas Hellström thomas.hellstrom at linux.intel.com
Mon May 13 13:23:38 UTC 2024


On Thu, 2024-05-09 at 15:16 -0400, Rodrigo Vivi wrote:
> In the regular use case, user space will create a VM and keep it
> alive for the entire duration of its workload.
> 
> For the regular desktop cases, it means that the VM is alive even
> in idle scenarios where the display goes off. This is unacceptable,
> since it would block runtime PM indefinitely, preventing the deeper
> Package-C states and needlessly draining power.
> 
> Limit the VM protection solely to long-running workloads, which are
> not protected by the scheduler references.
> By design, run_job for long-running workloads returns NULL and the
> scheduler drops all of its references to the job, hence protecting
> the VM in this case is necessary.
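
For context, the drm_sched contract the commit message relies on looks
roughly like this. A minimal sketch, not the literal xe code:
to_xe_sched_job(), xe_exec_queue_is_lr() and submit_to_hw() stand in
for the real submission path and long-running check:

static struct dma_fence *run_job(struct drm_sched_job *drm_job)
{
	struct xe_sched_job *job = to_xe_sched_job(drm_job);

	submit_to_hw(job);	/* hypothetical submission helper */

	/*
	 * Long-running jobs hand no fence back to the scheduler, so
	 * the scheduler drops its references to the job and nothing
	 * on the scheduler side keeps the VM (or the device) alive.
	 */
	if (xe_exec_queue_is_lr(job->q))
		return NULL;

	return dma_fence_get(job->fence);
}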

I still think we can drop the pm reference when we deactivate rebind
and grab it when we activate it (vm->preempt.rebind_deactivated). This
will not work for faulting VMs, though, and can be done as a follow-up
if desired.
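
Roughly the following, as a sketch of what I have in mind. Only
xe_pm_runtime_get_noresume(), xe_pm_runtime_put(), vm->xe and
vm->preempt.rebind_deactivated are taken from the driver; the two
helpers and where they would be called from are hypothetical:

/* Hypothetical follow-up: tie the pm reference to rebind activity
 * rather than to the VM's lifetime.
 */
static void vm_rebind_deactivate(struct xe_vm *vm)
{
	if (!vm->preempt.rebind_deactivated) {
		vm->preempt.rebind_deactivated = true;
		/* VM went idle: stop blocking runtime suspend. */
		xe_pm_runtime_put(vm->xe);
	}
}

static void vm_rebind_activate(struct xe_vm *vm)
{
	if (vm->preempt.rebind_deactivated) {
		/* VM is busy again: block runtime suspend. We are in a
		 * path where the device is already awake, so noresume
		 * is sufficient.
		 */
		xe_pm_runtime_get_noresume(vm->xe);
		vm->preempt.rebind_deactivated = false;
	}
}

As noted above, faulting VMs would still need the unconditional
reference from this patch.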

Reviewed-by: Thomas Hellström <thomas.hellstrom at linux.intel.com>

> 
> v2: Update the commit message to use more imperative language and
>     to reflect why the VM protection is really needed.
>     Also add a comment in the code to make the reason visible.
> 
> v3: Remove the vma_access case and the mentions of mmap. Mmap cases
>     are already protected by the gem page fault.
> 
> Cc: Thomas Hellström <thomas.hellstrom at linux.intel.com>
> Cc: Lucas De Marchi <lucas.demarchi at intel.com>
> Cc: Matthew Brost <matthew.brost at intel.com>
> Tested-by: Francois Dugast <francois.dugast at intel.com>
> Acked-by: Matthew Brost <matthew.brost at intel.com>
> Signed-off-by: Rodrigo Vivi <rodrigo.vivi at intel.com>
> ---
>  drivers/gpu/drm/xe/xe_vm.c | 12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index c5b1694b292f..2a49dea231e7 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -1347,7 +1347,13 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags)
>  
>  	vm->pt_ops = &xelp_pt_ops;
>  
> -	if (!(flags & XE_VM_FLAG_MIGRATION))
> +	/*
> +	 * Long-running workloads are not protected by the scheduler references.
> +	 * By design, run_job for long-running workloads returns NULL and the
> +	 * scheduler drops all the references of it, hence protecting the VM
> +	 * for this case is necessary.
> +	 */
> +	if (flags & XE_VM_FLAG_LR_MODE)
>  		xe_pm_runtime_get_noresume(xe);
>  
>  	vm_resv_obj = drm_gpuvm_resv_object_alloc(&xe->drm);
> @@ -1457,7 +1463,7 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags)
>  	for_each_tile(tile, xe, id)
>  		xe_range_fence_tree_fini(&vm->rftree[id]);
>  	kfree(vm);
> -	if (!(flags & XE_VM_FLAG_MIGRATION))
> +	if (flags & XE_VM_FLAG_LR_MODE)
>  		xe_pm_runtime_put(xe);
>  	return ERR_PTR(err);
>  }
> @@ -1592,7 +1598,7 @@ static void vm_destroy_work_func(struct work_struct *w)
>  
>  	mutex_destroy(&vm->snap_mutex);
>  
> -	if (!(vm->flags & XE_VM_FLAG_MIGRATION))
> +	if (vm->flags & XE_VM_FLAG_LR_MODE)
>  		xe_pm_runtime_put(xe);
>  
>  	for_each_tile(tile, xe, id)


