[PATCH v2] drm/amdkfd: Not restore userptr buffer if kfd process has been removed

Felix Kuehling felix.kuehling at amd.com
Wed Oct 9 21:36:16 UTC 2024


On 2024-10-04 11:54, Xiaogang.Chen wrote:
> From: Xiaogang Chen <xiaogang.chen at amd.com>
>
> When kfd process has been terminated not restore userptr buffer after mmu
> notifier invalidates a range.

Is this fixing a real problem or a hypothetical problem? If there is a 
real problem, can you include more information here? It looks to me that 
amdgpu_amdkfd_restore_userptr_worker is already handling the cases where 
a process or mm_struct no longer exists. Maybe the only user visible 
change from this patch is, that you no longer print "Failed to quiesce 
KFD" in a corner case of an MMU notifier for a process that no longer 
exists?

Or is there a problem with the lifetime of the process_info that 
contains the work_struct?


>
> Signed-off-by: Xiaogang Chen<Xiaogang.Chen at amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 12 ++++++++----
>   1 file changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index ce5ca304dba9..6b4be7893dfb 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -2524,11 +2524,15 @@ int amdgpu_amdkfd_evict_userptr(struct mmu_interval_notifier *mni,
>   		/* First eviction, stop the queues */
>   		r = kgd2kfd_quiesce_mm(mni->mm,
>   				       KFD_QUEUE_EVICTION_TRIGGER_USERPTR);
> -		if (r)
> +
> +		if (r && r != -ESRCH)
>   			pr_err("Failed to quiesce KFD\n");
> -		queue_delayed_work(system_freezable_wq,
> -			&process_info->restore_userptr_work,
> -			msecs_to_jiffies(AMDGPU_USERPTR_RESTORE_DELAY_MS));
> +
> +		if (r != -ESRCH) {

Braces are not required because there is only one statement inside the if.

Regards,
   Felix


> +			queue_delayed_work(system_freezable_wq,
> +				&process_info->restore_userptr_work,
> +				msecs_to_jiffies(AMDGPU_USERPTR_RESTORE_DELAY_MS));
> +		}
>   	}
>   	mutex_unlock(&process_info->notifier_lock);
>   


More information about the amd-gfx mailing list