[PATCH 2/3] amd/amdgpu: wait no process running in kfd before resuming device

Felix Kuehling felix.kuehling at amd.com
Mon Mar 25 18:45:49 UTC 2024


On 2024-03-22 15:57, Zhigang Luo wrote:
> It will cause a page fault after the device has recovered if a process is still running.
>
> Signed-off-by: Zhigang Luo <Zhigang.Luo at amd.com>
> Change-Id: Ib1eddb56b69ecd41fe703abd169944154f48b0cd
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 ++
>   1 file changed, 2 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 70261eb9b0bb..2867e9186e44 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -4974,6 +4974,8 @@ static int amdgpu_device_reset_sriov(struct amdgpu_device *adev,
>   retry:
>   	amdgpu_amdkfd_pre_reset(adev);
>   
> +	amdgpu_amdkfd_wait_no_process_running(adev);
> +

This waits for the processes to be terminated. What would cause the 
processes to be terminated? Why do the processes need to be terminated? 
Isn't it enough if the processes are removed from the runlist in 
pre-reset, so they can no longer execute on the GPU?
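
The distinction in the question can be made concrete with a minimal userspace C sketch. It is a toy model, not amdgpu/amdkfd code. `struct toy_kfd`, `toy_pre_reset()`, `toy_gpu_idle()`, `toy_no_process_running()` and `toy_wait()` are hypothetical names. The model assumes that pre-reset only unmaps each process's queues from the runlist and leaves the owning processes alive. After `toy_pre_reset()`, no queue can execute. The "no process running" condition still does not hold, though, and nothing in the model would ever make it hold.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; none of these names are amdgpu/amdkfd APIs. */
#define TOY_MAX_PROCS 4

struct toy_process {
	bool alive;      /* user process still exists (has not exited) */
	bool on_runlist; /* its queues are mapped, so the GPU may run them */
};

struct toy_kfd {
	struct toy_process procs[TOY_MAX_PROCS];
};

/* Assumed pre-reset behaviour: unmap every process's queues from the
 * runlist without terminating the process itself. */
static void toy_pre_reset(struct toy_kfd *kfd)
{
	for (size_t i = 0; i < TOY_MAX_PROCS; i++)
		kfd->procs[i].on_runlist = false;
}

/* Condition the review suggests is sufficient: nothing can execute. */
static bool toy_gpu_idle(const struct toy_kfd *kfd)
{
	for (size_t i = 0; i < TOY_MAX_PROCS; i++)
		if (kfd->procs[i].on_runlist)
			return false;
	return true;
}

/* Stronger condition the patch waits for: every process has exited. */
static bool toy_no_process_running(const struct toy_kfd *kfd)
{
	for (size_t i = 0; i < TOY_MAX_PROCS; i++)
		if (kfd->procs[i].alive)
			return false;
	return true;
}

/* Bounded poll: returns true if cond holds within max_polls checks.
 * Nothing in this model clears 'alive', mirroring the question of what
 * would make a wait for process termination finish. */
static bool toy_wait(bool (*cond)(const struct toy_kfd *),
		     const struct toy_kfd *kfd, int max_polls)
{
	for (int i = 0; i < max_polls; i++)
		if (cond(kfd))
			return true;
	return false;
}
```

In this model, `toy_wait(toy_gpu_idle, &kfd, 10)` succeeds right after `toy_pre_reset()`. `toy_wait(toy_no_process_running, &kfd, 10)` times out, because no code path clears `alive`.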

Regards,
   Felix


>   	amdgpu_device_stop_pending_resets(adev);
>   
>   	if (from_hypervisor)


More information about the amd-gfx mailing list