[PATCH] drm/amdgpu: Fix skipping hangged job reset during gpu recover.

Koenig, Christian Christian.Koenig at amd.com
Wed Oct 31 14:38:45 UTC 2018


Am 31.10.18 um 15:36 schrieb Andrey Grodzovsky:
> Problem:
> During GPU recovery, DAL would hang in
> amdgpu_pm_compute_clocks->amdgpu_fence_wait_empty.
>
> Fix:
> It turns out there was what looks like a typo introduced by
> 3320b8d drm/amdgpu: remove job->ring, which caused
> amdgpu_fence_driver_force_completion to be skipped for the guilty job's
> fence. The fence was therefore never force signaled, and this caused the
> hang later in DAL.
>
> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky at amd.com>

Crap, I had already been staring at that code for a while as well but didn't
realize what was wrong with it.

Patch is Reviewed-by: Christian König <christian.koenig at amd.com>

Regards,
Christian.

> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 9a33fd0..8717a4f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -3363,7 +3363,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
>   
>   		kthread_park(ring->sched.thread);
>   
> -		if (job && job->base.sched == &ring->sched)
> +		if (job && job->base.sched != &ring->sched)
>   			continue;
>   
>   		drm_sched_hw_job_reset(&ring->sched, job ? &job->base : NULL);
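
For anyone reading the one-character change out of context, here is a rough
standalone model of the two conditions. It is only a sketch: struct sched,
struct ring, struct job, recover() and the skip_ring_*() helpers are
hypothetical, simplified stand-ins and not the kernel definitions. The
force_completed flag stands in for the reset and
amdgpu_fence_driver_force_completion step, which the commit message implies
is bypassed by the continue.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct sched { int id; };
struct ring  { struct sched sched; bool force_completed; };
struct job   { struct sched *sched; };	/* models job->base.sched */

/* Condition before the patch: skips exactly the guilty job's ring. */
static bool skip_ring_buggy(const struct job *job, const struct ring *ring)
{
	return job && job->sched == &ring->sched;
}

/* Condition after the patch: skips every ring except the guilty job's. */
static bool skip_ring_fixed(const struct job *job, const struct ring *ring)
{
	return job && job->sched != &ring->sched;
}

/* Models the loop: rings that are not skipped get marked force-completed. */
static void recover(struct ring *rings, size_t n, const struct job *job,
		    bool (*skip)(const struct job *, const struct ring *))
{
	for (size_t i = 0; i < n; i++) {
		rings[i].force_completed = false;
		if (skip(job, &rings[i]))
			continue;
		/* stands in for the job reset and fence force completion */
		rings[i].force_completed = true;
	}
}
```

With a guilty job on one ring, the old condition leaves exactly that ring
without force completion, which matches the hang described above. The new
condition handles only that ring, and a NULL job still handles every ring.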


