[PATCH] drm/amdgpu: Fix compute ring 1.0.0 failure after reset

Christian König ckoenig.leichtzumerken at gmail.com
Fri Oct 26 08:05:02 UTC 2018


On 25.10.18 at 22:16, Andrey Grodzovsky wrote:
> Problem: After a GPU reset on dGPUs with gfx8, compute ring
> 1.0.0 fails the ring test. Inspecting the ring registers
> shows that the ring is active and no hang is observed (rptr == wptr).
> No significant differences were observed between the CP_HQD* registers
> for the ring in its good and bad states.
>
> Fix: There is no clear reason why, but reversing the order of the
> ring tests fixes the problem.
>
> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky at amd.com>

Mhm, maybe try adding a delay before the ring test?

It could be that the rings are started in reverse order as well and, for
some reason, the first one is tested too quickly after a reset.
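
A rough user-space sketch of the delay idea, combined with the reversed test order from the patch. This is not driver code: `mock_ring`, `mock_ring_test`, `settle_delay` and `test_rings_reverse` are made-up names for illustration, and the busy-wait only stands in for what would be something like udelay() before amdgpu_ring_test_helper() in the kernel. How long a delay would be needed on real hardware, if any, is unknown.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for a compute ring; not the amdgpu structure. */
struct mock_ring {
	int healthy;	/* 1 if the mock ring test should pass */
	int order;	/* position in which the ring was tested, 1-based */
};

/* Hypothetical ring test: records the test order, 0 on success, -1 on failure. */
static int mock_ring_test(struct mock_ring *ring, int *counter)
{
	ring->order = ++(*counter);
	return ring->healthy ? 0 : -1;
}

/* Busy-wait for roughly delay_us microseconds of CPU time (udelay() substitute). */
static void settle_delay(long delay_us)
{
	clock_t end;

	if (delay_us <= 0)
		return;

	end = clock() + (clock_t)((double)delay_us * CLOCKS_PER_SEC / 1000000.0);
	while (clock() < end)
		;
}

/*
 * Test the rings from last to first, waiting delay_us before each test.
 * Returns the number of rings that failed.
 */
static int test_rings_reverse(struct mock_ring *rings, int num, long delay_us)
{
	int i, counter = 0, failed = 0;

	for (i = num - 1; i >= 0; i--) {
		settle_delay(delay_us);
		if (mock_ring_test(&rings[i], &counter))
			failed++;
	}

	return failed;
}
```

Calling test_rings_reverse(rings, num, 0) gives the ordering change alone, so the two effects could be tried separately.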

Anyway, the patch is Acked-by: Christian König <christian.koenig at amd.com>

Thanks,
Christian.

> ---
>   drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 6 ++++--
>   1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> index b2e1376..02f8ca5 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> @@ -4811,8 +4811,10 @@ static int gfx_v8_0_kcq_resume(struct amdgpu_device *adev)
>   	if (r)
>   		goto done;
>   
> -	/* Test KCQs */
> -	for (i = 0; i < adev->gfx.num_compute_rings; i++) {
> +	/* Test KCQs - reversing the order of rings seems to fix ring test failure
> +	 * after GPU reset
> +	 */
> +	for (i = adev->gfx.num_compute_rings - 1; i >= 0; i--) {
>   		ring = &adev->gfx.compute_ring[i];
>   		r = amdgpu_ring_test_helper(ring);
>   	}


