[PATCH] drm/amdgpu: Fix compute ring 1.0.0 failure after reset

Grodzovsky, Andrey Andrey.Grodzovsky at amd.com
Fri Oct 26 15:00:11 UTC 2018



On 10/26/2018 04:05 AM, Christian König wrote:
> On 25.10.18 at 22:16, Andrey Grodzovsky wrote:
>> Problem: After a GPU reset on dGPUs with gfx8, compute ring
>> 1.0.0 fails to pass the ring test. Inspecting the ring registers
>> shows that the ring is active and no hang is observed (rptr == wptr).
>> No significant differences were observed between the CP_HQD* registers
>> for the ring in good and bad states.
>>
>> Fix: For no clear reason, reversing the order of the ring tests
>> fixes the problem.
>>
>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky at amd.com>
>
> Mhm, maybe try adding a delay before the ring test?
That was the first thing I tried; it didn't help.
>
> Could be that the rings are started in reverse order as well and for
> some reason the first one is tested too quickly after a reset.

No, the KCQ queue mapping just before the test goes in 0..max order.

Andrey
>
> Anyway patch is Acked-by: Christian König <christian.koenig at amd.com>
>
> Thanks,
> Christian.
>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 6 ++++--
>>   1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
>> index b2e1376..02f8ca5 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
>> @@ -4811,8 +4811,10 @@ static int gfx_v8_0_kcq_resume(struct amdgpu_device *adev)
>>       if (r)
>>           goto done;
>>  
>> -    /* Test KCQs */
>> -    for (i = 0; i < adev->gfx.num_compute_rings; i++) {
>> +    /* Test KCQs - reversing the order of rings seems to fix ring test failure
>> +     * after GPU reset
>> +     */
>> +    for (i = adev->gfx.num_compute_rings - 1; i >= 0; i--) {
>>           ring = &adev->gfx.compute_ring[i];
>>           r = amdgpu_ring_test_helper(ring);
>>       }
>
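
For illustration, here is a minimal standalone sketch of the reversed
iteration used in the patch. It is not driver code: struct mock_ring,
mock_ring_test() and test_rings_reverse() are hypothetical stand-ins for
the real ring structure and amdgpu_ring_test_helper(). It only shows two
details of the loop shape: the index has to be signed for the
"i >= 0" condition to terminate, and, as an assumption of this sketch
rather than something the patch does, the first failure is kept instead
of being overwritten by later iterations.

```c
/* Hypothetical stand-in for a compute ring; not the real struct amdgpu_ring. */
struct mock_ring {
	int id;
	int test_result;	/* 0 on success, negative error code on failure */
};

/* Hypothetical stand-in for amdgpu_ring_test_helper(). */
static int mock_ring_test(const struct mock_ring *ring)
{
	return ring->test_result;
}

/*
 * Test rings from the highest index down to 0, recording the visit order
 * in order_out (which must have room for num_rings entries).
 * Returns 0 if every test passed, otherwise the first error encountered.
 */
static int test_rings_reverse(const struct mock_ring *rings, int num_rings,
			      int *order_out)
{
	int i, n = 0, r, ret = 0;

	/* i must be signed: with an unsigned index, i >= 0 is always true. */
	for (i = num_rings - 1; i >= 0; i--) {
		r = mock_ring_test(&rings[i]);
		order_out[n++] = rings[i].id;
		if (r && !ret)
			ret = r;	/* keep the first failure seen */
	}

	return ret;
}
```

With three mock rings the visit order is 2, 1, 0, and with zero rings the
loop body never runs and the function returns 0.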