[PATCH] Revert "drm/amdgpu/gfx8: Fix compute ring failure after resetting"

Andrey Grodzovsky Andrey.Grodzovsky at amd.com
Wed Jan 31 16:59:17 UTC 2018



On 01/25/2018 11:33 PM, Yu, Xiangliang wrote:
> You can add amdgpu_sriov_vf() check to avoid breaking sriov.

+ Haisheng

As found out after more debugging and discussion with Haisheng from the HW
team, the sequence introduced by this change is wrong. It causes
compute ring test failures because "the ring buffer has to be filled
with valid packets (such as NOPs) first before submitting the MAP_QUEUES
packet into the KIQ. Once a compute engine is mapped, it will immediately
execute the ring buffer if the RPTR is not equal to the WPTR from the
MQD. It could lead to an engine hang if the ring buffer is filled with
random data."
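
To make the ordering constraint concrete, here is a minimal user-space
sketch of it. This is a toy model, not the amdgpu driver code: the
toy_ring structure, the helper names, the ring size and the NOP value are
all hypothetical and only stand in for the real ring, MQD and PM4 details.
It assumes, as described above, that a mapped engine consumes everything
between the read and write pointers right away.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_RING_DWORDS 16u         /* hypothetical ring size in dwords */
#define TOY_NOP         0xffff1000u /* placeholder NOP encoding (assumption) */

/* Toy stand-in for a compute ring plus the pointers restored from the MQD. */
struct toy_ring {
	uint32_t buf[TOY_RING_DWORDS];
	uint32_t rptr; /* read pointer */
	uint32_t wptr; /* write pointer */
};

/* Simulate the state after a reset: stale data and rptr != wptr. */
static void toy_ring_fill_garbage(struct toy_ring *ring, uint32_t stale_wptr)
{
	assert(ring != NULL);
	for (size_t i = 0; i < TOY_RING_DWORDS; i++)
		ring->buf[i] = 0xdeadbeefu + (uint32_t)i;
	ring->rptr = 0;
	ring->wptr = stale_wptr % TOY_RING_DWORDS;
}

/* Fill the whole ring with valid packets (NOPs); pointers are untouched. */
static void toy_ring_clear(struct toy_ring *ring)
{
	assert(ring != NULL);
	for (size_t i = 0; i < TOY_RING_DWORDS; i++)
		ring->buf[i] = TOY_NOP;
}

/*
 * Model of mapping the queue: the engine immediately executes from rptr
 * up to wptr. Returns false (a "hang") on the first non-NOP dword.
 */
static bool toy_map_queue(struct toy_ring *ring)
{
	assert(ring != NULL);
	while (ring->rptr != ring->wptr) {
		if (ring->buf[ring->rptr] != TOY_NOP)
			return false;
		ring->rptr = (ring->rptr + 1u) % TOY_RING_DWORDS;
	}
	return true;
}
```

In this model, calling toy_map_queue() on a ring that still holds stale
data fails, while calling toy_ring_clear() first lets the engine drain
the ring cleanly. That is the same ordering the revert restores by
clearing the ring in the queue init path rather than after the queues
are already mapped.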

Hence we would like to revert this change in amd-staging-drm-next and
continue investigating on the SR-IOV side why the correct programming
sequence doesn't work there. I am currently setting up an SR-IOV
environment to take a look at that.

Thanks,
Andrey
>
>> -----Original Message-----
>> From: Grodzovsky, Andrey
>> Sent: Friday, January 26, 2018 11:29 AM
>> To: Yu, Xiangliang <Xiangliang.Yu at amd.com>;
>> amd-gfx at lists.freedesktop.org
>> Cc: Deucher, Alexander <Alexander.Deucher at amd.com>; Koenig, Christian
>> <Christian.Koenig at amd.com>
>> Subject: Re: [PATCH] Revert "drm/amdgpu/gfx8: Fix compute ring failure
>> after resetting"
>>
>> No, just bare metal. I assumed your problem was with the compute ring test
>> failure, which I didn't see. Can you please recheck whether it still fails
>> on SR-IOV with this change reverted?
>> If so, we obviously need to keep looking for a way to fix it.
>>
>> Thanks,
>> Andrey
>>
>> ________________________________________
>> From: Yu, Xiangliang
>> Sent: 25 January 2018 20:59:45
>> To: Grodzovsky, Andrey; amd-gfx at lists.freedesktop.org
>> Cc: Deucher, Alexander; Grodzovsky, Andrey; Koenig, Christian
>> Subject: RE: [PATCH] Revert "drm/amdgpu/gfx8: Fix compute ring failure
>> after resetting"
>>
>> Did you test reset case in sriov?
>>
>>> -----Original Message-----
>>> From: amd-gfx [mailto:amd-gfx-bounces at lists.freedesktop.org] On Behalf
>>> Of Andrey Grodzovsky
>>> Sent: Friday, January 26, 2018 7:07 AM
>>> To: amd-gfx at lists.freedesktop.org
>>> Cc: Deucher, Alexander <Alexander.Deucher at amd.com>; Grodzovsky, Andrey
>>> <Andrey.Grodzovsky at amd.com>; Yu, Xiangliang <Xiangliang.Yu at amd.com>;
>>> Koenig, Christian <Christian.Koenig at amd.com>
>>> Subject: [PATCH] Revert "drm/amdgpu/gfx8: Fix compute ring failure
>>> after resetting"
>>>
>>> This reverts commit 75737cb4eb78c7f185e4700b4aa20cf7a3381aca.
>>>
>>> Fixes GFX ring test failure after HW reset.
>>> No compute ring test failures were observed with the change reverted,
>>> so it seems that whatever problem that change was addressing is no
>>> longer present.
>>>
>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky at amd.com>
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 10 +++-------
>>>   1 file changed, 3 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
>>> b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
>>> index 1207f36..8a65b53 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
>>> @@ -4847,6 +4847,9 @@ static int gfx_v8_0_kcq_init_queue(struct
>>> amdgpu_ring *ring)
>>>                /* reset MQD to a clean status */
>>>                if (adev->gfx.mec.mqd_backup[mqd_idx])
>>>                        memcpy(mqd, adev->gfx.mec.mqd_backup[mqd_idx], sizeof(struct vi_mqd_allocation));
>>> +             /* reset ring buffer */
>>> +             ring->wptr = 0;
>>> +             amdgpu_ring_clear_ring(ring);
>>>        } else {
>>>                amdgpu_ring_clear_ring(ring);
>>>        }
>>> @@ -4921,13 +4924,6 @@ static int gfx_v8_0_kiq_resume(struct
>>> amdgpu_device *adev)
>>>        /* Test KCQs */
>>>        for (i = 0; i < adev->gfx.num_compute_rings; i++) {
>>>                ring = &adev->gfx.compute_ring[i];
>>> -             if (adev->in_gpu_reset) {
>>> -                     /* move reset ring buffer to here to workaround
>>> -                      * compute ring test failed
>>> -                      */
>>> -                     ring->wptr = 0;
>>> -                     amdgpu_ring_clear_ring(ring);
>>> -             }
>>>                ring->ready = true;
>>>                r = amdgpu_ring_test_ring(ring);
>>>                if (r)
>>> --
>>> 2.7.4
>>>
>>> _______________________________________________
>>> amd-gfx mailing list
>>> amd-gfx at lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
