[PATCH] drm/amdgpu: increase timeout of IB test

Christian König deathsimple at vodafone.de
Tue Jul 26 07:42:33 UTC 2016


On 26.07.2016 at 09:40, zhoucm1 wrote:
>
> On 2016-07-26 15:32, Christian König wrote:
>> Ok, I really wasn't expecting this. How about 100ms?
> I tried it just now; 100ms isn't enough either, see the appended log.
>
>>
>> I just want to avoid a reset taking more than 1 or 2 seconds even 
>> when it didn't work.
>>
>> With a 1 second timeout for each IB test we easily need 10+ seconds 
>> when the hardware doesn't respond at all.
> This isn't true: when any one of them times out, 
> amdgpu_ib_ring_tests will return an error.

Ah! Of course we abort after the first failed test.
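
A minimal sketch of why the abort bounds the wait, assuming a simplified 
model rather than the driver's real code: fake_ring, fake_ring_test_ib and 
fake_ib_ring_tests are hypothetical stand-ins, and only the 1000ms value and 
the -110 error code are taken from this thread.

```c
#include <stddef.h>

/* Hypothetical model of a ring; not the driver's real structure. */
struct fake_ring {
    int responds;          /* nonzero if the ring answers its IB test */
    unsigned int cost_ms;  /* time a successful IB test takes */
};

/*
 * Hypothetical single IB test: returns 0 on success, or -110 (the
 * timeout error code seen in the log) once timeout_ms has elapsed.
 * The time spent is added to *waited_ms.
 */
static int fake_ring_test_ib(const struct fake_ring *ring,
                             unsigned int timeout_ms,
                             unsigned int *waited_ms)
{
    if (!ring->responds || ring->cost_ms > timeout_ms) {
        *waited_ms += timeout_ms;
        return -110;
    }
    *waited_ms += ring->cost_ms;
    return 0;
}

/*
 * Hypothetical test loop: stops at the first failing ring, so a fully
 * hung GPU costs one timeout in total, not one timeout per ring.
 */
static int fake_ib_ring_tests(const struct fake_ring *rings, size_t count,
                              unsigned int timeout_ms,
                              unsigned int *waited_ms)
{
    size_t i;

    *waited_ms = 0;
    for (i = 0; i < count; ++i) {
        int r = fake_ring_test_ib(&rings[i], timeout_ms, waited_ms);

        if (r)
            return r; /* abort after the first failed test */
    }
    return 0;
}
```

Under this model, twelve unresponsive rings with a 1000ms timeout wait 
1000ms in total before the error is returned, not 12 seconds.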

In this case feel free to add my Reviewed-by: Christian König 
<christian.koenig at amd.com> to the original patch.

Regards,
Christian.

>
> [   59.286927] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx 
> timeout, last signaled seq=30397, last emitted seq=30399
> [   59.287253] [drm] IP block:1 is hang!
> [   59.287262] [drm] IP block:5 is hang!
> [   59.288375] [drm] Some block need full reset!
> [   59.288385] pp_set_clockgating_state was not implemented.
> [   59.343335] amdgpu 0000:03:00.0: GPU pci config reset
> [   59.422762] amdgpu 0000:03:00.0: GPU reset succeeded, trying to resume
> [   59.423971] [drm] PCIE GART of 4096M enabled (table at 
> 0x0000000000040000).
> [   59.433374] current thermal is out of range
> [   59.436736] [drm] ring test on 0 succeeded in 14 usecs
> [   59.437341] [drm] ring test on 1 succeeded in 27 usecs
> [   59.437388] [drm] ring test on 2 succeeded in 23 usecs
> [   59.437399] [drm] ring test on 3 succeeded in 5 usecs
> [   59.437405] [drm] ring test on 4 succeeded in 2 usecs
> [   59.437412] [drm] ring test on 5 succeeded in 2 usecs
> [   59.437420] [drm] ring test on 6 succeeded in 3 usecs
> [   59.437429] [drm] ring test on 7 succeeded in 3 usecs
> [   59.437435] [drm] ring test on 8 succeeded in 2 usecs
> [   59.437493] [drm] ring test on 9 succeeded in 6 usecs
> [   59.437501] [drm] ring test on 10 succeeded in 6 usecs
> [   59.464320] [drm] ring test on 11 succeeded in 2 usecs
> [   59.464322] [drm] UVD initialized successfully.
> [   59.564307] [drm] ring test on 12 succeeded in 14 usecs
> [   59.564316] [drm] ring test on 13 succeeded in 3 usecs
> [   59.564317] [drm] VCE initialized successfully.
> [   59.662835] [drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR* amdgpu: IB 
> test timed out.
> [   59.663030] [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu: 
> failed testing IB on GFX ring (-110).
> [   59.663227] amdgpu 0000:03:00.0: ib ring test failed (-110).
> [   59.663349] pp_set_clockgating_state was not implemented.
> [   59.749358] amdgpu 0000:03:00.0: GPU pci config reset
> [   59.830784] amdgpu 0000:03:00.0: GPU reset succeeded, trying to resume
> [   59.831982] [drm] PCIE GART of 4096M enabled (table at 
> 0x0000000000040000).
> [   59.841412] current thermal is out of range
> [   59.844763] [drm] ring test on 0 succeeded in 14 usecs
> [   59.845341] [drm] ring test on 1 succeeded in 27 usecs
> [   59.845385] [drm] ring test on 2 succeeded in 21 usecs
> [   59.845397] [drm] ring test on 3 succeeded in 5 usecs
> [   59.845406] [drm] ring test on 4 succeeded in 3 usecs
> [   59.845412] [drm] ring test on 5 succeeded in 2 usecs
> [   59.845421] [drm] ring test on 6 succeeded in 3 usecs
> [   59.845429] [drm] ring test on 7 succeeded in 3 usecs
> [   59.845435] [drm] ring test on 8 succeeded in 2 usecs
> [   59.845493] [drm] ring test on 9 succeeded in 6 usecs
> [   59.845500] [drm] ring test on 10 succeeded in 6 usecs
> [   59.872362] [drm] ring test on 11 succeeded in 2 usecs
> [   59.872363] [drm] UVD initialized successfully.
> root at zhoucm1-System-Product-Name:~# dmesg -c
> [   59.971707] [drm] ring test on 12 succeeded in 14 usecs
> [   59.971717] [drm] ring test on 13 succeeded in 3 usecs
> [   59.971717] [drm] VCE initialized successfully.
> [   60.070820] [drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR* amdgpu: IB 
> test timed out.
> [   60.071051] [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu: 
> failed testing IB on GFX ring (-110).
> [   60.071252] amdgpu 0000:03:00.0: ib ring test failed (-110).
> [   60.071375] pp_set_clockgating_state was not implemented.
> [   60.151397] amdgpu 0000:03:00.0: GPU pci config reset
> [   60.230844] amdgpu 0000:03:00.0: GPU reset succeeded, trying to resume
> [   60.232044] [drm] PCIE GART of 4096M enabled (table at 
> 0x0000000000040000).
> [   60.241412] current thermal is out of range
> [   60.244746] [drm] ring test on 0 succeeded in 13 usecs
> [   60.245322] [drm] ring test on 1 succeeded in 27 usecs
> [   60.245368] [drm] ring test on 2 succeeded in 23 usecs
> [   60.245380] [drm] ring test on 3 succeeded in 5 usecs
> [   60.245386] [drm] ring test on 4 succeeded in 2 usecs
> [   60.245393] [drm] ring test on 5 succeeded in 2 usecs
> [   60.245402] [drm] ring test on 6 succeeded in 3 usecs
>
>>
>> Regards,
>> Christian.
>>
>> On 26.07.2016 at 09:28, zhoucm1 wrote:
>>> CQE found timeouts when they cherry-picked your timeout patch.
>>> I also found that ib_test can time out after a GPU reset.
>>> 1s may be too long for a simple test command, but I think 
>>> that doesn't matter for judging a timeout.
>>>
>>> Regards,
>>> David
>>>
>>> On 2016-07-26 15:24, Christian König wrote:
>>>> On 26.07.2016 at 07:57, Chunming Zhou wrote:
>>>>> We should give the IB test enough time.
>>>>>
>>>>> Change-Id: I92bfbe9b3aa35083f41baed8663907abfa15c8e6
>>>>> Signed-off-by: Chunming Zhou <David1.Zhou at amd.com>
>>>>
>>>> Do we really need more than 10ms for an IB test? A whole second 
>>>> sounds awfully long when we currently need to do 10+ tests.
>>>>
>>>> Christian.
>>>>
>>>>> ---
>>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 2 +-
>>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>>>>> index 050062e..a31d7ef 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>>>>> @@ -33,7 +33,7 @@
>>>>>   #include "amdgpu.h"
>>>>>   #include "atom.h"
>>>>>
>>>>> -#define AMDGPU_IB_TEST_TIMEOUT    msecs_to_jiffies(10)
>>>>> +#define AMDGPU_IB_TEST_TIMEOUT    msecs_to_jiffies(1000)
>>>>>
>>>>>   /*
>>>>>    * IB
>>>>
>>>>
>>>
>>
>
