Re: Reply: Regression with kernel 4.18 - AMD RX 550 fails IB ring test on power-up

Luís Mendes luis.p.mendes at gmail.com
Sat Oct 20 17:58:10 UTC 2018


The problem remains with the Linux 4.18 and Linux 4.19 kernels. I am unable to
use the AMD RX 460 and AMD RX 550 on my x64 Linux platforms.

I've installed Windows 10 on the same machine along with
win10-64bit-radeon-software-adrenalin-edition-18.10.1-oct18.exe and under
Windows the same RX 460 card *works fine* and I am able to run OpenCL
applications.

The driver has been hanging since kernel 4.15; I am getting:
[   33.504100] [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]]
*ERROR* [CRTC:42:crtc-0] flip_done timed out
[   43.744094] [drm:drm_atomic_helper_wait_for_dependencies
[drm_kms_helper]] *ERROR* [CRTC:42:crtc-0] flip_done timed
out
[   53.984089] [drm:drm_atomic_helper_wait_for_dependencies
[drm_kms_helper]] *ERROR* [CONNECTOR:54:HDMI-A-1] flip_done timed
out
[   64.224036] [drm:drm_atomic_helper_wait_for_dependencies
[drm_kms_helper]] *ERROR* [PLANE:40:plane-4] flip_done timed
out
[   64.224141] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR*
amdgpu_dm_commit_planes: acrtc 0, already busy
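
For reference, the spacing of the flip_done messages above can be checked with a short script. This is only a sketch: the helper name timestamps() is made up for illustration, and the log text is the excerpt quoted above with the wrapped lines joined back together.

```python
import re

# dmesg excerpt from this message, with the mail-wrapped lines rejoined.
LOG = """\
[   33.504100] [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:42:crtc-0] flip_done timed out
[   43.744094] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:42:crtc-0] flip_done timed out
[   53.984089] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CONNECTOR:54:HDMI-A-1] flip_done timed out
[   64.224036] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [PLANE:40:plane-4] flip_done timed out
[   64.224141] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* amdgpu_dm_commit_planes: acrtc 0, already busy
"""


def timestamps(log):
    """Return the dmesg timestamps (seconds since boot) of flip_done timeouts."""
    pattern = re.compile(r"^\[\s*(\d+\.\d+)\].*flip_done timed out", re.MULTILINE)
    return [float(m.group(1)) for m in pattern.finditer(log)]


stamps = timestamps(LOG)
# Gap between each timeout message and the previous one, in seconds.
intervals = [round(b - a, 3) for a, b in zip(stamps, stamps[1:])]
print(intervals)
```

Each timeout is reported about 10.24 seconds after the previous one, so the waits run back to back rather than at random times.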

And after the commit "drm/amdgpu: defer test IBs on the rings at boot (V3)"

2c773de2ecb8c327f2448bd1eecad224e9227087
<https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v4.18-rc4&id=2c773de2ecb8c327f2448bd1eecad224e9227087>
I get the following with kernels 4.18 and 4.19, as well as with the Ubuntu 18.10 stock
kernel (I can't even install Ubuntu 18.10 because it hangs with amdgpu):
[drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR* amdgpu: IB test timed out.
[drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu: failed testing IB on
GFX ring (-110).
[drm:amdgpu_device_ip_late_init_func_handler [amdgpu]] *ERROR* ib ring test
failed (-110).
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout, last
signalled seq=0, last emitted seq=1
and the kernel blocks indefinitely:
task plymouthd:449 blocked for more than 120 seconds.
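
As a reading aid, the numeric codes in parentheses are negated Linux errno values. A minimal sketch of decoding them follows; the decode_status() helper and the LINUX_ERRNO table are illustrative names, not part of any driver or tool, and the table only covers the two codes seen in this thread (-110 here, -22 in the original report quoted further down).

```python
import re

# Linux errno numbers, hard-coded (asm-generic values) so the result does
# not depend on the platform running this script.
LINUX_ERRNO = {
    22: ("EINVAL", "Invalid argument"),
    110: ("ETIMEDOUT", "Connection timed out"),
}


def decode_status(line):
    """Return 'NAME (description)' for the first '(-N)' code in line, or None."""
    match = re.search(r"\(-(\d+)\)", line)
    if match is None:
        return None
    entry = LINUX_ERRNO.get(int(match.group(1)))
    if entry is None:
        return None
    name, text = entry
    return f"{name} ({text})"


for line in (
    "amdgpu: failed testing IB on GFX ring (-110).",
    "amdgpu: failed testing IB on ring 6 (-22).",
):
    print(decode_status(line))
```

So the failures on 4.18 and 4.19 are timeouts (ETIMEDOUT), while the first report on 4.18-rc2 returned EINVAL after the scratch register read back 0xFFFFFFFF.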

Is there any hope of getting this fixed?


On Thu, Jul 12, 2018 at 2:56 PM Luís Mendes <luis.p.mendes at gmail.com> wrote:

> Hi Jim,
>
> Replies in between.
>
> Regards,
> Luís
>
> On Thu, Jul 12, 2018 at 3:16 AM, jimqu <jimqu at amd.com> wrote:
>
>>
>>
>> On 2018-07-12 05:27, Luís Mendes wrote:
>>
>> Hi Jim,
>>
>> I followed your suggestion and was able to bisect the kernel patches.
>> The offending patch is: drm/amdgpu: defer test IBs on the rings at boot
>> (V3)
>> commit:
>>
>> 2c773de2ecb8c327f2448bd1eecad224e9227087
>> <https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v4.18-rc4&id=2c773de2ecb8c327f2448bd1eecad224e9227087>
>>
>> After reverting this patch the IB test succeeded with kernel v4.18-rc4 on
>> both systems and the amdgpu driver was correctly loaded both on SAPPHIRE
>> RX550 4GB and on SAPPHIRE RX460 2GB.
>>
>>
>> Alex, Christian, What do you think about the patch?
>>
>> The GPU hang remains, however.
>>  I will try to configure a remote IPMI connection to see what is
>> happening during the kernel boot, or set up a serial console for the kernel.
>>
>>
>> *You can set up a remote connection over ssh. You can also blacklist
>> amdgpu first and then modprobe amdgpu manually.*
>>
> R: I was able to set up a remote serial console with the console=ttyS0,11520n8
> kernel parameter.
> The boot log is attached as file kernel_bisected_v4.18-rc4_log.txt.
> The first noticeable issue seems to be:
> [    6.131989] amdgpu: [powerplay]
> [    6.131989]  last message was failed ret is 65535
> ...
> and later hangs with:
> [   33.504100] [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]]
> *ERROR* [CRTC:42:crtc-0] flip_done timed out
> [   43.744094] [drm:drm_atomic_helper_wait_for_dependencies
> [drm_kms_helper]] *ERROR* [CRTC:42:crtc-0] flip_done timed
> out
> [   53.984089] [drm:drm_atomic_helper_wait_for_dependencies
> [drm_kms_helper]] *ERROR* [CONNECTOR:54:HDMI-A-1] flip_done timed
> out
> [   64.224036] [drm:drm_atomic_helper_wait_for_dependencies
> [drm_kms_helper]] *ERROR* [PLANE:40:plane-4] flip_done timed
> out
> [   64.224141] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR*
> amdgpu_dm_commit_planes: acrtc 0, already
> busy
>
>
>
>> *What about xinit? What is the Mesa driver version on your platform?*
>>
> R: I am running Ubuntu 18.04 with the bisected kernel 4.18-rc4, using
> libdrm-2.4.92 and mesa-18.1.0.
> The xinit output is attached as xinit_log.txt.
>
>
>>
>> Thanks & Regards,
>> Luís
>>
>> On Wed, Jul 11, 2018 at 10:56 AM, jimqu <jimqu at amd.com> wrote:
>>
>>> Hi Luis,
>>>
>>>
>>> Let us trace the issue one by one.
>>>
>>>
>>> IB test failure:
>>>
>>> This should be a regression in 4.18; you can bisect the kernel
>>> patches.
>>>
>>> GPU hang:
>>>
>>> Fix the IB test failure first.
>>>
>>>
>>> Thanks
>>>
>>> JimQu
>>>
>>>
>>>
>>> On 2018-07-11 17:34, Luís Mendes wrote:
>>>
>>> Hi Jim,
>>>
>>> Thanks for your interest in this issue. Actually there are multiple
>>> issues here: not only is the IB ring test failing, but I am also having
>>> quite some trouble getting the SAPPHIRE RX 550 4GB on a Tyan S7025 and the
>>> SAPPHIRE RX 460 2GB on a TYAN S7002 to work, both systems using the same
>>> Ubuntu 18.04 with a vanilla kernel.
>>>
>>> *1. Could you also test an earlier kernel? v4.17 or v4.16.*
>>> I've tested kernels v4.17.5 and v4.16.6 on the same system. Both pass
>>> the IB ring test, and the system boots into X using the NVIDIA card as the
>>> display-connected card.
>>> The dmesg log for kernel 4.17.5 is attached as file
>>> TYAN_S7025_kernelv4.17.5_amdgpu_IB_ring_test_OK.txt.
>>>
>>> *2. Could you test the issue with only amdgpu?*
>>> - I've tested on a TYAN S7002 system with a single SAPPHIRE RX 460 2GB,
>>> with the on-board VGA enabled and used as primary display.
>>> Kernel v4.18-rc4 fails the IB ring test, but the system is able to enter X
>>> through the on-board VGA.
>>> The dmesg log for kernel 4.18-rc4 is attached as file
>>> TYAN_S7002_kernel_v4.18-rc4_IB_ring_test_fail.txt.
>>>
>>> - Same TYAN S7002 system, but now with the on-board VGA disabled and the
>>> RX 460 used as the display-connected card.
>>> Kernels v4.17.5 and v4.16.6 pass the IB ring test, but the GPU
>>> hangs before entering X. I don't have logs for these yet.
>>>
>>> Regards,
>>> Luís Mendes
>>> Aparapi contributor and MSc Researcher
>>>
>>>
>>>
>>>
>>>
>>> On Wed, Jul 11, 2018 at 3:49 AM, Qu, Jim <Jim.Qu at amd.com> wrote:
>>>
>>>> Hi Luis,
>>>>
>>>> 1. Could you also test an earlier kernel? v4.17 or v4.16.
>>>> 2. Could you test the issue with only amdgpu?
>>>>
>>>> Thanks
>>>> JimQu
>>>>
>>>> ________________________________________
>>>> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> on behalf of Luís Mendes <
>>>> luis.p.mendes at gmail.com>
>>>> Sent: July 11, 2018 6:04:00
>>>> To: Michel Dänzer; Koenig, Christian; amd-gfx list
>>>> Subject: Re: Regression with kernel 4.18 - AMD RX 550 fails IB ring test on
>>>> power-up
>>>>
>>>> Hi,
>>>>
>>>> The issue remains in kernel 4.18-rc4 using the SAPPHIRE RX 550 4GB.
>>>>
>>>> Logs are attached.
>>>>
>>>> Regards,
>>>> Luis
>>>>
>>>> On Tue, Jun 26, 2018 at 10:08 AM, Luís Mendes <luis.p.mendes at gmail.com
>>>> <mailto:luis.p.mendes at gmail.com>> wrote:
>>>> Hi,
>>>>
>>>> I've tried kernel 4.18-rc2 on a system with an NVIDIA GTX 1050 Ti and an
>>>> AMD RX 550 4GB, and the RX 550 card is failing the IB ring test.
>>>>
>>>> [    5.033217] [drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR* amdgpu: ib
>>>> test failed (scratch(0xC040)=0xFFFFFFFF)
>>>> [    5.033264] [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu:
>>>> failed testing IB on ring 6 (-22).
>>>>
>>>> Please see the attached log.
>>>>
>>>> Regards,
>>>> Luís
>>>>
>>>>
>>>
>>>
>>
>>
>