Deadlocks with multiple applications on AMD RX 460 and RX 550 - Update 2

Luís Mendes luis.p.mendes at gmail.com
Fri Feb 2 18:46:00 UTC 2018


Hi Christian, Alexander,

I have enabled kmemleak, but it didn't detect anything unusual. In
fact, this time (I don't know why) I didn't get any allocation
failure at all, but the GPU did hang after around 4h 6m of uptime with
Xorg.
The log is attached. I will try again to see whether the
allocation failure reappears, or whether it has become less apparent
due to the kmemleak scans.
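
As a quick cross-check of the uptime figure, the kernel timestamp on the
first error line of the attached log (14801.740467 seconds since boot) can
be converted to hours and minutes. This is only a small sketch; the helper
name format_uptime is made up for illustration.

```python
def format_uptime(seconds_since_boot: float) -> str:
    """Convert a kernel log timestamp (seconds since boot) to 'Xh Ym Zs'."""
    total = int(seconds_since_boot)          # drop the fractional part
    hours, remainder = divmod(total, 3600)   # whole hours, leftover seconds
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}h {minutes}m {seconds}s"


# Timestamp taken from the "ring gfx timeout" line in the attached log.
print(format_uptime(14801.740467))
```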

The kernel stack trace is similar to the one from the GPU hangs I was
getting on earlier kernel versions with Kodi or Firefox when watching
videos with either one. Back then, if I left Xorg idle, it would stay
up without hanging for more than one day.
This stack trace also looks quite similar to what Daniel Andersson
reported in "[BUG] Intermittent hang/deadlock when opening browser tab
with Vega gpu". It looks like another manifestation of the same bug on
different architectures.

Regards,
Luís

On Fri, Feb 2, 2018 at 7:48 AM, Christian König
<ckoenig.leichtzumerken at gmail.com> wrote:
> Hi Luis,
>
> please enable kmemleak in your build and watch out for any suspicious
> messages in the system log.
>
> Regards,
> Christian.
>
>
> Am 02.02.2018 um 00:03 schrieb Luís Mendes:
>>
>> Hi Alexander,
>>
>> I didn't notice improvements on this issue with that particular patch
>> applied. It still ends up failing to allocate kernel memory after a
>> few hours of uptime with Xorg.
>>
>> I will try to upgrade to mesa 18.0.0-rc3 and to amd-staging-drm-next
>> head, to see if the issue still occurs with those versions.
>>
>> If you have additional suggestions I'll be happy to try them.
>>
>> Regards,
>> Luís Mendes
>>
>> On Thu, Feb 1, 2018 at 2:30 AM, Alex Deucher <alexdeucher at gmail.com>
>> wrote:
>>>
>>> On Wed, Jan 31, 2018 at 6:57 PM, Luís Mendes <luis.p.mendes at gmail.com>
>>> wrote:
>>>>
>>>> Hi everyone,
>>>>
>>>> I am getting a new issue with amdgpu on the RX460: I can now
>>>> play any videos with Kodi, play web videos with Firefox and run
>>>> OpenGL applications without running into any issues. However, after
>>>> some uptime with Xorg, even when almost inactive, I get a kmalloc
>>>> allocation failure, normally followed by a GPU hang a while after
>>>> the allocation failure.
>>>> I had a terminal window under Ubuntu Mate 17.10 and I was compiling
>>>> code when I got the kernel messages that can be found in attachment.
>>>>
>>>> I am using the kernel as identified on my previous email, which can be
>>>> found below.
>>>
>>> does this patch help?
>>> https://patchwork.freedesktop.org/patch/198258/
>>>
>>> Alex
>>>
>>>> Regards,
>>>> Luís Mendes
>>>>
>>>> On Wed, Jan 31, 2018 at 12:47 PM, Luís Mendes <luis.p.mendes at gmail.com>
>>>> wrote:
>>>>>
>>>>> Hi Alexander,
>>>>>
>>>>> I've cherry-picked the patch you pointed out into the kernel from
>>>>> amd-drm-next-4.17-wip at commit
>>>>> 9ab2894122275a6d636bb2654a157e88a0f7b9e2 (drm/amdgpu: set
>>>>> DRIVER_ATOMIC flag early) and tested it on ARMv7l, and the problem
>>>>> is indeed gone.
>>>>>
>>>>>
>>>>> Working great on ARMv7l with AMD RX460.
>>>>>
>>>>> Thanks,
>>>>> Luís Mendes
>>>>>
>>>>>
>>>>> On Tue, Jan 30, 2018 at 6:44 PM, Deucher, Alexander
>>>>> <Alexander.Deucher at amd.com> wrote:
>>>>>>
>>>>>> Fixed with this patch:
>>>>>>
>>>>>>
>>>>>> https://lists.freedesktop.org/archives/amd-gfx/2018-January/018472.html
>>>>>>
>>>>>>
>>>>>> Alex
>>>> _______________________________________________
>>>> amd-gfx mailing list
>>>> amd-gfx at lists.freedesktop.org
>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>>
>
>
-------------- next part --------------
Feb  2 16:29:29 localhost kernel: [14801.740467] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=831006, last emitted seq=831008
Feb  2 16:29:29 localhost kernel: [14801.751557] [drm] IP block:gmc_v8_0 is hung!
Feb  2 16:29:29 localhost kernel: [14801.751563] [drm] IP block:gfx_v8_0 is hung!
Feb  2 16:29:29 localhost kernel: [14801.751611] [drm] GPU recovery disabled.
Feb  2 16:44:53 localhost kernel: [15725.856181] INFO: task amdgpu_cs:0:3803 blocked for more than 120 seconds.
Feb  2 16:44:53 localhost kernel: [15725.863085]       Not tainted 4.15.0-rc8-next2g-g9ab2894-dirty #3
Feb  2 16:44:53 localhost kernel: [15725.869213] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb  2 16:44:53 localhost kernel: [15725.877078] amdgpu_cs:0     D    0  3803   3091 0x00000000
Feb  2 16:44:53 localhost kernel: [15725.877084] Backtrace: 
Feb  2 16:44:53 localhost kernel: [15725.877096] [<80b571c8>] (__schedule) from [<80b578cc>] (schedule+0x44/0xa4)
Feb  2 16:44:53 localhost kernel: [15725.877102]  r10:600f0013 r9:b45b6000 r8:b45b7bd4 r7:00000000 r6:7fffffff r5:81004c48
Feb  2 16:44:53 localhost kernel: [15725.877104]  r4:ffffe000
Feb  2 16:44:53 localhost kernel: [15725.877110] [<80b57888>] (schedule) from [<80b5b4f0>] (schedule_timeout+0x1e0/0x2e8)
Feb  2 16:44:53 localhost kernel: [15725.877112]  r5:81004c48 r4:7fffffff
Feb  2 16:44:53 localhost kernel: [15725.877121] [<80b5b310>] (schedule_timeout) from [<8065df3c>] (dma_fence_default_wait+0x218/0x2b0)
Feb  2 16:44:53 localhost kernel: [15725.877125]  r10:600f0013 r9:b45b6000 r8:b45b7bd4 r7:00000000 r6:7fffffff r5:81004c48
Feb  2 16:44:53 localhost kernel: [15725.877127]  r4:9de12280
Feb  2 16:44:53 localhost kernel: [15725.877132] [<8065dd24>] (dma_fence_default_wait) from [<8065d6b4>] (dma_fence_wait_timeout+0x48/0x15c)
Feb  2 16:44:53 localhost kernel: [15725.877137]  r10:b4593800 r9:bbfd8000 r8:00000001 r7:a17b3768 r6:00000000 r5:9de12280
Feb  2 16:44:53 localhost kernel: [15725.877138]  r4:81096c18
Feb  2 16:44:53 localhost kernel: [15725.877342] [<8065d66c>] (dma_fence_wait_timeout) from [<7f1b5bc8>] (amdgpu_ctx_wait_prev_fence+0x48/0x80 [amdgpu])
Feb  2 16:44:53 localhost kernel: [15725.877346]  r7:a17b3768 r6:00000001 r5:b341e6c0 r4:00000001
Feb  2 16:44:53 localhost kernel: [15725.877606] [<7f1b5b80>] (amdgpu_ctx_wait_prev_fence [amdgpu]) from [<7f19e780>] (amdgpu_cs_ioctl+0x428/0x1edc [amdgpu])
Feb  2 16:44:53 localhost kernel: [15725.877609]  r5:b341e6c0 r4:00000001
Feb  2 16:44:53 localhost kernel: [15725.877773] [<7f19e358>] (amdgpu_cs_ioctl [amdgpu]) from [<7f08b920>] (drm_ioctl_kernel+0x68/0xb4 [drm])
Feb  2 16:44:53 localhost kernel: [15725.877778]  r10:00000018 r9:b45b7e2c r8:7f19e358 r7:00000021 r6:00000000 r5:bbff0000
Feb  2 16:44:53 localhost kernel: [15725.877779]  r4:be9d30c0
Feb  2 16:44:53 localhost kernel: [15725.877821] [<7f08b8b8>] (drm_ioctl_kernel [drm]) from [<7f08bdec>] (drm_ioctl+0x2cc/0x3b0 [drm])
Feb  2 16:44:53 localhost kernel: [15725.877825]  r9:00000044 r8:c0186444 r7:be9d30c0 r6:7f19e358 r5:7f2fcba4 r4:81004c48
Feb  2 16:44:53 localhost kernel: [15725.877971] [<7f08bb20>] (drm_ioctl [drm]) from [<7f180010>] (amdgpu_drm_ioctl+0x10/0x14 [amdgpu])
Feb  2 16:44:53 localhost kernel: [15725.877976]  r10:bb5bf210 r9:b45b6000 r8:7322aaa0 r7:0000000c r6:b0694d80 r5:7322aaa0
Feb  2 16:44:53 localhost kernel: [15725.877977]  r4:81004c48
Feb  2 16:44:53 localhost kernel: [15725.878103] [<7f180000>] (amdgpu_drm_ioctl [amdgpu]) from [<8028e4b4>] (do_vfs_ioctl+0xb8/0x8cc)
Feb  2 16:44:53 localhost kernel: [15725.878108] [<8028e3fc>] (do_vfs_ioctl) from [<8028ed04>] (SyS_ioctl+0x3c/0x60)
Feb  2 16:44:53 localhost kernel: [15725.878112]  r10:00000000 r9:b45b6000 r8:7322aaa0 r7:c0186444 r6:0000000c r5:b0694d80
Feb  2 16:44:53 localhost kernel: [15725.878114]  r4:b0694d81
Feb  2 16:44:53 localhost kernel: [15725.878121] [<8028ecc8>] (SyS_ioctl) from [<80108f00>] (ret_fast_syscall+0x0/0x54)
Feb  2 16:44:53 localhost kernel: [15725.878125]  r9:b45b6000 r8:801090e4 r7:00000036 r6:c0186444 r5:7322aaa0 r4:c0006400
Feb  2 16:46:56 localhost kernel: [15848.730505] INFO: task amdgpu_cs:0:3803 blocked for more than 120 seconds.
Feb  2 16:46:56 localhost kernel: [15848.737413]       Not tainted 4.15.0-rc8-next2g-g9ab2894-dirty #3
Feb  2 16:46:56 localhost kernel: [15848.743541] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb  2 16:46:56 localhost kernel: [15848.751404] amdgpu_cs:0     D    0  3803   3091 0x00000000
Feb  2 16:46:56 localhost kernel: [15848.751410] Backtrace: 
Feb  2 16:46:56 localhost kernel: [15848.751421] [<80b571c8>] (__schedule) from [<80b578cc>] (schedule+0x44/0xa4)
Feb  2 16:46:56 localhost kernel: [15848.751426]  r10:600f0013 r9:b45b6000 r8:b45b7bd4 r7:00000000 r6:7fffffff r5:81004c48
Feb  2 16:46:56 localhost kernel: [15848.751428]  r4:ffffe000
Feb  2 16:46:56 localhost kernel: [15848.751434] [<80b57888>] (schedule) from [<80b5b4f0>] (schedule_timeout+0x1e0/0x2e8)
Feb  2 16:46:56 localhost kernel: [15848.751436]  r5:81004c48 r4:7fffffff
Feb  2 16:46:56 localhost kernel: [15848.751444] [<80b5b310>] (schedule_timeout) from [<8065df3c>] (dma_fence_default_wait+0x218/0x2b0)
Feb  2 16:46:56 localhost kernel: [15848.751449]  r10:600f0013 r9:b45b6000 r8:b45b7bd4 r7:00000000 r6:7fffffff r5:81004c48
Feb  2 16:46:56 localhost kernel: [15848.751451]  r4:9de12280
Feb  2 16:46:56 localhost kernel: [15848.751456] [<8065dd24>] (dma_fence_default_wait) from [<8065d6b4>] (dma_fence_wait_timeout+0x48/0x15c)
Feb  2 16:46:56 localhost kernel: [15848.751460]  r10:b4593800 r9:bbfd8000 r8:00000001 r7:a17b3768 r6:00000000 r5:9de12280
Feb  2 16:46:56 localhost kernel: [15848.751462]  r4:81096c18
Feb  2 16:46:56 localhost kernel: [15848.751667] [<8065d66c>] (dma_fence_wait_timeout) from [<7f1b5bc8>] (amdgpu_ctx_wait_prev_fence+0x48/0x80 [amdgpu])
Feb  2 16:46:56 localhost kernel: [15848.751671]  r7:a17b3768 r6:00000001 r5:b341e6c0 r4:00000001
Feb  2 16:46:56 localhost kernel: [15848.751930] [<7f1b5b80>] (amdgpu_ctx_wait_prev_fence [amdgpu]) from [<7f19e780>] (amdgpu_cs_ioctl+0x428/0x1edc [amdgpu])
Feb  2 16:46:56 localhost kernel: [15848.751933]  r5:b341e6c0 r4:00000001
Feb  2 16:46:56 localhost kernel: [15848.752098] [<7f19e358>] (amdgpu_cs_ioctl [amdgpu]) from [<7f08b920>] (drm_ioctl_kernel+0x68/0xb4 [drm])
Feb  2 16:46:56 localhost kernel: [15848.752103]  r10:00000018 r9:b45b7e2c r8:7f19e358 r7:00000021 r6:00000000 r5:bbff0000
Feb  2 16:46:56 localhost kernel: [15848.752105]  r4:be9d30c0
Feb  2 16:46:56 localhost kernel: [15848.752144] [<7f08b8b8>] (drm_ioctl_kernel [drm]) from [<7f08bdec>] (drm_ioctl+0x2cc/0x3b0 [drm])
Feb  2 16:46:56 localhost kernel: [15848.752148]  r9:00000044 r8:c0186444 r7:be9d30c0 r6:7f19e358 r5:7f2fcba4 r4:81004c48
Feb  2 16:46:56 localhost kernel: [15848.752295] [<7f08bb20>] (drm_ioctl [drm]) from [<7f180010>] (amdgpu_drm_ioctl+0x10/0x14 [amdgpu])
Feb  2 16:46:56 localhost kernel: [15848.752299]  r10:bb5bf210 r9:b45b6000 r8:7322aaa0 r7:0000000c r6:b0694d80 r5:7322aaa0
Feb  2 16:46:56 localhost kernel: [15848.752301]  r4:81004c48
Feb  2 16:46:56 localhost kernel: [15848.752426] [<7f180000>] (amdgpu_drm_ioctl [amdgpu]) from [<8028e4b4>] (do_vfs_ioctl+0xb8/0x8cc)
Feb  2 16:46:56 localhost kernel: [15848.752432] [<8028e3fc>] (do_vfs_ioctl) from [<8028ed04>] (SyS_ioctl+0x3c/0x60)
Feb  2 16:46:56 localhost kernel: [15848.752436]  r10:00000000 r9:b45b6000 r8:7322aaa0 r7:c0186444 r6:0000000c r5:b0694d80
Feb  2 16:46:56 localhost kernel: [15848.752438]  r4:b0694d81
Feb  2 16:46:56 localhost kernel: [15848.752445] [<8028ecc8>] (SyS_ioctl) from [<80108f00>] (ret_fast_syscall+0x0/0x54)
Feb  2 16:46:56 localhost kernel: [15848.752449]  r9:b45b6000 r8:801090e4 r7:00000036 r6:c0186444 r5:7322aaa0 r4:c0006400
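
To compare this log with other reports (such as the Vega one mentioned
above), it can help to reduce each backtrace to its chain of function names
and to pull the sequence numbers out of the ring timeout line. The following
is a minimal sketch, assuming the ARM backtrace format shown in this log; the
names call_chains and ring_timeout are hypothetical helpers, not part of any
kernel tooling, and the sample lines are copied from the log above with the
syslog prefix removed.

```python
import re

# "[<addr>] (callee [module]) from [<addr>] (caller+0xoff/0xlen [module])"
FRAME = re.compile(
    r"\] \((\w+)(?: \[\w+\])?\) from \[<[0-9a-f]+>\] \((\w+)\+0x"
)
# "ring gfx timeout, last signaled seq=N, last emitted seq=M"
TIMEOUT = re.compile(
    r"ring (\w+) timeout, last signaled seq=(\d+), last emitted seq=(\d+)"
)


def call_chains(lines):
    """Return one list of function names (innermost first) per backtrace."""
    chains = []
    for line in lines:
        if "Backtrace:" in line:
            chains.append([])            # a new backtrace starts here
            continue
        match = FRAME.search(line)
        if match is None or not chains:
            continue                     # register dumps and other lines
        callee, caller = match.groups()
        if not chains[-1]:
            chains[-1].append(callee)    # first frame: record the callee too
        chains[-1].append(caller)
    return chains


def ring_timeout(lines):
    """Return (ring, last_signaled, last_emitted) or None if not found."""
    for line in lines:
        match = TIMEOUT.search(line)
        if match:
            ring, signaled, emitted = match.groups()
            return ring, int(signaled), int(emitted)
    return None


SAMPLE = """\
[14801.740467] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=831006, last emitted seq=831008
[15725.877084] Backtrace:
[15725.877096] [<80b571c8>] (__schedule) from [<80b578cc>] (schedule+0x44/0xa4)
[15725.877104]  r4:ffffe000
[15725.877110] [<80b57888>] (schedule) from [<80b5b4f0>] (schedule_timeout+0x1e0/0x2e8)
[15725.877121] [<80b5b310>] (schedule_timeout) from [<8065df3c>] (dma_fence_default_wait+0x218/0x2b0)
[15725.877342] [<8065d66c>] (dma_fence_wait_timeout) from [<7f1b5bc8>] (amdgpu_ctx_wait_prev_fence+0x48/0x80 [amdgpu])
""".splitlines()

if __name__ == "__main__":
    ring, signaled, emitted = ring_timeout(SAMPLE)
    print(f"ring {ring}: {emitted - signaled} submission(s) never signaled")
    for chain in call_chains(SAMPLE):
        print(" <- ".join(chain))
```

Two backtraces that reduce to the same chain of names are a hint, though not
proof, that the tasks are blocked waiting on the same kind of fence.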

