[OSADL QA 3.18.9-rt4 #1] Radeon driver hangs

Michel Dänzer michel at daenzer.net
Mon Mar 16 19:31:00 PDT 2015


On 16.03.2015 23:52, Carsten Emde wrote:
> Hi Michel,
> 
>>> [..]
>>> The most striking problem of kernel 3.18.9-rt4 affects all systems that
>>> are equipped with Radeon graphics (irrespective of whether PCIe cards or
>>> APUs with on-chip graphics). They suffer from a hanging radeon driver.
>>> The block occurs when accelerated graphics load is created by x11perf or
>>> gltestperf. Sometimes only the graphics are frozen while ssh login still
>>> is possible, sometimes the entire box is no longer accessible at all. In
>>> any case, a reboot is needed to recover from this situation.
>>>
>>> Here is a selection of kernel messages:
>> [...]
>> The commits from
>> http://cgit.freedesktop.org/~airlied/linux/commit/?h=drm-fixes&id=f957063fee6392bb9365370db6db74dc0b2dce0a
>>
>> to
>> http://cgit.freedesktop.org/~airlied/linux/commit/?h=drm-fixes&id=cffefd9bb31cd35ab745d3b49005d10616d25bdc
>>
>> and
>> http://cgit.freedesktop.org/~airlied/linux/commit/?h=drm-fixes&id=b6610101718d4ab90d793c482625e98eb1262cad
>>
>> might help for this.
> 
> Thanks a lot. I have applied these patches to a number of systems:
> # quilt applied | tail -7
> patches/drm-radeon-do-a-posting-read-in-r100_set_irq.patch
> patches/drm-radeon-do-a-posting-read-in-rs600_set_irq.patch
> patches/drm-radeon-do-a-posting-read-in-r600_set_irq.patch
> patches/drm-radeon-do-a-posting-read-in-evergreen_set_irq.patch
> patches/drm-radeon-do-a-posting-read-in-si_set_irq.patch
> patches/drm-radeon-do-a-posting-read-in-cik_set_irq.patch
> patches/drm-radeon-fix-wait-to-actually-occur-after-the-signaling-callback.patch
> 
> 
> The graphic boards still crash and freeze the screen, but in contrast
> to the earlier situation the systems remain accessible, and the X
> Window server can be restarted after the offending programs are
> removed. The crashes were reliably triggered by
> - gltestperf
>   or
> - x11perf -repeat 3 -subs 25 -time 2 -rect10
> but the crashes also occur several times per day during normal work
> such as browsing the Internet or writing a text document. If you wish
> me to provide additional diagnostic information such as running test
> programs while the graphic boards are unresponsive, I certainly can do
> that.

Does it also happen with a kernel built from a current drm-fixes tree?
http://cgit.freedesktop.org/~airlied/linux/log/?h=drm-fixes

I might have missed other needed fixes.


> Rack #0/Slot #3 [AMD/ATI] RV730 XT [Radeon HD 4670]:
> 
> [21001.244036] INFO: task kworker/u24:6:267 blocked for more than 120 seconds.
> [21001.257773]       Not tainted 3.18.9-rt4 #27
> [21001.266284] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [21001.281911] kworker/u24:6   D ffff88081ed8b340     0   267      2 0x10000000
> [21001.281937] Workqueue: radeon-crtc radeon_flip_work_func [radeon]
> [21001.281940]  ffff880805d2fbe8 0000000000000046 ffff88081ed0c700 0000000000000000
> [21001.281941]  0000000000009000 000000000000c920 ffff8808112fb420 ffff880035254e30
> [21001.281943]  000000000000c280 000001000000c280 0000000000000003 ffff880035254e30
> [21001.281945] Call Trace:
> [21001.281950]  [<ffffffff81721ce4>] schedule+0x34/0xa0
> [21001.281953]  [<ffffffff8172425c>] schedule_timeout+0x22c/0x2d0
> [21001.281962]  [<ffffffffa0439a06>] ? radeon_fence_process+0x16/0x40 [radeon]
> [21001.281971]  [<ffffffffa0439a74>] ? radeon_fence_any_seq_signaled+0x44/0x90 [radeon]
> [21001.281979]  [<ffffffffa0439da7>] radeon_fence_wait_seq_timeout.constprop.8+0x2e7/0x340 [radeon]
> [21001.281982]  [<ffffffff81098be0>] ? __wake_up_sync+0x20/0x20
> [21001.281991]  [<ffffffffa043a106>] radeon_fence_wait+0x86/0xc0 [radeon]
> [21001.282000]  [<ffffffffa0447eec>] radeon_flip_work_func+0x15c/0x190 [radeon]
> [21001.282003]  [<ffffffff810709c4>] process_one_work+0x154/0x450
> [21001.282004]  [<ffffffff81070fbb>] worker_thread+0x6b/0x4d0
> [21001.282006]  [<ffffffff81070f50>] ? rescuer_thread+0x290/0x290
> [21001.282007]  [<ffffffff81070f50>] ? rescuer_thread+0x290/0x290
> [21001.282009]  [<ffffffff81075fed>] kthread+0xcd/0xf0
> [21001.282010]  [<ffffffff81075f20>] ? kthread_worker_fn+0x1d0/0x1d0
> [21001.282013]  [<ffffffff81725aec>] ret_from_fork+0x7c/0xb0
> [21001.282014]  [<ffffffff81075f20>] ? kthread_worker_fn+0x1d0/0x1d0
> 
> 
> Rack #0/Slot #7 [AMD/ATI] Cayman XT [Radeon HD 6970]
> 
> [  481.091132] INFO: task Xorg:3459 blocked for more than 120 seconds.
> [  481.103594]       Not tainted 3.18.9-rt4 #28
> [  481.112101] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  481.127746] Xorg            D ffff88041e68ab40     0  3459   3452 0x10400004
> [  481.141882]  ffff880413da38e8 0000000000000002 ffff88041e60c460 ffff8800c3ea3380
> [  481.141882]  ffff880413da38d8 ffffffff8108603f 000000000000c5a8 000000000000c5c8
> [  481.141883]  ffffffff81c19460 ffff8800c3ea3380 000000000000000c ffff8800c3ea3380
> [  481.186228] Call Trace:
> [  481.191114]  [<ffffffff8108603f>] ? queue_delayed_work_on+0xff/0x110
> [  481.191118]  [<ffffffff816b50f4>] schedule+0x34/0xa0
> [  481.191119]  [<ffffffff816b72f4>] schedule_timeout+0x204/0x270
> [  481.191148]  [<ffffffffa00cd826>] ? radeon_fence_process+0x16/0x40 [radeon]
> [  481.191157]  [<ffffffffa00cd894>] ? radeon_fence_any_seq_signaled+0x44/0x90 [radeon]
> [  481.191165]  [<ffffffffa00cdb07>] radeon_fence_wait_seq_timeout.constprop.7+0x227/0x330 [radeon]
> [  481.191167]  [<ffffffff810ac310>] ? prepare_to_wait_event+0x110/0x110
> [  481.191175]  [<ffffffffa00cdf67>] radeon_fence_wait_any+0x57/0x70 [radeon]
> [  481.191191]  [<ffffffffa01432af>] radeon_sa_bo_new+0x2cf/0x4e0 [radeon]
> [  481.191194]  [<ffffffff8133c2a7>] ? debug_smp_processor_id+0x17/0x20
> [  481.191207]  [<ffffffffa019d3e7>] radeon_ib_get+0x37/0xf0 [radeon]
> [  481.191218]  [<ffffffffa00e997d>] radeon_cs_ioctl+0x22d/0x820 [radeon]
> [  481.191219]  [<ffffffff8133c2a7>] ? debug_smp_processor_id+0x17/0x20
> [  481.191228]  [<ffffffffa001bc04>] drm_ioctl+0x1a4/0x630 [drm]
> [  481.191231]  [<ffffffff8133c2a7>] ? debug_smp_processor_id+0x17/0x20
> [  481.191234]  [<ffffffff8106e8da>] ? unpin_current_cpu+0x1a/0x70
> [  481.191237]  [<ffffffff81097440>] ? migrate_enable+0xb0/0x1b0
> [  481.191243]  [<ffffffffa00b004b>] radeon_drm_ioctl+0x4b/0x80 [radeon]
> [  481.191245]  [<ffffffff811c7040>] do_vfs_ioctl+0x2e0/0x4d0
> [  481.191247]  [<ffffffff811d1aa2>] ? __fget+0x72/0xa0
> [  481.191248]  [<ffffffff811c72b1>] SyS_ioctl+0x81/0xa0
> [  481.191250]  [<ffffffff816b8cb2>] tracesys_phase2+0xd4/0xd9
> 
> 
> Rack #0/Slot #8 [AMD/ATI] Tahiti XT [Radeon HD 7970/8970 OEM / R9 280X]:
> 
> [19579.220958] INFO: task Xorg.bin:16569 blocked for more than 120 seconds.
> [19579.228008]       Not tainted 3.18.9-rt4 #25
> [19579.232491] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [19579.240719] Xorg.bin        D ffffffff81716c70     0 16569  16215 0x10400080
> [19579.248076]  ffff8805f78bf818 0000000000000002 ffff8805f78bf7f8 0000000000000002
> [19579.248077]  000000000000dc08 ffff880626a0dc08 000000000000dbe8 000000000000dc08
> [19579.248078]  ffffffff81c1b500 ffff880606c614a0 ffff880614f7c000 ffff880606c614a0
> [19579.271393] Call Trace:
> [19579.273964]  [<ffffffff81713da4>] schedule+0x34/0xa0
> [19579.273965]  [<ffffffff817162dc>] schedule_timeout+0x1fc/0x280
> [19579.273990]  [<ffffffffa00c7aa6>] ? radeon_fence_process+0x16/0x40 [radeon]
> [19579.273999]  [<ffffffffa00c7b14>] ? radeon_fence_any_seq_signaled+0x44/0x90 [radeon]
> [19579.274008]  [<ffffffffa00c7e47>] radeon_fence_wait_seq_timeout.constprop.8+0x2e7/0x340 [radeon]
> [19579.274011]  [<ffffffff810cf310>] ? __wake_up_sync+0x20/0x20
> [19579.274020]  [<ffffffffa00c8237>] radeon_fence_wait_any+0x57/0x70 [radeon]
> [19579.274035]  [<ffffffffa013e2cf>] radeon_sa_bo_new+0x2af/0x4b0 [radeon]
> [19579.274049]  [<ffffffffa0196077>] radeon_ib_get+0x37/0xe0 [radeon]
> [19579.274062]  [<ffffffffa0194bbc>] radeon_vm_update_page_directory+0x6c/0x290 [radeon]
> [19579.274078]  [<ffffffffa0144916>] ? si_ib_parse+0x396/0x430 [radeon]
> [19579.274089]  [<ffffffffa00e44ab>] radeon_cs_ioctl+0x35b/0x850 [radeon]
> [19579.274098]  [<ffffffffa0005bc7>] drm_ioctl+0x197/0x670 [drm]
> [19579.274102]  [<ffffffff81373337>] ? debug_smp_processor_id+0x17/0x20
> [19579.274103]  [<ffffffff8108ec2a>] ? unpin_current_cpu+0x1a/0x80
> [19579.274105]  [<ffffffff810b85c4>] ? migrate_enable+0x84/0x160
> [19579.274111]  [<ffffffffa00aa04c>] radeon_drm_ioctl+0x4c/0x80 [radeon]
> [19579.274114]  [<ffffffff811f8ae8>] do_vfs_ioctl+0x2c8/0x4c0
> [19579.274116]  [<ffffffff81203902>] ? __fget+0x72/0xb0
> [19579.274117]  [<ffffffff811f8d61>] SyS_ioctl+0x81/0xa0
> [19579.274118]  [<ffffffff817179de>] tracesys_phase2+0xd4/0xd9
> 
> 
> Rack #4/Slot #1 Chipset: "KAVERI" (ChipID = 0x130c):
> 
> [21721.088164] INFO: task Xorg:7436 blocked for more than 120 seconds.
> [21721.100625]       Not tainted 3.18.9-rt4 #26
> [21721.109150] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [21721.124795] Xorg            D ffffffff816b7f88     0  7436   7430 0x10400004
> [21721.138897]  ffff880409f278e8 0000000000000002 ffff88041e90c460 000000000000c5c8
> [21721.138898]  ffff88041e90c5c8 0000000000000006 000000000000c5a8 000000000000c5c8
> [21721.138899]  ffff8804177299c0 ffff880409f299c0 000000000000000c ffff880409f299c0
> [21721.183222] Call Trace:
> [21721.188110]  [<ffffffff816b50f4>] schedule+0x34/0xa0
> [21721.188112]  [<ffffffff816b72f4>] schedule_timeout+0x204/0x270
> [21721.188143]  [<ffffffffa00cd826>] ? radeon_fence_process+0x16/0x40 [radeon]
> [21721.188153]  [<ffffffffa00cd894>] ? radeon_fence_any_seq_signaled+0x44/0x90 [radeon]
> [21721.188163]  [<ffffffffa00cdb07>] radeon_fence_wait_seq_timeout.constprop.7+0x227/0x330 [radeon]
> [21721.188165]  [<ffffffff810ac310>] ? prepare_to_wait_event+0x110/0x110
> [21721.188176]  [<ffffffffa00cdf67>] radeon_fence_wait_any+0x57/0x70 [radeon]
> [21721.188193]  [<ffffffffa01432af>] radeon_sa_bo_new+0x2cf/0x4e0 [radeon]
> [21721.188196]  [<ffffffff8133c2a7>] ? debug_smp_processor_id+0x17/0x20
> [21721.188210]  [<ffffffffa019d3e7>] radeon_ib_get+0x37/0xf0 [radeon]
> [21721.188223]  [<ffffffffa00e997d>] radeon_cs_ioctl+0x22d/0x820 [radeon]
> [21721.188233]  [<ffffffffa001bc04>] drm_ioctl+0x1a4/0x630 [drm]
> [21721.188236]  [<ffffffff8133c2a7>] ? debug_smp_processor_id+0x17/0x20
> [21721.188238]  [<ffffffff8106e8da>] ? unpin_current_cpu+0x1a/0x70
> [21721.188240]  [<ffffffff81097440>] ? migrate_enable+0xb0/0x1b0
> [21721.188248]  [<ffffffffa00b004b>] radeon_drm_ioctl+0x4b/0x80 [radeon]
> [21721.188250]  [<ffffffff811c7040>] do_vfs_ioctl+0x2e0/0x4d0
> [21721.188252]  [<ffffffff811d1aa2>] ? __fget+0x72/0xa0
> [21721.188254]  [<ffffffff811c72b1>] SyS_ioctl+0x81/0xa0
> [21721.188255]  [<ffffffff816b8cb2>] tracesys_phase2+0xd4/0xd9
> 
> 
> Rack #c/Slot #5 Chipset: "ATI Radeon HD 5800 Series" (ChipID = 0x6898)
> 
> [19711.965733] INFO: task kworker/u24:13:197 blocked for more than 120 seconds.
> [19711.965737]       Not tainted 3.18.9-rt4 #26
> [19711.965749] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [19711.965751] kworker/u24:13  D ffff88032901a560     0   197      2 0x10000000
> [19711.965784] Workqueue: radeon-crtc radeon_flip_work_func [radeon]
> [19711.965788]  ffff880328b3bc58 0000000000000002 000000000001d65e 0000000000000000
> [19711.965789]  ffff880328b3bfd8 000000000008a5c0 ffff880328b3bc78 ffffffffa0482589
> [19711.965791]  ffff88032fa81920 ffff880328b30000 ffff88032c63d5f0 ffff880328b30000
> [19711.965794] Call Trace:
> [19711.965813]  [<ffffffffa0482589>] ? radeon_fence_activity+0x160/0x172 [radeon]
> [19711.965818]  [<ffffffff814e0d38>] schedule+0x7e/0x90
> [19711.965820]  [<ffffffff814e2143>] schedule_timeout+0x25/0xd3
> [19711.965835]  [<ffffffffa0482ba3>] ? radeon_fence_any_seq_signaled+0x52/0x69 [radeon]
> [19711.965850]  [<ffffffffa0482d8d>] radeon_fence_wait_seq_timeout.constprop.6+0x1d3/0x2be [radeon]
> [19711.965853]  [<ffffffff81066166>] ? __wake_up_sync+0x12/0x12
> [19711.965869]  [<ffffffffa04830e1>] radeon_fence_wait+0x92/0xaa [radeon]
> [19711.965886]  [<ffffffffa048dae1>] radeon_flip_work_func+0x11e/0x14f [radeon]
> [19711.965889]  [<ffffffff8104cac1>] process_one_work+0x16e/0x2ae
> [19711.965891]  [<ffffffff8104d0fe>] worker_thread+0x1df/0x2ca
> [19711.965892]  [<ffffffff8104cf1f>] ? cancel_delayed_work+0x91/0x91
> [19711.965894]  [<ffffffff8104cf1f>] ? cancel_delayed_work+0x91/0x91
> [19711.965895]  [<ffffffff81051324>] kthread+0xae/0xb6
> [19711.965897]  [<ffffffff81051276>] ? __kthread_parkme+0x61/0x61
> [19711.965899]  [<ffffffff814e322c>] ret_from_fork+0x7c/0xb0
> [19711.965901]  [<ffffffff81051276>] ? __kthread_parkme+0x61/0x61
> [19711.965916] INFO: task compiz:2626 blocked for more than 120 seconds.
> [19711.965929]       Not tainted 3.18.9-rt4 #26
> [19711.965931] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [19711.965932] compiz          D ffff88032901a560     0  2626   2186 0x30020000
> [19711.965937]  ffff8800b8ee7bc8 0000000000200002 ffff88032bb9e480 0000000000000000
> [19711.965942]  ffff8800b8ee7fd8 000000000008a5c0 0000000000000000 ffff8800b8ee7ee0
> [19711.965951]  ffffffff81a25450 ffff88032bb9e480 ffff8800b8ee7c28 ffff88032bb9e480
> [19711.965954] Call Trace:
> [19711.965958]  [<ffffffff814e0d38>] schedule+0x7e/0x90
> [19711.965959]  [<ffffffff814e1ab7>] __rt_mutex_slowlock+0x9f/0xdc
> [19711.965961]  [<ffffffff814e1f7b>] rt_mutex_slowlock+0x123/0x236
> [19711.965964]  [<ffffffff8106b234>] rt_mutex_fastlock.constprop.24+0x2e/0x30
> [19711.965965]  [<ffffffff814e2103>] rt_mutex_lock+0x13/0x15
> [19711.965967]  [<ffffffff8106b613>] __rt_down_read.isra.1+0x29/0x30
> [19711.965968]  [<ffffffff8106b628>] rt_down_read+0xe/0x10
> [19711.965988]  [<ffffffffa04942ff>] radeon_gem_create_ioctl+0x2c/0xc6 [radeon]
> [19711.965990]  [<ffffffff812004f9>] ? avc_has_perm_noaudit+0xf7/0x109
> [19711.966004]  [<ffffffffa010bc26>] drm_ioctl+0x380/0x3f8 [drm]
> [19711.966025]  [<ffffffffa04942d3>] ? radeon_gem_pwrite_ioctl+0x28/0x28 [radeon]
> [19711.966027]  [<ffffffff81200ca6>] ? inode_has_perm+0x2f/0x34
> [19711.966029]  [<ffffffff81200e58>] ? file_has_perm+0x5d/0x81
> [19711.966040]  [<ffffffffa046e00e>] radeon_drm_ioctl+0xe/0x10 [radeon]
> [19711.966067]  [<ffffffffa0518b9c>] radeon_kms_compat_ioctl+0x1b/0x1f [radeon]
> [19711.966070]  [<ffffffff8115e692>] compat_SyS_ioctl+0x1c3/0xf6e
> [19711.966072]  [<ffffffff8100e7b1>] ? syscall_trace_enter+0x52/0x57
> [19711.966074]  [<ffffffff814e5679>] ia32_do_call+0x13/0x13 
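
For comparing reports like the ones quoted above across machines, it can help to reduce each hung-task report to the blocked task and the first reliable radeon frame in its call trace. The following is a minimal sketch in Python and is not part of the original report. The names `summarize_hung_tasks` and `first_module_frame` are hypothetical, and the regular expressions assume the log format shown in the traces above (frames written as `[<address>] symbol+0xoff/0xlen [module]`, with `?` marking unreliable frames).

```python
import re

# Hung-task header, e.g.
# "INFO: task kworker/u24:6:267 blocked for more than 120 seconds."
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) blocked for more than \d+ seconds"
)

# Stack frame, e.g.
# "[<ffffffffa0439a06>] ? radeon_fence_process+0x16/0x40 [radeon]"
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?P<unreliable>\? )?(?P<sym>[\w.]+)"
    r"\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(?P<mod>\w+)\])?"
)


def summarize_hung_tasks(log_text):
    """Return one dict per hung-task report with its reliable frames."""
    reports = []
    current = None
    for line in log_text.splitlines():
        header = HUNG_RE.search(line)
        if header:
            current = {
                "comm": header.group("comm"),
                "pid": int(header.group("pid")),
                "frames": [],
            }
            reports.append(current)
            continue
        frame = FRAME_RE.search(line)
        # Skip frames marked "?": the unwinder flags them as unreliable.
        if frame and current is not None and not frame.group("unreliable"):
            current["frames"].append((frame.group("sym"), frame.group("mod")))
    return reports


def first_module_frame(report, module="radeon"):
    """Return the innermost reliable frame that belongs to the given module."""
    for sym, mod in report["frames"]:
        if mod == module:
            return sym
    return None


# Excerpt of the first trace quoted above, used as sample input.
SAMPLE = """\
[21001.244036] INFO: task kworker/u24:6:267 blocked for more than 120 seconds.
[21001.281950]  [<ffffffff81721ce4>] schedule+0x34/0xa0
[21001.281962]  [<ffffffffa0439a06>] ? radeon_fence_process+0x16/0x40 [radeon]
[21001.281979]  [<ffffffffa0439da7>] radeon_fence_wait_seq_timeout.constprop.8+0x2e7/0x340 [radeon]
"""

if __name__ == "__main__":
    for report in summarize_hung_tasks(SAMPLE):
        print(report["comm"], report["pid"], first_module_frame(report))
```

Run over the full set of traces quoted above, this should resolve every report except the compiz one to a `radeon_fence_wait_seq_timeout` variant, while the compiz task is blocked on a lock taken in `radeon_gem_create_ioctl`.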



-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer

