GPU lockup CP stall for more than 10000msec on latest vanilla git
Markus Trippelsdorf
markus at trippelsdorf.de
Tue Dec 18 05:38:31 PST 2012
On 2012.12.18 at 12:20 +0100, Michel Dänzer wrote:
> On Mon, 2012-12-17 at 23:55 +0100, Markus Trippelsdorf wrote:
> > On 2012.12.17 at 23:25 +0100, Markus Trippelsdorf wrote:
> > > On 2012.12.17 at 17:00 -0500, Alex Deucher wrote:
> > > > On Mon, Dec 17, 2012 at 4:48 PM, Markus Trippelsdorf
> > > > <markus at trippelsdorf.de> wrote:
> > > > > On 2012.12.17 at 16:32 -0500, Alex Deucher wrote:
> > > > >> On Mon, Dec 17, 2012 at 1:27 PM, Markus Trippelsdorf
> > > > >> <markus at trippelsdorf.de> wrote:
> > > > >> > As soon as I open the following website:
> > > > >> > http://www.boston.com/bigpicture/2012/12/2012_year_in_pictures_part_i.html
> > > > >> >
> > > > >> > my Radeon RS780 stalls (GPU lockup) leaving the machine unusable:
> > > > >>
> > > > >> Is this a regression? Most likely a 3D driver bug unless you are only
> > > > >> seeing it with specific kernels. What browser are you using and do
> > > > >> you have hw accelerated webgl, etc. enabled? If so, what version of
> > > > >> mesa are you using?
> > > > >
> > > > > > This is a regression, because it is caused by yesterday's merge of
> > > > > drm-next by Linus. IOW I only see this bug when running a
> > > > > v3.7-9432-g9360b53 kernel.
> > > >
> > > > Can you bisect? I'm guessing it may be related to the new DMA rings. Possibly:
> > > > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=2d6cc7296d4ee128ab0fa3b715f0afde511f49c2
> > >
> > > Yes, the commit above causes the issue.
> > >
> > > 2d6cc72 GPU lockups
> >
> > With 2d6cc72 reverted I get:
> >
> > Dec 17 23:09:35 x4 kernel: ------------[ cut here ]------------
>
> Probably a separate issue, can you bisect this one as well?

Yes. Git-bisect points to:

85b144f860176ec18db927d6d9ecdfb24d9c6483 is the first bad commit
commit 85b144f860176ec18db927d6d9ecdfb24d9c6483
Author: Maarten Lankhorst <maarten.lankhorst at canonical.com>
Date:   Thu Nov 29 11:36:54 2012 +0000

    drm/ttm: call ttm_bo_cleanup_refs with reservation and lru lock held, v3

(Please note that this bug is a little bit harder to reproduce, but
scrolling up and down for ~10 seconds on the webpage mentioned above
will trigger the oops. So while I'm not 100% sure that the issue is
caused by exactly this commit, the vicinity should be right.)

Dec 18 14:29:07 x4 kernel: ------------[ cut here ]------------
Dec 18 14:29:07 x4 kernel: WARNING: at include/linux/kref.h:42 radeon_fence_ref+0x2c/0x40()
Dec 18 14:29:07 x4 kernel: Hardware name: System Product Name
Dec 18 14:29:07 x4 kernel: Pid: 161, comm: X Not tainted 3.7.0-rc7-00520-g85b144f #168
Dec 18 14:29:07 x4 kernel: Call Trace:
Dec 18 14:29:07 x4 kernel: [<ffffffff81058c84>] ? warn_slowpath_common+0x74/0xb0
Dec 18 14:29:07 x4 kernel: [<ffffffff812926fc>] ? radeon_fence_ref+0x2c/0x40
Dec 18 14:29:07 x4 kernel: [<ffffffff8125e91c>] ? ttm_bo_cleanup_refs_and_unlock+0x17c/0x2c0
Dec 18 14:29:07 x4 kernel: [<ffffffff8125f13c>] ? ttm_mem_evict_first+0x1dc/0x2a0
Dec 18 14:29:07 x4 kernel: [<ffffffff81264412>] ? ttm_bo_man_get_node+0x62/0xb0
Dec 18 14:29:07 x4 kernel: [<ffffffff8125f48e>] ? ttm_bo_mem_space+0x28e/0x340
Dec 18 14:29:07 x4 kernel: [<ffffffff8125facc>] ? ttm_bo_move_buffer+0xfc/0x170
Dec 18 14:29:07 x4 kernel: [<ffffffff810de172>] ? kmem_cache_alloc+0xb2/0xc0
Dec 18 14:29:07 x4 kernel: [<ffffffff8125fbd5>] ? ttm_bo_validate+0x95/0x110
Dec 18 14:29:07 x4 kernel: [<ffffffff8125ff3c>] ? ttm_bo_init+0x2ec/0x3b0
Dec 18 14:29:07 x4 kernel: [<ffffffff8129415a>] ? radeon_bo_create+0x18a/0x200
Dec 18 14:29:07 x4 kernel: [<ffffffff81293e40>] ? radeon_bo_clear_va+0x40/0x40
Dec 18 14:29:07 x4 kernel: [<ffffffff812a5302>] ? radeon_gem_object_create+0x92/0x160
Dec 18 14:29:07 x4 kernel: [<ffffffff812a571c>] ? radeon_gem_create_ioctl+0x6c/0x150
Dec 18 14:29:07 x4 kernel: [<ffffffff81246b60>] ? drm_ioctl+0x420/0x4f0
Dec 18 14:29:07 x4 kernel: [<ffffffff812a56b0>] ? radeon_gem_pwrite_ioctl+0x20/0x20
Dec 18 14:29:07 x4 kernel: [<ffffffff810f53a4>] ? do_vfs_ioctl+0x2e4/0x4e0
Dec 18 14:29:07 x4 kernel: [<ffffffff810e5588>] ? vfs_read+0x118/0x160
Dec 18 14:29:07 x4 kernel: [<ffffffff810f55ec>] ? sys_ioctl+0x4c/0xa0
Dec 18 14:29:07 x4 kernel: [<ffffffff810e5851>] ? sys_read+0x51/0xa0
Dec 18 14:29:07 x4 kernel: [<ffffffff814b05d2>] ? system_call_fastpath+0x16/0x1b
Dec 18 14:29:07 x4 kernel: ---[ end trace c5e6f68fefd3a70b ]---
Dec 18 14:29:07 x4 kernel: BUG: unable to handle kernel paging request at 0000000100000077
Dec 18 14:29:07 x4 kernel: IP: [<ffffffff814afa15>] _raw_spin_lock+0x5/0x30
Dec 18 14:29:07 x4 kernel: PGD 2156c4067 PUD 0
Dec 18 14:29:07 x4 kernel: Oops: 0002 [#1] SMP
Dec 18 14:29:07 x4 kernel: CPU 1
Dec 18 14:29:07 x4 kernel: Pid: 161, comm: X Tainted: G W 3.7.0-rc7-00520-g85b144f #168 System manufacturer System Product Name/M4A78T-E
Dec 18 14:29:07 x4 kernel: RIP: 0010:[<ffffffff814afa15>] [<ffffffff814afa15>] _raw_spin_lock+0x5/0x30
Dec 18 14:29:07 x4 kernel: RSP: 0018:ffff880211645d58 EFLAGS: 00010286
Dec 18 14:29:07 x4 kernel: RAX: 0000000000000100 RBX: ffff8801c0e29448 RCX: 0000000000000000
Dec 18 14:29:07 x4 kernel: RDX: 0000000000000001 RSI: 0000000000000001 RDI: 0000000100000077
Dec 18 14:29:07 x4 kernel: RBP: 00000000ffffffff R08: 0000000000000000 R09: ffffffff81838370
Dec 18 14:29:07 x4 kernel: R10: ffffffff812a5960 R11: 0000000000000246 R12: 0000000000000001
Dec 18 14:29:07 x4 kernel: R13: 0000000000000001 R14: 0000000000000000 R15: 00007fff0723dba0
Dec 18 14:29:07 x4 kernel: FS: 00007f958542f880(0000) GS:ffff88021fc80000(0000) knlGS:0000000000000000
Dec 18 14:29:07 x4 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 18 14:29:07 x4 kernel: CR2: 0000000100000077 CR3: 000000021161a000 CR4: 00000000000007e0
Dec 18 14:29:07 x4 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Dec 18 14:29:07 x4 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Dec 18 14:29:07 x4 kernel: Process X (pid: 161, threadinfo ffff880211644000, task ffff880215ab85d0)
Dec 18 14:29:07 x4 kernel: Stack:
Dec 18 14:29:07 x4 kernel: ffffffff8125d9ba 0000000015c83600 ffff8801c0e29400 ffff880211645e30
Dec 18 14:29:07 x4 kernel: ffff8801c0e29448 ffff880211645dcc 0000000000000001 ffffffff81294bff
Dec 18 14:29:07 x4 kernel: ffff8801c0e29608 ffff880211645e30 ffff880216a76000 ffff880211645e30
Dec 18 14:29:07 x4 kernel: Call Trace:
Dec 18 14:29:07 x4 kernel: [<ffffffff8125d9ba>] ? ttm_bo_reserve+0x3a/0x110
Dec 18 14:29:07 x4 kernel: [<ffffffff81294bff>] ? radeon_bo_wait+0x3f/0xc0
Dec 18 14:29:07 x4 kernel: [<ffffffff812a59b7>] ? radeon_gem_busy_ioctl+0x57/0x100
Dec 18 14:29:07 x4 kernel: [<ffffffff81246b60>] ? drm_ioctl+0x420/0x4f0
Dec 18 14:29:07 x4 kernel: [<ffffffff812a5960>] ? radeon_gem_mmap_ioctl+0x20/0x20
Dec 18 14:29:07 x4 kernel: [<ffffffff810f53a4>] ? do_vfs_ioctl+0x2e4/0x4e0
Dec 18 14:29:07 x4 kernel: [<ffffffff810e55ad>] ? vfs_read+0x13d/0x160
Dec 18 14:29:07 x4 kernel: [<ffffffff810f55ec>] ? sys_ioctl+0x4c/0xa0
Dec 18 14:29:07 x4 kernel: [<ffffffff810e5851>] ? sys_read+0x51/0xa0
Dec 18 14:29:07 x4 kernel: [<ffffffff814b05d2>] ? system_call_fastpath+0x16/0x1b
Dec 18 14:29:07 x4 kernel: Code: 31 c0 5b c3 66 90 8d 8a 00 01 00 00 89 d0 f0 66 0f b1 0b 66 39 d0 75 de b8 01 00 00 00 5b c3 0f 1f 80 00 00 00 00 b8 00 01 00 00 <f0> 66 0f c1 07 0f b6 d4 38 c2 74 10 0f 1f 80 00 00 00 00 f3 90
Dec 18 14:29:07 x4 kernel: RIP [<ffffffff814afa15>] _raw_spin_lock+0x5/0x30
Dec 18 14:29:07 x4 kernel: RSP <ffff880211645d58>
Dec 18 14:29:07 x4 kernel: CR2: 0000000100000077
Dec 18 14:29:07 x4 kernel: ---[ end trace c5e6f68fefd3a70c ]---
Dec 18 14:29:28 x4 kernel: BUG: unable to handle kernel paging request at 0000000100000023
Dec 18 14:29:28 x4 kernel: IP: [<ffffffff81296448>] radeon_vm_bo_invalidate+0x18/0x30
Dec 18 14:29:28 x4 kernel: PGD 205289067 PUD 0
Dec 18 14:29:28 x4 kernel: Oops: 0002 [#2] SMP
Dec 18 14:29:28 x4 kernel: CPU 1
Dec 18 14:29:28 x4 kernel: Pid: 13, comm: kworker/1:0 Tainted: G D W 3.7.0-rc7-00520-g85b144f #168 System manufacturer System Product Name/M4A78T-E
Dec 18 14:29:28 x4 kernel: RIP: 0010:[<ffffffff81296448>] [<ffffffff81296448>] radeon_vm_bo_invalidate+0x18/0x30
Dec 18 14:29:28 x4 kernel: RSP: 0018:ffff8802168b3d78 EFLAGS: 00010207
Dec 18 14:29:28 x4 kernel: RAX: 00000000ffffffff RBX: ffff8801c0e29048 RCX: ffff8801c0e2b928
Dec 18 14:29:28 x4 kernel: RDX: 0000000000000001 RSI: ffff8801c0e291f0 RDI: 00000000ffffffff
Dec 18 14:29:28 x4 kernel: RBP: 0000000000000002 R08: 0000000000000000 R09: 0000000000000000
Dec 18 14:29:28 x4 kernel: R10: ffffea0007038a00 R11: dead000000100100 R12: ffff880216a76590
Dec 18 14:29:28 x4 kernel: R13: ffffffff818383e0 R14: 0000000000000000 R15: ffff880215c83678
Dec 18 14:29:28 x4 kernel: FS: 00007f4bb2b64740(0000) GS:ffff88021fc80000(0000) knlGS:0000000000000000
Dec 18 14:29:28 x4 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Dec 18 14:29:28 x4 kernel: CR2: 0000000100000023 CR3: 000000020698f000 CR4: 00000000000007e0
Dec 18 14:29:28 x4 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Dec 18 14:29:28 x4 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Dec 18 14:29:28 x4 kernel: Process kworker/1:0 (pid: 13, threadinfo ffff8802168b2000, task ffff88021687d730)
Dec 18 14:29:28 x4 kernel: Stack:
Dec 18 14:29:28 x4 kernel: ffffffff8125d2e9 ffff8801c0e29048 ffffffff8125e8cb ffff880216a769b8
Dec 18 14:29:28 x4 kernel: ffffffff810de82f ffff8801c0e2b848 ffff880215c83678 ffff8801c0e2b900
Dec 18 14:29:28 x4 kernel: 0000000000000001 ffff880216a76a80 ffff8801c0e29048 ffffffff8125eb7d
Dec 18 14:29:28 x4 kernel: Call Trace:
Dec 18 14:29:28 x4 kernel: [<ffffffff8125d2e9>] ? ttm_bo_cleanup_memtype_use+0x19/0x90
Dec 18 14:29:28 x4 kernel: [<ffffffff8125e8cb>] ? ttm_bo_cleanup_refs_and_unlock+0x12b/0x2c0
Dec 18 14:29:28 x4 kernel: [<ffffffff810de82f>] ? kfree+0xf/0xb0
Dec 18 14:29:28 x4 kernel: [<ffffffff8125eb7d>] ? ttm_bo_delayed_delete+0x11d/0x1a0
Dec 18 14:29:28 x4 kernel: [<ffffffff8125ec12>] ? ttm_bo_delayed_workqueue+0x12/0x30
Dec 18 14:29:28 x4 kernel: [<ffffffff8106e5f9>] ? process_one_work+0x179/0x480
Dec 18 14:29:28 x4 kernel: [<ffffffff8125ec00>] ? ttm_bo_delayed_delete+0x1a0/0x1a0
Dec 18 14:29:28 x4 kernel: [<ffffffff8106f5b1>] ? worker_thread+0x1b1/0x540
Dec 18 14:29:28 x4 kernel: [<ffffffff8106f400>] ? busy_worker_rebind_fn+0x100/0x100
Dec 18 14:29:28 x4 kernel: [<ffffffff810741cf>] ? kthread+0xaf/0xc0
Dec 18 14:29:28 x4 kernel: [<ffffffff81074120>] ? __kthread_bind+0x30/0x30
Dec 18 14:29:28 x4 kernel: [<ffffffff814b052c>] ? ret_from_fork+0x7c/0xb0
Dec 18 14:29:28 x4 kernel: [<ffffffff81074120>] ? __kthread_bind+0x30/0x30
Dec 18 14:29:28 x4 kernel: Code: 8b 44 24 04 48 83 c4 08 5b 5d 41 5c c3 66 0f 1f 44 00 00 48 8b 86 f0 01 00 00 48 81 c6 f0 01 00 00 48 39 f0 74 11 0f 1f 44 00 00 <c6> 40 24 00 48 8b 00 48 39 f0 75 f4 f3 c3 66 2e 0f 1f 84 00 00
Dec 18 14:29:28 x4 kernel: RIP [<ffffffff81296448>] radeon_vm_bo_invalidate+0x18/0x30
Dec 18 14:29:28 x4 kernel: RSP <ffff8802168b3d78>
Dec 18 14:29:28 x4 kernel: CR2: 0000000100000023
Dec 18 14:29:28 x4 kernel: ---[ end trace c5e6f68fefd3a70d ]---
Dec 18 14:29:28 x4 kernel: BUG: unable to handle kernel paging request at ffffffffffffffd8
Dec 18 14:29:28 x4 kernel: IP: [<ffffffff81074257>] kthread_data+0x7/0x10
Dec 18 14:29:28 x4 kernel: PGD 180d067 PUD 180e067 PMD 0
Dec 18 14:29:28 x4 kernel: Oops: 0000 [#3] SMP
Dec 18 14:29:28 x4 kernel: CPU 1
Dec 18 14:29:28 x4 kernel: Pid: 13, comm: kworker/1:0 Tainted: G D W 3.7.0-rc7-00520-g85b144f #168 System manufacturer System Product Name/M4A78T-E
Dec 18 14:29:28 x4 kernel: RIP: 0010:[<ffffffff81074257>] [<ffffffff81074257>] kthread_data+0x7/0x10
Dec 18 14:29:28 x4 kernel: RSP: 0018:ffff8802168b3aa0 EFLAGS: 00010002
Dec 18 14:29:28 x4 kernel: RAX: 0000000000000000 RBX: 0000000000000001 RCX: 000000015c7992d1
Dec 18 14:29:28 x4 kernel: RDX: ffffffffff8a8b63 RSI: 0000000000000001 RDI: ffff88021687d730
Dec 18 14:29:28 x4 kernel: RBP: ffff88021687d730 R08: 0000000000000000 R09: 0000000000000000
Dec 18 14:29:28 x4 kernel: R10: ffff880216887980 R11: 0000000000000000 R12: ffff88021fc912c0
Dec 18 14:29:28 x4 kernel: R13: 0000000000000001 R14: ffff88021687d720 R15: ffff88021687d730
Dec 18 14:29:28 x4 kernel: FS: 00007f4bb2b64740(0000) GS:ffff88021fc80000(0000) knlGS:0000000000000000
Dec 18 14:29:28 x4 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Dec 18 14:29:28 x4 kernel: CR2: ffffffffffffffd8 CR3: 000000020698f000 CR4: 00000000000007e0
Dec 18 14:29:28 x4 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Dec 18 14:29:28 x4 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Dec 18 14:29:28 x4 kernel: Process kworker/1:0 (pid: 13, threadinfo ffff8802168b2000, task ffff88021687d730)
Dec 18 14:29:28 x4 kernel: Stack:
Dec 18 14:29:28 x4 kernel: ffffffff8106fb98 ffff88021687d9d0 ffffffff814ae8b5 00000000000112c0
Dec 18 14:29:28 x4 kernel: ffff8802168b3fd8 00000000000112c0 ffff8802168b3fd8 0000000000000001
Dec 18 14:29:28 x4 kernel: ffff88021687d8d8 ffff88021687d720 ffff880216878000 ffff88021687d720
Dec 18 14:29:28 x4 kernel: Call Trace:
Dec 18 14:29:28 x4 kernel: [<ffffffff8106fb98>] ? wq_worker_sleeping+0x8/0xb0
Dec 18 14:29:28 x4 kernel: [<ffffffff814ae8b5>] ? __schedule+0x3a5/0x5f0
Dec 18 14:29:28 x4 kernel: [<ffffffff8105dbba>] ? do_exit+0x52a/0x830
Dec 18 14:29:28 x4 kernel: [<ffffffff8103785e>] ? oops_end+0x8e/0xd0
Dec 18 14:29:28 x4 kernel: [<ffffffff814a94c8>] ? no_context+0x251/0x25d
Dec 18 14:29:28 x4 kernel: [<ffffffff810512ce>] ? __do_page_fault+0x2ee/0x490
Dec 18 14:29:28 x4 kernel: [<ffffffff81083e18>] ? find_busiest_group+0x28/0x480
Dec 18 14:29:28 x4 kernel: [<ffffffff814b00af>] ? page_fault+0x1f/0x30
Dec 18 14:29:28 x4 kernel: [<ffffffff81296448>] ? radeon_vm_bo_invalidate+0x18/0x30
Dec 18 14:29:28 x4 kernel: [<ffffffff8125d2e9>] ? ttm_bo_cleanup_memtype_use+0x19/0x90
Dec 18 14:29:28 x4 kernel: [<ffffffff8125e8cb>] ? ttm_bo_cleanup_refs_and_unlock+0x12b/0x2c0
Dec 18 14:29:28 x4 kernel: [<ffffffff810de82f>] ? kfree+0xf/0xb0
Dec 18 14:29:28 x4 kernel: [<ffffffff8125eb7d>] ? ttm_bo_delayed_delete+0x11d/0x1a0
Dec 18 14:29:28 x4 kernel: [<ffffffff8125ec12>] ? ttm_bo_delayed_workqueue+0x12/0x30
Dec 18 14:29:28 x4 kernel: [<ffffffff8106e5f9>] ? process_one_work+0x179/0x480
Dec 18 14:29:28 x4 kernel: [<ffffffff8125ec00>] ? ttm_bo_delayed_delete+0x1a0/0x1a0
Dec 18 14:29:28 x4 kernel: [<ffffffff8106f5b1>] ? worker_thread+0x1b1/0x540
Dec 18 14:29:28 x4 kernel: [<ffffffff8106f400>] ? busy_worker_rebind_fn+0x100/0x100
Dec 18 14:29:28 x4 kernel: [<ffffffff810741cf>] ? kthread+0xaf/0xc0
Dec 18 14:29:28 x4 kernel: [<ffffffff81074120>] ? __kthread_bind+0x30/0x30
Dec 18 14:29:28 x4 kernel: [<ffffffff814b052c>] ? ret_from_fork+0x7c/0xb0
Dec 18 14:29:28 x4 kernel: [<ffffffff81074120>] ? __kthread_bind+0x30/0x30
Dec 18 14:29:28 x4 kernel: Code: 74 03 c6 03 00 65 48 8b 04 25 c0 b9 00 00 48 8b 80 48 02 00 00 5b 48 8b 40 c8 48 d1 e8 83 e0 01 c3 0f 1f 00 48 8b 87 48 02 00 00 <48> 8b 40 d8 c3 0f 1f 40 00 65 48 8b 04 25 c0 b9 00 00 48 8b b8
Dec 18 14:29:28 x4 kernel: RIP [<ffffffff81074257>] kthread_data+0x7/0x10
Dec 18 14:29:28 x4 kernel: RSP <ffff8802168b3aa0>
Dec 18 14:29:28 x4 kernel: CR2: ffffffffffffffd8
Dec 18 14:29:28 x4 kernel: ---[ end trace c5e6f68fefd3a70e ]---
Dec 18 14:29:28 x4 kernel: Fixing recursive fault but reboot is needed!
Dec 18 14:29:28 x4 kernel: SysRq : Emergency Sync
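
A side note on the three faulting addresses in the log above: each CR2
value can be reproduced from the register dump plus the instruction
marked with <..> in the corresponding "Code:" line. The sketch below
redoes that arithmetic. The register and CR2 values are copied from the
log; the instruction decodings (and therefore the displacements) are a
hand reading of the Code bytes and should be treated as assumptions, not
as objdump output. The helper names are made up for illustration.

```python
MASK64 = (1 << 64) - 1  # x86-64 effective addresses wrap at 64 bits


def effective_address(base, disp=0):
    """Return base + disp wrapped to 64 bits."""
    return (base + disp) & MASK64


# (hand-decoded faulting instruction, base register value, displacement, CR2)
# Base and CR2 come from the oops log; the decodings are assumptions.
FAULTS = [
    # Oops #1: RDI = 0000000100000077
    ("_raw_spin_lock: lock xadd %ax,(%rdi)",
     0x0000000100000077, 0x00, 0x0000000100000077),
    # Oops #2: RAX = 00000000ffffffff
    ("radeon_vm_bo_invalidate: movb $0x0,0x24(%rax)",
     0x00000000FFFFFFFF, 0x24, 0x0000000100000023),
    # Oops #3: RAX = 0000000000000000
    ("kthread_data: mov -0x28(%rax),%rax",
     0x0000000000000000, -0x28, 0xFFFFFFFFFFFFFFD8),
]


def check_faults(faults=FAULTS):
    """Return (label, matches) pairs: does base + disp equal CR2?"""
    return [(label, effective_address(base, disp) == cr2)
            for label, base, disp, cr2 in faults]


if __name__ == "__main__":
    for label, ok in check_faults():
        status = "matches CR2" if ok else "does NOT match CR2"
        print(f"{label}: {status}")
```

If that reading is right, the first two oopses dereference values that
are not kernel pointers at all (0x100000077 and 0xffffffff), which would
fit a buffer object being touched after it was freed or overwritten.
That interpretation is only a guess from the register contents.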
--
Markus