GPU lockup CP stall for more than 10000msec on latest vanilla git

Markus Trippelsdorf markus at trippelsdorf.de
Tue Dec 18 05:51:02 PST 2012


On 2012.12.18 at 14:38 +0100, Markus Trippelsdorf wrote:
> On 2012.12.18 at 12:20 +0100, Michel Dänzer wrote:
> > On Mon, 2012-12-17 at 23:55 +0100, Markus Trippelsdorf wrote: 
> > > On 2012.12.17 at 23:25 +0100, Markus Trippelsdorf wrote:
> > > > On 2012.12.17 at 17:00 -0500, Alex Deucher wrote:
> > > > > On Mon, Dec 17, 2012 at 4:48 PM, Markus Trippelsdorf
> > > > > <markus at trippelsdorf.de> wrote:
> > > > > > On 2012.12.17 at 16:32 -0500, Alex Deucher wrote:
> > > > > >> On Mon, Dec 17, 2012 at 1:27 PM, Markus Trippelsdorf
> > > > > >> <markus at trippelsdorf.de> wrote:
> > > > > >> > As soon as I open the following website:
> > > > > >> > http://www.boston.com/bigpicture/2012/12/2012_year_in_pictures_part_i.html
> > > > > >> >
> > > > > >> > my Radeon RS780 stalls (GPU lockup) leaving the machine unusable:
> > > > > >>
> > > > > >> Is this a regression?  Most likely a 3D driver bug unless you are only
> > > > > >> seeing it with specific kernels.  What browser are you using and do
> > > > > >> you have hw accelerated webgl, etc. enabled?  If so, what version of
> > > > > >> mesa are you using?
> > > > > >
> > > > > > > This is a regression, because it is caused by yesterday's merge of
> > > > > > drm-next by Linus. IOW I only see this bug when running a
> > > > > > v3.7-9432-g9360b53 kernel.
> > > > > 
> > > > > Can you bisect?  I'm guessing it may be related to the new DMA rings.  Possibly:
> > > > > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=2d6cc7296d4ee128ab0fa3b715f0afde511f49c2
> > > > 
> > > > Yes, the commit above causes the issue. 
> > > > 
> > > >  2d6cc72  GPU lockups
> > > 
> > > With 2d6cc72 reverted I get:
> > > 
> > > Dec 17 23:09:35 x4 kernel: ------------[ cut here ]------------
> > 
> > Probably a separate issue, can you bisect this one as well?
> 
> Yes. Git-bisect points to:
> 
> 85b144f860176ec18db927d6d9ecdfb24d9c6483 is the first bad commit
> commit 85b144f860176ec18db927d6d9ecdfb24d9c6483
> Author: Maarten Lankhorst <maarten.lankhorst at canonical.com>
> Date:   Thu Nov 29 11:36:54 2012 +0000
> 
>     drm/ttm: call ttm_bo_cleanup_refs with reservation and lru lock
>     held, v3
> 
> (Please note that this bug is a little bit harder to reproduce, but
> when you scroll up and down for ~10 seconds on the webpage mentioned
> above, it will trigger the oops.
> So while I'm not 100% sure that the issue is caused by exactly this
> commit, the vicinity should be right.)
> 
> Dec 18 14:29:07 x4 kernel: ------------[ cut here ]------------
> Dec 18 14:29:07 x4 kernel: WARNING: at include/linux/kref.h:42 radeon_fence_ref+0x2c/0x40()
> Dec 18 14:29:07 x4 kernel: Hardware name: System Product Name
> Dec 18 14:29:07 x4 kernel: Pid: 161, comm: X Not tainted 3.7.0-rc7-00520-g85b144f #168
> Dec 18 14:29:07 x4 kernel: Call Trace:
> Dec 18 14:29:07 x4 kernel: [<ffffffff81058c84>] ? warn_slowpath_common+0x74/0xb0
> Dec 18 14:29:07 x4 kernel: [<ffffffff812926fc>] ? radeon_fence_ref+0x2c/0x40
> Dec 18 14:29:07 x4 kernel: [<ffffffff8125e91c>] ? ttm_bo_cleanup_refs_and_unlock+0x17c/0x2c0
> Dec 18 14:29:07 x4 kernel: [<ffffffff8125f13c>] ? ttm_mem_evict_first+0x1dc/0x2a0
> Dec 18 14:29:07 x4 kernel: [<ffffffff81264412>] ? ttm_bo_man_get_node+0x62/0xb0
> Dec 18 14:29:07 x4 kernel: [<ffffffff8125f48e>] ? ttm_bo_mem_space+0x28e/0x340
> Dec 18 14:29:07 x4 kernel: [<ffffffff8125facc>] ? ttm_bo_move_buffer+0xfc/0x170
> Dec 18 14:29:07 x4 kernel: [<ffffffff810de172>] ? kmem_cache_alloc+0xb2/0xc0
> Dec 18 14:29:07 x4 kernel: [<ffffffff8125fbd5>] ? ttm_bo_validate+0x95/0x110
> Dec 18 14:29:07 x4 kernel: [<ffffffff8125ff3c>] ? ttm_bo_init+0x2ec/0x3b0
> Dec 18 14:29:07 x4 kernel: [<ffffffff8129415a>] ? radeon_bo_create+0x18a/0x200
> Dec 18 14:29:07 x4 kernel: [<ffffffff81293e40>] ? radeon_bo_clear_va+0x40/0x40
> Dec 18 14:29:07 x4 kernel: [<ffffffff812a5302>] ? radeon_gem_object_create+0x92/0x160
> Dec 18 14:29:07 x4 kernel: [<ffffffff812a571c>] ? radeon_gem_create_ioctl+0x6c/0x150
> Dec 18 14:29:07 x4 kernel: [<ffffffff81246b60>] ? drm_ioctl+0x420/0x4f0
> Dec 18 14:29:07 x4 kernel: [<ffffffff812a56b0>] ? radeon_gem_pwrite_ioctl+0x20/0x20
> Dec 18 14:29:07 x4 kernel: [<ffffffff810f53a4>] ? do_vfs_ioctl+0x2e4/0x4e0
> Dec 18 14:29:07 x4 kernel: [<ffffffff810e5588>] ? vfs_read+0x118/0x160
> Dec 18 14:29:07 x4 kernel: [<ffffffff810f55ec>] ? sys_ioctl+0x4c/0xa0
> Dec 18 14:29:07 x4 kernel: [<ffffffff810e5851>] ? sys_read+0x51/0xa0
> Dec 18 14:29:07 x4 kernel: [<ffffffff814b05d2>] ? system_call_fastpath+0x16/0x1b
> Dec 18 14:29:07 x4 kernel: ---[ end trace c5e6f68fefd3a70b ]---
> Dec 18 14:29:07 x4 kernel: BUG: unable to handle kernel paging request at 0000000100000077
> Dec 18 14:29:07 x4 kernel: IP: [<ffffffff814afa15>] _raw_spin_lock+0x5/0x30
> Dec 18 14:29:07 x4 kernel: PGD 2156c4067 PUD 0
> Dec 18 14:29:07 x4 kernel: Oops: 0002 [#1] SMP
> Dec 18 14:29:07 x4 kernel: CPU 1
> Dec 18 14:29:07 x4 kernel: Pid: 161, comm: X Tainted: G        W    3.7.0-rc7-00520-g85b144f #168 System manufacturer System Product Name/M4A78T-E
> Dec 18 14:29:07 x4 kernel: RIP: 0010:[<ffffffff814afa15>]  [<ffffffff814afa15>] _raw_spin_lock+0x5/0x30
> Dec 18 14:29:07 x4 kernel: RSP: 0018:ffff880211645d58  EFLAGS: 00010286
> Dec 18 14:29:07 x4 kernel: RAX: 0000000000000100 RBX: ffff8801c0e29448 RCX: 0000000000000000
> Dec 18 14:29:07 x4 kernel: RDX: 0000000000000001 RSI: 0000000000000001 RDI: 0000000100000077
> Dec 18 14:29:07 x4 kernel: RBP: 00000000ffffffff R08: 0000000000000000 R09: ffffffff81838370
> Dec 18 14:29:07 x4 kernel: R10: ffffffff812a5960 R11: 0000000000000246 R12: 0000000000000001
> Dec 18 14:29:07 x4 kernel: R13: 0000000000000001 R14: 0000000000000000 R15: 00007fff0723dba0
> Dec 18 14:29:07 x4 kernel: FS:  00007f958542f880(0000) GS:ffff88021fc80000(0000) knlGS:0000000000000000
> Dec 18 14:29:07 x4 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Dec 18 14:29:07 x4 kernel: CR2: 0000000100000077 CR3: 000000021161a000 CR4: 00000000000007e0
> Dec 18 14:29:07 x4 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Dec 18 14:29:07 x4 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Dec 18 14:29:07 x4 kernel: Process X (pid: 161, threadinfo ffff880211644000, task ffff880215ab85d0)
> Dec 18 14:29:07 x4 kernel: Stack:
> Dec 18 14:29:07 x4 kernel: ffffffff8125d9ba 0000000015c83600 ffff8801c0e29400 ffff880211645e30
> Dec 18 14:29:07 x4 kernel: ffff8801c0e29448 ffff880211645dcc 0000000000000001 ffffffff81294bff
> Dec 18 14:29:07 x4 kernel: ffff8801c0e29608 ffff880211645e30 ffff880216a76000 ffff880211645e30
> Dec 18 14:29:07 x4 kernel: Call Trace:
> Dec 18 14:29:07 x4 kernel: [<ffffffff8125d9ba>] ? ttm_bo_reserve+0x3a/0x110
> Dec 18 14:29:07 x4 kernel: [<ffffffff81294bff>] ? radeon_bo_wait+0x3f/0xc0
> Dec 18 14:29:07 x4 kernel: [<ffffffff812a59b7>] ? radeon_gem_busy_ioctl+0x57/0x100
> Dec 18 14:29:07 x4 kernel: [<ffffffff81246b60>] ? drm_ioctl+0x420/0x4f0
> Dec 18 14:29:07 x4 kernel: [<ffffffff812a5960>] ? radeon_gem_mmap_ioctl+0x20/0x20
> Dec 18 14:29:07 x4 kernel: [<ffffffff810f53a4>] ? do_vfs_ioctl+0x2e4/0x4e0
> Dec 18 14:29:07 x4 kernel: [<ffffffff810e55ad>] ? vfs_read+0x13d/0x160
> Dec 18 14:29:07 x4 kernel: [<ffffffff810f55ec>] ? sys_ioctl+0x4c/0xa0
> Dec 18 14:29:07 x4 kernel: [<ffffffff810e5851>] ? sys_read+0x51/0xa0
> Dec 18 14:29:07 x4 kernel: [<ffffffff814b05d2>] ? system_call_fastpath+0x16/0x1b
> Dec 18 14:29:07 x4 kernel: Code: 31 c0 5b c3 66 90 8d 8a 00 01 00 00 89 d0 f0 66 0f b1 0b 66 39 d0 75 de b8 01 00 00 00 5b c3 0f 1f 80 00 00 00 00 b8 00 01 00 00 <f0> 66 0f c1 07 0f b6 d4 38 c2 74 10 0f 1f 80 00 00 00 00 f3 90
> Dec 18 14:29:07 x4 kernel: RIP  [<ffffffff814afa15>] _raw_spin_lock+0x5/0x30
> Dec 18 14:29:07 x4 kernel: RSP <ffff880211645d58>
> Dec 18 14:29:07 x4 kernel: CR2: 0000000100000077
> Dec 18 14:29:07 x4 kernel: ---[ end trace c5e6f68fefd3a70c ]---
> Dec 18 14:29:28 x4 kernel: BUG: unable to handle kernel paging request at 0000000100000023
> Dec 18 14:29:28 x4 kernel: IP: [<ffffffff81296448>] radeon_vm_bo_invalidate+0x18/0x30
> Dec 18 14:29:28 x4 kernel: PGD 205289067 PUD 0
> Dec 18 14:29:28 x4 kernel: Oops: 0002 [#2] SMP
> Dec 18 14:29:28 x4 kernel: CPU 1
> Dec 18 14:29:28 x4 kernel: Pid: 13, comm: kworker/1:0 Tainted: G      D W    3.7.0-rc7-00520-g85b144f #168 System manufacturer System Product Name/M4A78T-E
> Dec 18 14:29:28 x4 kernel: RIP: 0010:[<ffffffff81296448>]  [<ffffffff81296448>] radeon_vm_bo_invalidate+0x18/0x30
> Dec 18 14:29:28 x4 kernel: RSP: 0018:ffff8802168b3d78  EFLAGS: 00010207
> Dec 18 14:29:28 x4 kernel: RAX: 00000000ffffffff RBX: ffff8801c0e29048 RCX: ffff8801c0e2b928
> Dec 18 14:29:28 x4 kernel: RDX: 0000000000000001 RSI: ffff8801c0e291f0 RDI: 00000000ffffffff
> Dec 18 14:29:28 x4 kernel: RBP: 0000000000000002 R08: 0000000000000000 R09: 0000000000000000
> Dec 18 14:29:28 x4 kernel: R10: ffffea0007038a00 R11: dead000000100100 R12: ffff880216a76590
> Dec 18 14:29:28 x4 kernel: R13: ffffffff818383e0 R14: 0000000000000000 R15: ffff880215c83678
> Dec 18 14:29:28 x4 kernel: FS:  00007f4bb2b64740(0000) GS:ffff88021fc80000(0000) knlGS:0000000000000000
> Dec 18 14:29:28 x4 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> Dec 18 14:29:28 x4 kernel: CR2: 0000000100000023 CR3: 000000020698f000 CR4: 00000000000007e0
> Dec 18 14:29:28 x4 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Dec 18 14:29:28 x4 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Dec 18 14:29:28 x4 kernel: Process kworker/1:0 (pid: 13, threadinfo ffff8802168b2000, task ffff88021687d730)
> Dec 18 14:29:28 x4 kernel: Stack:
> Dec 18 14:29:28 x4 kernel: ffffffff8125d2e9 ffff8801c0e29048 ffffffff8125e8cb ffff880216a769b8
> Dec 18 14:29:28 x4 kernel: ffffffff810de82f ffff8801c0e2b848 ffff880215c83678 ffff8801c0e2b900
> Dec 18 14:29:28 x4 kernel: 0000000000000001 ffff880216a76a80 ffff8801c0e29048 ffffffff8125eb7d
> Dec 18 14:29:28 x4 kernel: Call Trace:
> Dec 18 14:29:28 x4 kernel: [<ffffffff8125d2e9>] ? ttm_bo_cleanup_memtype_use+0x19/0x90
> Dec 18 14:29:28 x4 kernel: [<ffffffff8125e8cb>] ? ttm_bo_cleanup_refs_and_unlock+0x12b/0x2c0
> Dec 18 14:29:28 x4 kernel: [<ffffffff810de82f>] ? kfree+0xf/0xb0
> Dec 18 14:29:28 x4 kernel: [<ffffffff8125eb7d>] ? ttm_bo_delayed_delete+0x11d/0x1a0
> Dec 18 14:29:28 x4 kernel: [<ffffffff8125ec12>] ? ttm_bo_delayed_workqueue+0x12/0x30
> Dec 18 14:29:28 x4 kernel: [<ffffffff8106e5f9>] ? process_one_work+0x179/0x480
> Dec 18 14:29:28 x4 kernel: [<ffffffff8125ec00>] ? ttm_bo_delayed_delete+0x1a0/0x1a0
> Dec 18 14:29:28 x4 kernel: [<ffffffff8106f5b1>] ? worker_thread+0x1b1/0x540
> Dec 18 14:29:28 x4 kernel: [<ffffffff8106f400>] ? busy_worker_rebind_fn+0x100/0x100
> Dec 18 14:29:28 x4 kernel: [<ffffffff810741cf>] ? kthread+0xaf/0xc0
> Dec 18 14:29:28 x4 kernel: [<ffffffff81074120>] ? __kthread_bind+0x30/0x30
> Dec 18 14:29:28 x4 kernel: [<ffffffff814b052c>] ? ret_from_fork+0x7c/0xb0
> Dec 18 14:29:28 x4 kernel: [<ffffffff81074120>] ? __kthread_bind+0x30/0x30
> Dec 18 14:29:28 x4 kernel: Code: 8b 44 24 04 48 83 c4 08 5b 5d 41 5c c3 66 0f 1f 44 00 00 48 8b 86 f0 01 00 00 48 81 c6 f0 01 00 00 48 39 f0 74 11 0f 1f 44 00 00 <c6> 40 24 00 48 8b 00 48 39 f0 75 f4 f3 c3 66 2e 0f 1f 84 00 00
> Dec 18 14:29:28 x4 kernel: RIP  [<ffffffff81296448>] radeon_vm_bo_invalidate+0x18/0x30
> Dec 18 14:29:28 x4 kernel: RSP <ffff8802168b3d78>
> Dec 18 14:29:28 x4 kernel: CR2: 0000000100000023
> Dec 18 14:29:28 x4 kernel: ---[ end trace c5e6f68fefd3a70d ]---
> Dec 18 14:29:28 x4 kernel: BUG: unable to handle kernel paging request at ffffffffffffffd8
> Dec 18 14:29:28 x4 kernel: IP: [<ffffffff81074257>] kthread_data+0x7/0x10
> Dec 18 14:29:28 x4 kernel: PGD 180d067 PUD 180e067 PMD 0
> Dec 18 14:29:28 x4 kernel: Oops: 0000 [#3] SMP
> Dec 18 14:29:28 x4 kernel: CPU 1
> Dec 18 14:29:28 x4 kernel: Pid: 13, comm: kworker/1:0 Tainted: G      D W    3.7.0-rc7-00520-g85b144f #168 System manufacturer System Product Name/M4A78T-E
> Dec 18 14:29:28 x4 kernel: RIP: 0010:[<ffffffff81074257>]  [<ffffffff81074257>] kthread_data+0x7/0x10
> Dec 18 14:29:28 x4 kernel: RSP: 0018:ffff8802168b3aa0  EFLAGS: 00010002
> Dec 18 14:29:28 x4 kernel: RAX: 0000000000000000 RBX: 0000000000000001 RCX: 000000015c7992d1
> Dec 18 14:29:28 x4 kernel: RDX: ffffffffff8a8b63 RSI: 0000000000000001 RDI: ffff88021687d730
> Dec 18 14:29:28 x4 kernel: RBP: ffff88021687d730 R08: 0000000000000000 R09: 0000000000000000
> Dec 18 14:29:28 x4 kernel: R10: ffff880216887980 R11: 0000000000000000 R12: ffff88021fc912c0
> Dec 18 14:29:28 x4 kernel: R13: 0000000000000001 R14: ffff88021687d720 R15: ffff88021687d730
> Dec 18 14:29:28 x4 kernel: FS:  00007f4bb2b64740(0000) GS:ffff88021fc80000(0000) knlGS:0000000000000000
> Dec 18 14:29:28 x4 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> Dec 18 14:29:28 x4 kernel: CR2: ffffffffffffffd8 CR3: 000000020698f000 CR4: 00000000000007e0
> Dec 18 14:29:28 x4 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Dec 18 14:29:28 x4 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Dec 18 14:29:28 x4 kernel: Process kworker/1:0 (pid: 13, threadinfo ffff8802168b2000, task ffff88021687d730)
> Dec 18 14:29:28 x4 kernel: Stack:
> Dec 18 14:29:28 x4 kernel: ffffffff8106fb98 ffff88021687d9d0 ffffffff814ae8b5 00000000000112c0
> Dec 18 14:29:28 x4 kernel: ffff8802168b3fd8 00000000000112c0 ffff8802168b3fd8 0000000000000001
> Dec 18 14:29:28 x4 kernel: ffff88021687d8d8 ffff88021687d720 ffff880216878000 ffff88021687d720
> Dec 18 14:29:28 x4 kernel: Call Trace:
> Dec 18 14:29:28 x4 kernel: [<ffffffff8106fb98>] ? wq_worker_sleeping+0x8/0xb0
> Dec 18 14:29:28 x4 kernel: [<ffffffff814ae8b5>] ? __schedule+0x3a5/0x5f0
> Dec 18 14:29:28 x4 kernel: [<ffffffff8105dbba>] ? do_exit+0x52a/0x830
> Dec 18 14:29:28 x4 kernel: [<ffffffff8103785e>] ? oops_end+0x8e/0xd0
> Dec 18 14:29:28 x4 kernel: [<ffffffff814a94c8>] ? no_context+0x251/0x25d
> Dec 18 14:29:28 x4 kernel: [<ffffffff810512ce>] ? __do_page_fault+0x2ee/0x490
> Dec 18 14:29:28 x4 kernel: [<ffffffff81083e18>] ? find_busiest_group+0x28/0x480
> Dec 18 14:29:28 x4 kernel: [<ffffffff814b00af>] ? page_fault+0x1f/0x30
> Dec 18 14:29:28 x4 kernel: [<ffffffff81296448>] ? radeon_vm_bo_invalidate+0x18/0x30
> Dec 18 14:29:28 x4 kernel: [<ffffffff8125d2e9>] ? ttm_bo_cleanup_memtype_use+0x19/0x90
> Dec 18 14:29:28 x4 kernel: [<ffffffff8125e8cb>] ? ttm_bo_cleanup_refs_and_unlock+0x12b/0x2c0
> Dec 18 14:29:28 x4 kernel: [<ffffffff810de82f>] ? kfree+0xf/0xb0
> Dec 18 14:29:28 x4 kernel: [<ffffffff8125eb7d>] ? ttm_bo_delayed_delete+0x11d/0x1a0
> Dec 18 14:29:28 x4 kernel: [<ffffffff8125ec12>] ? ttm_bo_delayed_workqueue+0x12/0x30
> Dec 18 14:29:28 x4 kernel: [<ffffffff8106e5f9>] ? process_one_work+0x179/0x480
> Dec 18 14:29:28 x4 kernel: [<ffffffff8125ec00>] ? ttm_bo_delayed_delete+0x1a0/0x1a0
> Dec 18 14:29:28 x4 kernel: [<ffffffff8106f5b1>] ? worker_thread+0x1b1/0x540
> Dec 18 14:29:28 x4 kernel: [<ffffffff8106f400>] ? busy_worker_rebind_fn+0x100/0x100
> Dec 18 14:29:28 x4 kernel: [<ffffffff810741cf>] ? kthread+0xaf/0xc0
> Dec 18 14:29:28 x4 kernel: [<ffffffff81074120>] ? __kthread_bind+0x30/0x30
> Dec 18 14:29:28 x4 kernel: [<ffffffff814b052c>] ? ret_from_fork+0x7c/0xb0
> Dec 18 14:29:28 x4 kernel: [<ffffffff81074120>] ? __kthread_bind+0x30/0x30
> Dec 18 14:29:28 x4 kernel: Code: 74 03 c6 03 00 65 48 8b 04 25 c0 b9 00 00 48 8b 80 48 02 00 00 5b 48 8b 40 c8 48 d1 e8 83 e0 01 c3 0f 1f 00 48 8b 87 48 02 00 00 <48> 8b 40 d8 c3 0f 1f 40 00 65 48 8b 04 25 c0 b9 00 00 48 8b b8
> Dec 18 14:29:28 x4 kernel: RIP  [<ffffffff81074257>] kthread_data+0x7/0x10
> Dec 18 14:29:28 x4 kernel: RSP <ffff8802168b3aa0>
> Dec 18 14:29:28 x4 kernel: CR2: ffffffffffffffd8
> Dec 18 14:29:28 x4 kernel: ---[ end trace c5e6f68fefd3a70e ]---
> Dec 18 14:29:28 x4 kernel: Fixing recursive fault but reboot is needed!
> Dec 18 14:29:28 x4 kernel: SysRq : Emergency Sync

CCing Maarten

-- 
Markus

