[Bug 42678] [3.3-rc1] radeon stuck in kernel after lockup

bugzilla-daemon at bugzilla.kernel.org
Sat Feb 4 00:39:45 PST 2012


https://bugzilla.kernel.org/show_bug.cgi?id=42678





--- Comment #3 from Torsten Kaiser <just.for.lkml at googlemail.com>  2012-02-04 08:39:42 ---
The fix for the lockup itself is now in mainline and should be released in
3.3-rc3.

But I can confirm that the regression (that X is no longer recovering from the
GPU lockup / GPU reset) is still there in 3.3-rc2.

From my log, first the lockup:
Feb  4 08:55:25 thoregon kernel: [15457.570126] radeon 0000:07:00.0: GPU lockup CP stall for more than 10000msec
Feb  4 08:55:25 thoregon kernel: [15457.570134] GPU lockup (waiting for 0x00070CAA last fence id 0x00070CA9)
Feb  4 08:55:25 thoregon kernel: [15457.586330] radeon 0000:07:00.0: GPU softreset
Feb  4 08:55:25 thoregon kernel: [15457.586337] radeon 0000:07:00.0:   R_008010_GRBM_STATUS=0xA0003028
Feb  4 08:55:25 thoregon kernel: [15457.586343] radeon 0000:07:00.0:   R_008014_GRBM_STATUS2=0x00000002
Feb  4 08:55:25 thoregon kernel: [15457.586349] radeon 0000:07:00.0:   R_000E50_SRBM_STATUS=0x200000C0
Feb  4 08:55:25 thoregon kernel: [15457.586362] radeon 0000:07:00.0:   R_008020_GRBM_SOFT_RESET=0x00007FEE
Feb  4 08:55:25 thoregon kernel: [15457.601387] radeon 0000:07:00.0: R_008020_GRBM_SOFT_RESET=0x00000001
Feb  4 08:55:25 thoregon kernel: [15457.617378] radeon 0000:07:00.0:   R_008010_GRBM_STATUS=0x00003028
Feb  4 08:55:25 thoregon kernel: [15457.617384] radeon 0000:07:00.0:   R_008014_GRBM_STATUS2=0x00000002
Feb  4 08:55:25 thoregon kernel: [15457.617390] radeon 0000:07:00.0:   R_000E50_SRBM_STATUS=0x200000C0
Feb  4 08:55:25 thoregon kernel: [15457.618393] radeon 0000:07:00.0: GPU reset succeed
Feb  4 08:55:25 thoregon kernel: [15457.623326] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
Feb  4 08:55:25 thoregon kernel: [15457.623361] radeon 0000:07:00.0: WB enabled
Feb  4 08:55:25 thoregon kernel: [15457.623367] [drm] fence driver on ring 0 use gpu addr 0x20000c00 and cpu addr 0xffff880328696c00
Feb  4 08:55:25 thoregon kernel: [15457.669623] [drm] ring test on 0 succeeded in 1 usecs
Feb  4 08:55:25 thoregon kernel: [15457.669648] [drm] ib test on ring 0 succeeded in 1 usecs
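
For anyone comparing the register dump before and after the soft reset, the values in the log above can be diffed bit by bit. This is only a rough sketch: the register values are copied from the log, while the helper names (changed_bits, summarize) are made up for illustration and the script makes no attempt to name what the individual status bits mean.

```python
# Register values copied from the log above: (before soft reset, after soft reset).
registers = {
    "R_008010_GRBM_STATUS": (0xA0003028, 0x00003028),
    "R_008014_GRBM_STATUS2": (0x00000002, 0x00000002),
    "R_000E50_SRBM_STATUS": (0x200000C0, 0x200000C0),
}


def changed_bits(before, after):
    """Return the positions (0 = least significant) of bits that differ."""
    diff = before ^ after
    return [bit for bit in range(32) if (diff >> bit) & 1]


def summarize(regs):
    """Map each register name to the list of bit positions that changed."""
    return {name: changed_bits(before, after)
            for name, (before, after) in regs.items()}


if __name__ == "__main__":
    for name, bits in summarize(registers).items():
        status = "changed bits %s" % bits if bits else "unchanged"
        print("%-24s %s" % (name, status))
```

With the values from this log, only GRBM_STATUS differs across the reset (bits 29 and 31 are cleared); GRBM_STATUS2 and SRBM_STATUS read back the same.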

Then, when the X server tries to unblank the screens, it gets stuck. There is
no longer a mutex deadlock for the hung task detector to log, but SysRq+W shows
X in D state:
Feb  4 09:28:30 thoregon kernel: [17441.917129] SysRq : Changing Loglevel
Feb  4 09:28:30 thoregon kernel: [17441.917140] Loglevel set to 6
Feb  4 09:28:31 thoregon kernel: [17443.659030] SysRq : Show Blocked State
Feb  4 09:28:31 thoregon kernel: [17443.659040]   task                        PC stack   pid father
Feb  4 09:28:31 thoregon kernel: [17443.659122] X               D ffff880337d50a00     0  3048   3027 0x00400004
Feb  4 09:28:31 thoregon kernel: [17443.659133]  ffff880328709700 0000000000000082 ffff8802f2dc5c00 0000000000010a00
Feb  4 09:28:31 thoregon kernel: [17443.659143]  ffff88031bf2bfd8 0000000000010a00 ffff88031bf2a000 ffff88031bf2bfd8
Feb  4 09:28:31 thoregon kernel: [17443.659152]  0000000000010a00 ffff880328709700 0000000000010a00 0000000000010a00
Feb  4 09:28:31 thoregon kernel: [17443.659161] Call Trace:
Feb  4 09:28:31 thoregon kernel: [17443.659177]  [<ffffffff815ee9d7>] ? schedule_timeout+0x157/0x220
Feb  4 09:28:31 thoregon kernel: [17443.659188]  [<ffffffff8103fcb0>] ? run_timer_softirq+0x240/0x240
Feb  4 09:28:31 thoregon kernel: [17443.659197]  [<ffffffff8133ee39>] ? radeon_fence_wait+0x239/0x3b0
Feb  4 09:28:31 thoregon kernel: [17443.659207]  [<ffffffff8104f420>] ? wake_up_bit+0x40/0x40
Feb  4 09:28:31 thoregon kernel: [17443.659215]  [<ffffffff81352f77>] ? radeon_ib_get+0x257/0x2e0
Feb  4 09:28:31 thoregon kernel: [17443.659224]  [<ffffffff81354f4a>] ? radeon_cs_ioctl+0x27a/0x4d0
Feb  4 09:28:31 thoregon kernel: [17443.659232]  [<ffffffff812f4184>] ? drm_ioctl+0x3e4/0x490
Feb  4 09:28:31 thoregon kernel: [17443.659240]  [<ffffffff81354cd0>] ? radeon_cs_finish_pages+0xa0/0xa0
Feb  4 09:28:31 thoregon kernel: [17443.659249]  [<ffffffff810247e9>] ? do_page_fault+0x199/0x420
Feb  4 09:28:31 thoregon kernel: [17443.659257]  [<ffffffff810af4dc>] ? mmap_region+0x1dc/0x570
Feb  4 09:28:31 thoregon kernel: [17443.659265]  [<ffffffff810de636>] ? do_vfs_ioctl+0x96/0x4e0
Feb  4 09:28:31 thoregon kernel: [17443.659273]  [<ffffffff810deac9>] ? sys_ioctl+0x49/0x90
Feb  4 09:28:31 thoregon kernel: [17443.659281]  [<ffffffff815f18e2>] ? system_call_fastpath+0x16/0x1b
Feb  4 09:28:41 thoregon kernel: [17453.327296] SysRq : Emergency Sync
Feb  4 09:28:41 thoregon kernel: [17453.327912] Emergency Sync complete
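
To pull the symbol names out of a trace like the one above (for example, to see which radeon functions X is blocked in), something along these lines may help. It is only a sketch: the three sample entries are copied from the trace with the mail's line wrapping, the regex is written to tolerate a line break between the "?" and the symbol, and the names FRAME and parse_frames are invented for this example. Note that every entry here carries the "?" marker, so the chain should be read as a hint and not as a verified call path.

```python
import re

# Three entries copied from the call trace above, wrapped as in the mail.
trace = """\
Feb  4 09:28:31 thoregon kernel: [17443.659177]  [<ffffffff815ee9d7>] ?
schedule_timeout+0x157/0x220
Feb  4 09:28:31 thoregon kernel: [17443.659188]  [<ffffffff8103fcb0>] ?
run_timer_softirq+0x240/0x240
Feb  4 09:28:31 thoregon kernel: [17443.659197]  [<ffffffff8133ee39>] ?
radeon_fence_wait+0x239/0x3b0
"""

# [<address>] optional "?" symbol+0xoffset/0xsize; \s+ also matches newlines.
FRAME = re.compile(
    r"\[<([0-9a-f]{16})>\]\s+(\?\s+)?([A-Za-z_][\w.]*)"
    r"\+0x([0-9a-f]+)/0x([0-9a-f]+)"
)


def parse_frames(text):
    """Return one dict per stack entry found in the given log text."""
    frames = []
    for addr, marker, symbol, offset, size in FRAME.findall(text):
        frames.append({
            "address": int(addr, 16),
            "unreliable": bool(marker),  # True when the entry has a "?"
            "symbol": symbol,
            "offset": int(offset, 16),
            "size": int(size, 16),
        })
    return frames


if __name__ == "__main__":
    for frame in parse_frames(trace):
        flag = "?" if frame["unreliable"] else " "
        print("%s %s+0x%x" % (flag, frame["symbol"], frame["offset"]))
```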

Apart from the X server, the system was still working. I was able to ssh into it
and do a normal shutdown.
