[Bug 78221] 3.16 RC1: AMD R9 270 GPU locks up on some heavy 2D activity - GPU VM fault occurs. (possibly DMA copying issue strikes back?)

bugzilla-daemon at bugzilla.kernel.org
Mon Sep 8 20:09:08 PDT 2014


https://bugzilla.kernel.org/show_bug.cgi?id=78221

--- Comment #22 from t3st3r at mail.ru ---
Tested on 3.17-rc4. Result: it crashed after about 3 minutes of running (see
below).

Are some stability fixes missing from 3.17-rc4 mainline? At first glance I do
not see any radeon-related commits in drm-fixes that haven't made it into -rc4.
Am I missing something?

===cut===
 kernel: [  599.949295] radeon 0000:01:00.0: ring 3 stalled for more than 10167msec
 kernel: [  599.949305] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000001eb0 last fence id 0x0000000000001eaf on ring 3)
 kernel: [  599.949312] radeon 0000:01:00.0: scheduling IB failed (-35).
 kernel: [  600.507409] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x0018 address=0x000000008040a840 flags=0x0010]
 kernel: [  600.507420] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x0018 address=0x000000008040a870 flags=0x0030]
 kernel: [  600.507426] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x0018 address=0x0000000080000100 flags=0x0030]
 kernel: [  600.507431] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x0018 address=0x000000008040a700 flags=0x0010]
 kernel: [  600.507460] radeon 0000:01:00.0: Saved 19308 dwords of commands on ring 0.
 kernel: [  600.507590] radeon 0000:01:00.0: GPU softreset: 0x0000006C
 kernel: [  600.507593] radeon 0000:01:00.0:   GRBM_STATUS               = 0xA0003028
 kernel: [  600.507596] radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x00000006
 kernel: [  600.507598] radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000006
 kernel: [  600.507600] radeon 0000:01:00.0:   SRBM_STATUS               = 0x200000C0
 kernel: [  600.507711] radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
 kernel: [  600.507714] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
 kernel: [  600.507716] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00010000
 kernel: [  600.507718] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000002
 kernel: [  600.507720] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x80010243
 kernel: [  600.507723] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44483106
 kernel: [  600.507725] radeon 0000:01:00.0:   R_00D834_DMA_STATUS_REG   = 0x44E84266
 kernel: [  600.507728] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
 kernel: [  600.507730] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
 kernel: [  601.054357] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x0000DDFF
 kernel: [  601.054411] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00100140
 kernel: [  601.055568] radeon 0000:01:00.0:   GRBM_STATUS               = 0x00003028
 kernel: [  601.055571] radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x00000006
 kernel: [  601.055573] radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000006
 kernel: [  601.055575] radeon 0000:01:00.0:   SRBM_STATUS               = 0x20000AC0
 kernel: [  601.055686] radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
 kernel: [  601.055689] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
 kernel: [  601.055691] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
 kernel: [  601.055693] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
 kernel: [  601.055695] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x00000000
 kernel: [  601.055698] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
 kernel: [  601.055700] radeon 0000:01:00.0:   R_00D834_DMA_STATUS_REG   = 0x44C83D57
 kernel: [  601.055951] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
 kernel: [  601.083744] [drm] probing gen 2 caps for device 1002:5a16 = 31cd02/0
 kernel: [  601.083747] [drm] PCIE gen 2 link speeds already enabled
 kernel: [  601.084938] [drm] PCIE GART of 1024M enabled (table at 0x0000000000276000).
 kernel: [  601.085046] radeon 0000:01:00.0: WB enabled
 kernel: [  601.085049] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000080000c00 and cpu addr 0xffff880413fbec00
 kernel: [  601.085052] radeon 0000:01:00.0: fence driver on ring 1 use gpu addr 0x0000000080000c04 and cpu addr 0xffff880413fbec04
 kernel: [  601.085054] radeon 0000:01:00.0: fence driver on ring 2 use gpu addr 0x0000000080000c08 and cpu addr 0xffff880413fbec08
 kernel: [  601.085056] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000080000c0c and cpu addr 0xffff880413fbec0c
 kernel: [  601.085057] radeon 0000:01:00.0: fence driver on ring 4 use gpu addr 0x0000000080000c10 and cpu addr 0xffff880413fbec10
 kernel: [  601.086030] radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x0000000000075a18 and cpu addr 0xffffc90011db5a18
 kernel: [  601.271000] [drm] ring test on 0 succeeded in 3 usecs
 kernel: [  601.271006] [drm] ring test on 1 succeeded in 1 usecs
 kernel: [  601.271011] [drm] ring test on 2 succeeded in 1 usecs
 kernel: [  601.271075] [drm] ring test on 3 succeeded in 2 usecs
 kernel: [  601.271084] [drm] ring test on 4 succeeded in 1 usecs
 kernel: [  601.448164] [drm] ring test on 5 succeeded in 2 usecs
 kernel: [  601.448172] [drm] UVD initialized successfully.
 kernel: [  611.444226] radeon 0000:01:00.0: ring 0 stalled for more than 10000msec
 kernel: [  611.444237] radeon 0000:01:00.0: GPU lockup (waiting for 0x000000000001a60a last fence id 0x000000000001a4dd on ring 0)
 kernel: [  611.444244] [drm:r600_ib_test] *ERROR* radeon: fence wait failed (-35).
 kernel: [  611.444252] [drm:radeon_ib_ring_tests] *ERROR* radeon: failed testing IB on GFX ring (-35).
 kernel: [  611.444257] radeon 0000:01:00.0: ib ring test failed (-35).
 kernel: [  611.997330] radeon 0000:01:00.0: GPU softreset: 0x00000048
 kernel: [  611.997333] radeon 0000:01:00.0:   GRBM_STATUS               = 0xA0003028
 kernel: [  611.997336] radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x00000006
 kernel: [  611.997338] radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000006
 kernel: [  611.997341] radeon 0000:01:00.0:   SRBM_STATUS               = 0x200000C0
 kernel: [  611.997452] radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
 kernel: [  611.997454] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
 kernel: [  611.997456] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00010000
 kernel: [  611.997458] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00400002
 kernel: [  611.997461] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x84010243
 kernel: [  611.997463] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
 kernel: [  611.997465] radeon 0000:01:00.0:   R_00D834_DMA_STATUS_REG   = 0x44C83D57
 kernel: [  611.997468] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
 kernel: [  611.997470] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
 kernel: [  612.542126] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x0000DDFF
 kernel: [  612.542180] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
 kernel: [  612.543338] radeon 0000:01:00.0:   GRBM_STATUS               = 0x00003028
 kernel: [  612.543340] radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x00000006
 kernel: [  612.543343] radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000006
 kernel: [  612.543345] radeon 0000:01:00.0:   SRBM_STATUS               = 0x200000C0
 kernel: [  612.543456] radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
 kernel: [  612.543458] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
 kernel: [  612.543460] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
 kernel: [  612.543462] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
 kernel: [  612.543465] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x00000000
 kernel: [  612.543467] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
 kernel: [  612.543469] radeon 0000:01:00.0:   R_00D834_DMA_STATUS_REG   = 0x44C83D57
 kernel: [  612.543724] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
 kernel: [  612.556911] [drm] probing gen 2 caps for device 1002:5a16 = 31cd02/0
 kernel: [  612.556915] [drm] PCIE gen 2 link speeds already enabled
 kernel: [  612.558107] [drm] PCIE GART of 1024M enabled (table at 0x0000000000276000).
 kernel: [  612.558216] radeon 0000:01:00.0: WB enabled
 kernel: [  612.558219] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000080000c00 and cpu addr 0xffff880413fbec00
 kernel: [  612.558222] radeon 0000:01:00.0: fence driver on ring 1 use gpu addr 0x0000000080000c04 and cpu addr 0xffff880413fbec04
 kernel: [  612.558224] radeon 0000:01:00.0: fence driver on ring 2 use gpu addr 0x0000000080000c08 and cpu addr 0xffff880413fbec08
 kernel: [  612.558226] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000080000c0c and cpu addr 0xffff880413fbec0c
 kernel: [  612.558228] radeon 0000:01:00.0: fence driver on ring 4 use gpu addr 0x0000000080000c10 and cpu addr 0xffff880413fbec10
 kernel: [  612.559203] radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x0000000000075a18 and cpu addr 0xffffc90011db5a18
 kernel: [  612.744297] [drm] ring test on 0 succeeded in 3 usecs
 kernel: [  612.744302] [drm] ring test on 1 succeeded in 1 usecs
 kernel: [  612.744308] [drm] ring test on 2 succeeded in 1 usecs
 kernel: [  612.744371] [drm] ring test on 3 succeeded in 2 usecs
 kernel: [  612.744380] [drm] ring test on 4 succeeded in 1 usecs
 kernel: [  612.921464] [drm] ring test on 5 succeeded in 2 usecs
 kernel: [  612.921472] [drm] UVD initialized successfully.
 kernel: [  612.921539] [drm] ib test on ring 0 succeeded in 0 usecs
 kernel: [  612.921634] [drm] ib test on ring 1 succeeded in 0 usecs
 kernel: [  612.921722] [drm] ib test on ring 2 succeeded in 0 usecs
 kernel: [  612.921762] [drm] ib test on ring 3 succeeded in 0 usecs
 kernel: [  612.921796] [drm] ib test on ring 4 succeeded in 0 usecs
 kernel: [  623.068910] radeon 0000:01:00.0: ring 5 stalled for more than 10000msec
 kernel: [  623.068921] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000000004 last fence id 0x0000000000000002 on ring 5)
 kernel: [  623.068927] [drm:uvd_v1_0_ib_test] *ERROR* radeon: fence wait failed (-35).
 kernel: [  623.068935] [drm:radeon_ib_ring_tests] *ERROR* radeon: failed testing IB on ring 5 (-35).
 kernel: [  623.098333] radeon 0000:01:00.0: GPU fault detected: 146 0x07a23d0c
 kernel: [  623.098342] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x0000BDBD
 kernel: [  623.098347] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0203D00C
 kernel: [  623.098352] VM fault (0x0c, vmid 1) at page 48573, read from DMA1 (61)
 kernel: [  623.098364] radeon 0000:01:00.0: GPU fault detected: 146 0x07c23d0c
 kernel: [  623.098368] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
 kernel: [  623.098372] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0208400C
 kernel: [  623.098377] VM fault (0x0c, vmid 1) at page 0, read from TC (132)
 kernel: [  623.098383] radeon 0000:01:00.0: GPU fault detected: 146 0x07e23d0c
 kernel: [  623.098387] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x0000BDBC
 kernel: [  623.098391] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0200800C
 kernel: [  623.098395] VM fault (0x0c, vmid 1) at page 48572, read from TC (8)
 kernel: [  623.128770] radeon 0000:01:00.0: GPU fault detected: 146 0x06033d14
 kernel: [  623.128781] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x0000BDB0
 kernel: [  623.128787] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0303D014
 kernel: [  623.128793] VM fault (0x04, vmid 1) at page 48560, write from DMA1 (61)
 kernel: [  623.128820] radeon 0000:01:00.0: GPU fault detected: 146 0x06033d14
 kernel: [  623.128825] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
 kernel: [  623.128830] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0204400C
 kernel: [  623.128835] VM fault (0x0c, vmid 1) at page 0, read from TC (68)
 kernel: [  623.128842] radeon 0000:01:00.0: GPU fault detected: 146 0x06033d14
 kernel: [  623.128847] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x0000BDB8
 kernel: [  623.128852] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0204400C
 kernel: [  623.128857] VM fault (0x0c, vmid 1) at page 48568, read from TC (68)
 kernel: [  623.129932] radeon 0000:01:00.0: GPU fault detected: 146 0x06033d14
 kernel: [  623.129940] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x0000BDB0
 kernel: [  623.129944] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0303D014
 kernel: [  623.129948] VM fault (0x04, vmid 1) at page 48560, write from DMA1 (61)
 kernel: [  623.129965] radeon 0000:01:00.0: GPU fault detected: 146 0x06233d14
===cut===
Note: several megabytes of similar "VM fault" messages were skipped.
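The raw VM_CONTEXT1_PROTECTION_FAULT_STATUS/ADDR pairs in the log can be cross-checked against the decoded "VM fault" lines printed right after them. A minimal Python sketch, assuming a field layout (protections in bits 3:0, memory client id in bits 19:12, read/write in bit 24, VMID in bits 28:25) that was inferred only by matching the pairs in this log, not copied from the driver headers. `decode_vm_fault` and `CLIENT_NAMES` are hypothetical helpers, and the name table covers only the client ids that appear above.

```python
# Assumed field layout of VM_CONTEXT1_PROTECTION_FAULT_STATUS, inferred by
# matching raw values in this log against the kernel's decoded "VM fault" lines.
PROTECTIONS_MASK = 0xF   # bits 3:0, printed as "(0x0c, ...)"
CLIENT_SHIFT = 12        # bits 19:12, memory client id, printed as "(61)"
CLIENT_MASK = 0xFF
RW_BIT = 1 << 24         # set for writes, clear for reads
VMID_SHIFT = 25          # bits 28:25
VMID_MASK = 0xF

# Client ids seen in this log only; anything else is reported as "unknown".
CLIENT_NAMES = {61: "DMA1", 132: "TC", 8: "TC", 68: "TC"}


def decode_vm_fault(status, addr):
    """Decode a FAULT_STATUS/FAULT_ADDR pair into the fields radeon prints."""
    client_id = (status >> CLIENT_SHIFT) & CLIENT_MASK
    return {
        "protections": status & PROTECTIONS_MASK,
        "vmid": (status >> VMID_SHIFT) & VMID_MASK,
        "page": addr,  # FAULT_ADDR is printed as a page number (0xBDBD -> 48573)
        "access": "write" if status & RW_BIT else "read",
        "client": CLIENT_NAMES.get(client_id, "unknown"),
        "client_id": client_id,
    }
```

For example, `decode_vm_fault(0x0303D014, 0x0000BDB0)` gives protections 0x04, vmid 1, page 48560, a write from DMA1 (61), which matches the corresponding "VM fault" line in the log.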
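Because the "VM fault" flood runs to megabytes, grouping it by access type, client and page shows quickly whether a single buffer dominates. A hedged sketch that assumes fault lines keep the wording seen in this 3.17-rc4 log; `summarize_vm_faults` is a hypothetical helper. The pattern allows any whitespace between tokens, so it also copes with lines re-wrapped by a mail client.

```python
import re
from collections import Counter

# Assumes the message format seen in this log, e.g.
#   VM fault (0x0c, vmid 1) at page 48573, read from DMA1 (61)
# \s+ between tokens also tolerates lines re-wrapped by a mail client.
VM_FAULT_RE = re.compile(
    r"VM fault\s+\((0x[0-9a-fA-F]+),\s+vmid\s+(\d+)\)\s+at\s+page\s+(\d+),"
    r"\s+(read|write)\s+from\s+(\S+)\s+\((\d+)\)"
)


def summarize_vm_faults(text):
    """Count VM fault messages per (access, client, page) tuple."""
    counts = Counter()
    for m in VM_FAULT_RE.finditer(text):
        _prot, _vmid, page, access, client, _client_id = m.groups()
        counts[(access, client, int(page))] += 1
    return counts
```

Calling `summarize_vm_faults(log_text).most_common(10)`, where `log_text` is the saved dmesg output (a hypothetical variable), lists the ten most frequent fault sites.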
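Each "GPU lockup" line names the fence sequence number the driver is waiting for and the last one the GPU signalled. Their difference shows how far behind the ring is: 1 fence on ring 3, 0x1a60a - 0x1a4dd = 301 on ring 0, and 2 on ring 5. A minimal sketch; `fence_lag` is a hypothetical helper, and the ring-name table is an assumption: only ring 0 (GFX) and ring 5 (UVD) are confirmed by the error messages in this log, while the names for rings 1 to 4 follow the usual radeon ring-index layout and should be checked against the driver.

```python
import re

# Assumed ring-index names. Ring 0 (GFX) and ring 5 (UVD) are confirmed by the
# error messages in this log; rings 1-4 follow the usual radeon layout.
RING_NAMES = {0: "GFX", 1: "CP1", 2: "CP2", 3: "DMA", 4: "DMA1", 5: "UVD"}

LOCKUP_RE = re.compile(
    r"GPU lockup \(waiting for\s+(0x[0-9a-fA-F]+)\s+last fence id\s+"
    r"(0x[0-9a-fA-F]+)\s+on ring\s+(\d+)\)"
)


def fence_lag(text):
    """Return (ring, name, waited-for seq minus last signalled seq) per lockup."""
    result = []
    for m in LOCKUP_RE.finditer(text):
        waiting = int(m.group(1), 16)
        last = int(m.group(2), 16)
        ring = int(m.group(3))
        result.append((ring, RING_NAMES.get(ring, "unknown"), waiting - last))
    return result
```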
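The "GPU softreset:" value is a bitmask of the blocks the driver flagged for reset: 0x6C for the first reset and 0x48 for the second. A sketch that decodes it, assuming the bit assignments below mirror the `RADEON_RESET_*` flags in the radeon driver's `radeon.h`; treat the table as unverified and compare it with the tree under test. Under that assumption the first reset covered DMA, CP, DMA1 and RLC, while the second covered only CP and RLC. `decode_reset_mask` is a hypothetical helper.

```python
# Assumed bit assignments, mirroring the RADEON_RESET_* flags in radeon.h;
# verify against the kernel tree under test.
RESET_FLAGS = (
    (1 << 0, "GFX"), (1 << 1, "COMPUTE"), (1 << 2, "DMA"), (1 << 3, "CP"),
    (1 << 4, "GRBM"), (1 << 5, "DMA1"), (1 << 6, "RLC"), (1 << 7, "SEM"),
    (1 << 8, "IH"), (1 << 9, "VMC"), (1 << 10, "MC"), (1 << 11, "DISPLAY"),
)
KNOWN_BITS = sum(bit for bit, _ in RESET_FLAGS)


def decode_reset_mask(mask):
    """Return the block names set in a 'GPU softreset:' mask."""
    names = [name for bit, name in RESET_FLAGS if mask & bit]
    leftover = mask & ~KNOWN_BITS
    if leftover:
        # Bits outside the assumed table are reported rather than dropped.
        names.append(f"unknown(0x{leftover:X})")
    return names
```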
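The status dumps printed before and after each soft reset can be diffed bit by bit to see what the reset actually changed. For the first reset, GRBM_STATUS goes from 0xA0003028 to 0x00003028 (bits 29 and 31 cleared) and SRBM_STATUS from 0x200000C0 to 0x20000AC0 (bits 9 and 11 newly set). A hedged sketch: `parse_dump` and `bit_changes` are hypothetical helpers, the pattern assumes the `NAME = 0xVALUE` layout used in this log, and mapping bit numbers to register field names is left to the SI register headers (`sid.h`).

```python
import re

# Matches "NAME = 0xVALUE" pairs as printed in the dumps above; \s* also
# tolerates the value being wrapped onto the next line.
REG_RE = re.compile(r"\b([A-Z][A-Z0-9_]*)\s*=\s*(0x[0-9A-Fa-f]+)")


def parse_dump(text):
    """Map register name -> value for one dump (a later duplicate wins)."""
    return {name: int(value, 16) for name, value in REG_RE.findall(text)}


def _bits(mask):
    """List the bit positions set in a 32-bit value."""
    return [b for b in range(32) if (mask >> b) & 1]


def bit_changes(before, after):
    """Per register present in both dumps: (bits cleared, bits newly set)."""
    changes = {}
    for name in before.keys() & after.keys():
        old, new = before[name], after[name]
        if old != new:
            changes[name] = (_bits(old & ~new), _bits(new & ~old))
    return changes
```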



More information about the dri-devel mailing list