[Bug 78221] 3.16 RC1: AMD R9 270 GPU locks up on some heavy 2D activity - GPU VM fault occurs. (possibly DMA copying issue strikes back?)
bugzilla-daemon at bugzilla.kernel.org
Mon Sep 8 20:09:08 PDT 2014
https://bugzilla.kernel.org/show_bug.cgi?id=78221
--- Comment #22 from t3st3r at mail.ru ---
I tested 3.17-rc4. Result: it crashed after about 3 minutes of running (see
below).
Are some stability fixes missing from 3.17-rc4 mainline? At first glance I do
not see any radeon-related commits in drm-fixes that haven't made it into -rc4.
Am I missing something?
===cut===
kernel: [ 599.949295] radeon 0000:01:00.0: ring 3 stalled for more than
10167msec
kernel: [ 599.949305] radeon 0000:01:00.0: GPU lockup (waiting for
0x0000000000001eb0 last fence id 0x0000000000001eaf on ring 3)
kernel: [ 599.949312] radeon 0000:01:00.0: scheduling IB failed (-35).
kernel: [ 600.507409] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0
domain=0x0018 address=0x000000008040a840 flags=0x0010]
kernel: [ 600.507420] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0
domain=0x0018 address=0x000000008040a870 flags=0x0030]
kernel: [ 600.507426] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0
domain=0x0018 address=0x0000000080000100 flags=0x0030]
kernel: [ 600.507431] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0
domain=0x0018 address=0x000000008040a700 flags=0x0010]
kernel: [ 600.507460] radeon 0000:01:00.0: Saved 19308 dwords of commands on
ring 0.
kernel: [ 600.507590] radeon 0000:01:00.0: GPU softreset: 0x0000006C
kernel: [ 600.507593] radeon 0000:01:00.0: GRBM_STATUS =
0xA0003028
kernel: [ 600.507596] radeon 0000:01:00.0: GRBM_STATUS_SE0 =
0x00000006
kernel: [ 600.507598] radeon 0000:01:00.0: GRBM_STATUS_SE1 =
0x00000006
kernel: [ 600.507600] radeon 0000:01:00.0: SRBM_STATUS =
0x200000C0
kernel: [ 600.507711] radeon 0000:01:00.0: SRBM_STATUS2 =
0x00000000
kernel: [ 600.507714] radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 =
0x00000000
kernel: [ 600.507716] radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 =
0x00010000
kernel: [ 600.507718] radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT =
0x00000002
kernel: [ 600.507720] radeon 0000:01:00.0: R_008680_CP_STAT =
0x80010243
kernel: [ 600.507723] radeon 0000:01:00.0: R_00D034_DMA_STATUS_REG =
0x44483106
kernel: [ 600.507725] radeon 0000:01:00.0: R_00D834_DMA_STATUS_REG =
0x44E84266
kernel: [ 600.507728] radeon 0000:01:00.0:
VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000
kernel: [ 600.507730] radeon 0000:01:00.0:
VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
kernel: [ 601.054357] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x0000DDFF
kernel: [ 601.054411] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00100140
kernel: [ 601.055568] radeon 0000:01:00.0: GRBM_STATUS =
0x00003028
kernel: [ 601.055571] radeon 0000:01:00.0: GRBM_STATUS_SE0 =
0x00000006
kernel: [ 601.055573] radeon 0000:01:00.0: GRBM_STATUS_SE1 =
0x00000006
kernel: [ 601.055575] radeon 0000:01:00.0: SRBM_STATUS =
0x20000AC0
kernel: [ 601.055686] radeon 0000:01:00.0: SRBM_STATUS2 =
0x00000000
kernel: [ 601.055689] radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 =
0x00000000
kernel: [ 601.055691] radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 =
0x00000000
kernel: [ 601.055693] radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT =
0x00000000
kernel: [ 601.055695] radeon 0000:01:00.0: R_008680_CP_STAT =
0x00000000
kernel: [ 601.055698] radeon 0000:01:00.0: R_00D034_DMA_STATUS_REG =
0x44C83D57
kernel: [ 601.055700] radeon 0000:01:00.0: R_00D834_DMA_STATUS_REG =
0x44C83D57
kernel: [ 601.055951] radeon 0000:01:00.0: GPU reset succeeded, trying to
resume
kernel: [ 601.083744] [drm] probing gen 2 caps for device 1002:5a16 =
31cd02/0
kernel: [ 601.083747] [drm] PCIE gen 2 link speeds already enabled
kernel: [ 601.084938] [drm] PCIE GART of 1024M enabled (table at
0x0000000000276000).
kernel: [ 601.085046] radeon 0000:01:00.0: WB enabled
kernel: [ 601.085049] radeon 0000:01:00.0: fence driver on ring 0 use gpu
addr 0x0000000080000c00 and cpu addr 0xffff880413fbec00
kernel: [ 601.085052] radeon 0000:01:00.0: fence driver on ring 1 use gpu
addr 0x0000000080000c04 and cpu addr 0xffff880413fbec04
kernel: [ 601.085054] radeon 0000:01:00.0: fence driver on ring 2 use gpu
addr 0x0000000080000c08 and cpu addr 0xffff880413fbec08
kernel: [ 601.085056] radeon 0000:01:00.0: fence driver on ring 3 use gpu
addr 0x0000000080000c0c and cpu addr 0xffff880413fbec0c
kernel: [ 601.085057] radeon 0000:01:00.0: fence driver on ring 4 use gpu
addr 0x0000000080000c10 and cpu addr 0xffff880413fbec10
kernel: [ 601.086030] radeon 0000:01:00.0: fence driver on ring 5 use gpu
addr 0x0000000000075a18 and cpu addr 0xffffc90011db5a18
kernel: [ 601.271000] [drm] ring test on 0 succeeded in 3 usecs
kernel: [ 601.271006] [drm] ring test on 1 succeeded in 1 usecs
kernel: [ 601.271011] [drm] ring test on 2 succeeded in 1 usecs
kernel: [ 601.271075] [drm] ring test on 3 succeeded in 2 usecs
kernel: [ 601.271084] [drm] ring test on 4 succeeded in 1 usecs
kernel: [ 601.448164] [drm] ring test on 5 succeeded in 2 usecs
kernel: [ 601.448172] [drm] UVD initialized successfully.
kernel: [ 611.444226] radeon 0000:01:00.0: ring 0 stalled for more than
10000msec
kernel: [ 611.444237] radeon 0000:01:00.0: GPU lockup (waiting for
0x000000000001a60a last fence id 0x000000000001a4dd on ring 0)
kernel: [ 611.444244] [drm:r600_ib_test] *ERROR* radeon: fence wait failed
(-35).
kernel: [ 611.444252] [drm:radeon_ib_ring_tests] *ERROR* radeon: failed
testing IB on GFX ring (-35).
kernel: [ 611.444257] radeon 0000:01:00.0: ib ring test failed (-35).
kernel: [ 611.997330] radeon 0000:01:00.0: GPU softreset: 0x00000048
kernel: [ 611.997333] radeon 0000:01:00.0: GRBM_STATUS =
0xA0003028
kernel: [ 611.997336] radeon 0000:01:00.0: GRBM_STATUS_SE0 =
0x00000006
kernel: [ 611.997338] radeon 0000:01:00.0: GRBM_STATUS_SE1 =
0x00000006
kernel: [ 611.997341] radeon 0000:01:00.0: SRBM_STATUS =
0x200000C0
kernel: [ 611.997452] radeon 0000:01:00.0: SRBM_STATUS2 =
0x00000000
kernel: [ 611.997454] radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 =
0x00000000
kernel: [ 611.997456] radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 =
0x00010000
kernel: [ 611.997458] radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT =
0x00400002
kernel: [ 611.997461] radeon 0000:01:00.0: R_008680_CP_STAT =
0x84010243
kernel: [ 611.997463] radeon 0000:01:00.0: R_00D034_DMA_STATUS_REG =
0x44C83D57
kernel: [ 611.997465] radeon 0000:01:00.0: R_00D834_DMA_STATUS_REG =
0x44C83D57
kernel: [ 611.997468] radeon 0000:01:00.0:
VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000
kernel: [ 611.997470] radeon 0000:01:00.0:
VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
kernel: [ 612.542126] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x0000DDFF
kernel: [ 612.542180] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
kernel: [ 612.543338] radeon 0000:01:00.0: GRBM_STATUS =
0x00003028
kernel: [ 612.543340] radeon 0000:01:00.0: GRBM_STATUS_SE0 =
0x00000006
kernel: [ 612.543343] radeon 0000:01:00.0: GRBM_STATUS_SE1 =
0x00000006
kernel: [ 612.543345] radeon 0000:01:00.0: SRBM_STATUS =
0x200000C0
kernel: [ 612.543456] radeon 0000:01:00.0: SRBM_STATUS2 =
0x00000000
kernel: [ 612.543458] radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 =
0x00000000
kernel: [ 612.543460] radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 =
0x00000000
kernel: [ 612.543462] radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT =
0x00000000
kernel: [ 612.543465] radeon 0000:01:00.0: R_008680_CP_STAT =
0x00000000
kernel: [ 612.543467] radeon 0000:01:00.0: R_00D034_DMA_STATUS_REG =
0x44C83D57
kernel: [ 612.543469] radeon 0000:01:00.0: R_00D834_DMA_STATUS_REG =
0x44C83D57
kernel: [ 612.543724] radeon 0000:01:00.0: GPU reset succeeded, trying to
resume
kernel: [ 612.556911] [drm] probing gen 2 caps for device 1002:5a16 =
31cd02/0
kernel: [ 612.556915] [drm] PCIE gen 2 link speeds already enabled
kernel: [ 612.558107] [drm] PCIE GART of 1024M enabled (table at
0x0000000000276000).
kernel: [ 612.558216] radeon 0000:01:00.0: WB enabled
kernel: [ 612.558219] radeon 0000:01:00.0: fence driver on ring 0 use gpu
addr 0x0000000080000c00 and cpu addr 0xffff880413fbec00
kernel: [ 612.558222] radeon 0000:01:00.0: fence driver on ring 1 use gpu
addr 0x0000000080000c04 and cpu addr 0xffff880413fbec04
kernel: [ 612.558224] radeon 0000:01:00.0: fence driver on ring 2 use gpu
addr 0x0000000080000c08 and cpu addr 0xffff880413fbec08
kernel: [ 612.558226] radeon 0000:01:00.0: fence driver on ring 3 use gpu
addr 0x0000000080000c0c and cpu addr 0xffff880413fbec0c
kernel: [ 612.558228] radeon 0000:01:00.0: fence driver on ring 4 use gpu
addr 0x0000000080000c10 and cpu addr 0xffff880413fbec10
kernel: [ 612.559203] radeon 0000:01:00.0: fence driver on ring 5 use gpu
addr 0x0000000000075a18 and cpu addr 0xffffc90011db5a18
kernel: [ 612.744297] [drm] ring test on 0 succeeded in 3 usecs
kernel: [ 612.744302] [drm] ring test on 1 succeeded in 1 usecs
kernel: [ 612.744308] [drm] ring test on 2 succeeded in 1 usecs
kernel: [ 612.744371] [drm] ring test on 3 succeeded in 2 usecs
kernel: [ 612.744380] [drm] ring test on 4 succeeded in 1 usecs
kernel: [ 612.921464] [drm] ring test on 5 succeeded in 2 usecs
kernel: [ 612.921472] [drm] UVD initialized successfully.
kernel: [ 612.921539] [drm] ib test on ring 0 succeeded in 0 usecs
kernel: [ 612.921634] [drm] ib test on ring 1 succeeded in 0 usecs
kernel: [ 612.921722] [drm] ib test on ring 2 succeeded in 0 usecs
kernel: [ 612.921762] [drm] ib test on ring 3 succeeded in 0 usecs
kernel: [ 612.921796] [drm] ib test on ring 4 succeeded in 0 usecs
kernel: [ 623.068910] radeon 0000:01:00.0: ring 5 stalled for more than
10000msec
kernel: [ 623.068921] radeon 0000:01:00.0: GPU lockup (waiting for
0x0000000000000004 last fence id 0x0000000000000002 on ring 5)
kernel: [ 623.068927] [drm:uvd_v1_0_ib_test] *ERROR* radeon: fence wait
failed (-35).
kernel: [ 623.068935] [drm:radeon_ib_ring_tests] *ERROR* radeon: failed
testing IB on ring 5 (-35).
kernel: [ 623.098333] radeon 0000:01:00.0: GPU fault detected: 146 0x07a23d0c
kernel: [ 623.098342] radeon 0000:01:00.0:
VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x0000BDBD
kernel: [ 623.098347] radeon 0000:01:00.0:
VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0203D00C
kernel: [ 623.098352] VM fault (0x0c, vmid 1) at page 48573, read from DMA1
(61)
kernel: [ 623.098364] radeon 0000:01:00.0: GPU fault detected: 146 0x07c23d0c
kernel: [ 623.098368] radeon 0000:01:00.0:
VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000
kernel: [ 623.098372] radeon 0000:01:00.0:
VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0208400C
kernel: [ 623.098377] VM fault (0x0c, vmid 1) at page 0, read from TC (132)
kernel: [ 623.098383] radeon 0000:01:00.0: GPU fault detected: 146 0x07e23d0c
kernel: [ 623.098387] radeon 0000:01:00.0:
VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x0000BDBC
kernel: [ 623.098391] radeon 0000:01:00.0:
VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0200800C
kernel: [ 623.098395] VM fault (0x0c, vmid 1) at page 48572, read from TC (8)
kernel: [ 623.128770] radeon 0000:01:00.0: GPU fault detected: 146 0x06033d14
kernel: [ 623.128781] radeon 0000:01:00.0:
VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x0000BDB0
kernel: [ 623.128787] radeon 0000:01:00.0:
VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0303D014
kernel: [ 623.128793] VM fault (0x04, vmid 1) at page 48560, write from DMA1
(61)
kernel: [ 623.128820] radeon 0000:01:00.0: GPU fault detected: 146 0x06033d14
kernel: [ 623.128825] radeon 0000:01:00.0:
VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000
kernel: [ 623.128830] radeon 0000:01:00.0:
VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0204400C
kernel: [ 623.128835] VM fault (0x0c, vmid 1) at page 0, read from TC (68)
kernel: [ 623.128842] radeon 0000:01:00.0: GPU fault detected: 146 0x06033d14
kernel: [ 623.128847] radeon 0000:01:00.0:
VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x0000BDB8
kernel: [ 623.128852] radeon 0000:01:00.0:
VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0204400C
kernel: [ 623.128857] VM fault (0x0c, vmid 1) at page 48568, read from TC
(68)
kernel: [ 623.129932] radeon 0000:01:00.0: GPU fault detected: 146 0x06033d14
kernel: [ 623.129940] radeon 0000:01:00.0:
VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x0000BDB0
kernel: [ 623.129944] radeon 0000:01:00.0:
VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0303D014
kernel: [ 623.129948] VM fault (0x04, vmid 1) at page 48560, write from DMA1
(61)
kernel: [ 623.129965] radeon 0000:01:00.0: GPU fault detected: 146 0x06233d14
===cut===
Note: several megabytes of similar "VM fault" messages were omitted.
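
For anyone cross-checking the "VM fault" lines against the raw register values, here is a minimal Python sketch of the decoding. The bit positions are an assumption inferred from the values printed in this log (they reproduce the protections, vmid, read/write and client id shown in the decoded lines); they are not copied from the driver source. The function name decode_vm_fault is made up for illustration.

```python
def decode_vm_fault(addr: int, status: int) -> dict:
    """Split a VM_CONTEXT1_PROTECTION_FAULT_ADDR/STATUS pair into fields.

    Assumed layout, inferred from the log above (not from driver source):
      bits 0-3   protection fault bits
      bits 12-19 memory client id
      bit  24    access type (1 = write, 0 = read)
      bits 25-28 vmid
    """
    return {
        "page": addr,                          # the log prints this in decimal
        "protections": status & 0xF,
        "client_id": (status >> 12) & 0xFF,
        "access": "write" if (status >> 24) & 0x1 else "read",
        "vmid": (status >> 25) & 0xF,
    }


if __name__ == "__main__":
    # Two ADDR/STATUS pairs taken from the log excerpt above.
    for addr, status in [(0x0000BDBD, 0x0203D00C), (0x0000BDB0, 0x0303D014)]:
        print(decode_vm_fault(addr, status))
```

With the two pairs above this gives page 48573, read, client 61 and page 48560, write, client 61, which matches the "read from DMA1 (61)" and "write from DMA1 (61)" lines in the log.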