[Bug 78221] 3.16 RC1: AMD R9 270 GPU locks up on some heavy 2D activity - GPU VM fault occurs. (possibly DMA copying issue strikes back?)

bugzilla-daemon at bugzilla.kernel.org
Mon Sep 8 05:19:50 PDT 2014


https://bugzilla.kernel.org/show_bug.cgi?id=78221

--- Comment #20 from t3st3r at mail.ru ---
1) About 3.15 + patch: I gave it a try, and it took quite a while to form an
opinion about it. Overall it is quite stable and survives several days of
running the problematic load. But eventually the GPU can still crash. The
interesting thing about the occurrence I caught is that, despite the scary
message about a failed DPM resume, the GPU seems to be operable after a
successful recovery. I also got a couple of similar crashes within a week. It
looked like this:

===cut===
[815114.959250] SysRq : Emergency Sync
[815115.071974] Emergency Sync complete
[815116.935547] radeon 0000:01:00.0: ring 0 stalled for more than 10082msec
[815116.935556] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000007f39f60
last fence id 0x0000000007f39f5f on ring 0)
[815116.935564] radeon 0000:01:00.0: failed to get a new IB (-35)
[815116.942472] radeon 0000:01:00.0: sa_manager is not empty, clearing anyway
[815117.134467] SysRq : Keyboard mode set to system default
[815117.500079] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0
domain=0x0018 address=0x0000000080406640 flags=0x0000]
[815117.500092] radeon 0000:01:00.0: Saved 6061 dwords of commands on ring 0.
[815117.500097] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0
domain=0x0018 address=0x0000000080406650 flags=0x0020]
[815117.500104] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0
domain=0x0018 address=0x0000000080000100 flags=0x0020]
[815117.500110] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0
domain=0x0018 address=0x0000000080404500 flags=0x0000]
[815117.500222] radeon 0000:01:00.0: GPU softreset: 0x0000006C
[815117.500226] radeon 0000:01:00.0:   GRBM_STATUS               = 0xA0003028
[815117.500229] radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x00000006
[815117.500231] radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000006
[815117.500233] radeon 0000:01:00.0:   SRBM_STATUS               = 0x200002C0
[815117.500349] radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
[815117.500351] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[815117.500353] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00010000
[815117.500356] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000002
[815117.500358] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x80010243
[815117.500360] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44483106
[815117.500362] radeon 0000:01:00.0:   R_00D834_DMA_STATUS_REG   = 0x44C84246
[815117.500365] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR  
0x00000000
[815117.500368] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS
0x00000000
[815118.057253] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x0000DDFF
[815118.057308] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00100140
[815118.058465] radeon 0000:01:00.0:   GRBM_STATUS               = 0x00003028
[815118.058468] radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x00000006
[815118.058470] radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000006
[815118.058472] radeon 0000:01:00.0:   SRBM_STATUS               = 0x200000C0
[815118.058583] radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
[815118.058585] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[815118.058588] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
[815118.058590] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
[815118.058592] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x00000000
[815118.058594] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[815118.058597] radeon 0000:01:00.0:   R_00D834_DMA_STATUS_REG   = 0x44C83D57
[815118.058843] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[815118.086936] [drm] probing gen 2 caps for device 1002:5a16 = 31cd02/0
[815118.086939] [drm] PCIE gen 2 link speeds already enabled
[815118.090599] [drm] PCIE GART of 1024M enabled (table at 0x0000000000276000).
[815118.090704] radeon 0000:01:00.0: WB enabled
[815118.090707] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr
0x0000000080000c00 and cpu addr 0xffff880414545c00
[815118.090709] radeon 0000:01:00.0: fence driver on ring 1 use gpu addr
0x0000000080000c04 and cpu addr 0xffff880414545c04
[815118.090711] radeon 0000:01:00.0: fence driver on ring 2 use gpu addr
0x0000000080000c08 and cpu addr 0xffff880414545c08
[815118.090713] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr
0x0000000080000c0c and cpu addr 0xffff880414545c0c
[815118.090715] radeon 0000:01:00.0: fence driver on ring 4 use gpu addr
0x0000000080000c10 and cpu addr 0xffff880414545c10
[815118.091689] radeon 0000:01:00.0: fence driver on ring 5 use gpu addr
0x0000000000075a18 and cpu addr 0xffffc90012135a18
[815118.278813] [drm] ring test on 0 succeeded in 3 usecs
[815118.278819] [drm] ring test on 1 succeeded in 1 usecs
[815118.278824] [drm] ring test on 2 succeeded in 1 usecs
[815118.278888] [drm] ring test on 3 succeeded in 2 usecs
[815118.278897] [drm] ring test on 4 succeeded in 1 usecs
[815118.455982] [drm] ring test on 5 succeeded in 2 usecs
[815118.455989] [drm] UVD initialized successfully.
[815128.453467] radeon 0000:01:00.0: ring 0 stalled for more than 10001msec
[815128.453477] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000007f39fad
last fence id 0x0000000007f39f5f on ring 0)
[815128.453483] [drm:r600_ib_test] *ERROR* radeon: fence wait failed (-35).
[815128.453491] [drm:radeon_ib_ring_tests] *ERROR* radeon: failed testing IB on
GFX ring (-35).
[815128.453496] radeon 0000:01:00.0: ib ring test failed (-35).
[815129.011900] radeon 0000:01:00.0: GPU softreset: 0x00000048
[815129.011904] radeon 0000:01:00.0:   GRBM_STATUS               = 0xA0003028
[815129.011907] radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x00000006
[815129.011909] radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000006
[815129.011911] radeon 0000:01:00.0:   SRBM_STATUS               = 0x200000C0
[815129.012022] radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
[815129.012025] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[815129.012027] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00010000
[815129.012029] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000002
[815129.012031] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x80010243
[815129.012034] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[815129.012036] radeon 0000:01:00.0:   R_00D834_DMA_STATUS_REG   = 0x44C83D57
[815129.012039] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR  
0x00000000
[815129.012041] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS
0x00000000
[815129.561916] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x0000DDFF
[815129.561971] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
[815129.563128] radeon 0000:01:00.0:   GRBM_STATUS               = 0x00003028
[815129.563131] radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x00000006
[815129.563133] radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000006
[815129.563135] radeon 0000:01:00.0:   SRBM_STATUS               = 0x200000C0
[815129.563246] radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
[815129.563249] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[815129.563251] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
[815129.563253] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
[815129.563255] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x00000000
[815129.563257] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[815129.563260] radeon 0000:01:00.0:   R_00D834_DMA_STATUS_REG   = 0x44C83D57
[815129.563506] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[815129.576411] [drm] probing gen 2 caps for device 1002:5a16 = 31cd02/0
[815129.576415] [drm] PCIE gen 2 link speeds already enabled
[815129.580147] [drm] PCIE GART of 1024M enabled (table at 0x0000000000276000).
[815129.580250] radeon 0000:01:00.0: WB enabled
[815129.580253] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr
0x0000000080000c00 and cpu addr 0xffff880414545c00
[815129.580255] radeon 0000:01:00.0: fence driver on ring 1 use gpu addr
0x0000000080000c04 and cpu addr 0xffff880414545c04
[815129.580257] radeon 0000:01:00.0: fence driver on ring 2 use gpu addr
0x0000000080000c08 and cpu addr 0xffff880414545c08
[815129.580259] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr
0x0000000080000c0c and cpu addr 0xffff880414545c0c
[815129.580261] radeon 0000:01:00.0: fence driver on ring 4 use gpu addr
0x0000000080000c10 and cpu addr 0xffff880414545c10
[815129.581232] radeon 0000:01:00.0: fence driver on ring 5 use gpu addr
0x0000000000075a18 and cpu addr 0xffffc90012135a18
[815129.767993] [drm] ring test on 0 succeeded in 3 usecs
[815129.767999] [drm] ring test on 1 succeeded in 1 usecs
[815129.768004] [drm] ring test on 2 succeeded in 1 usecs
[815129.768068] [drm] ring test on 3 succeeded in 2 usecs
[815129.768077] [drm] ring test on 4 succeeded in 1 usecs
[815129.945157] [drm] ring test on 5 succeeded in 2 usecs
[815129.945164] [drm] UVD initialized successfully.
[815129.946125] [drm] ib test on ring 0 succeeded in 0 usecs
[815129.946210] [drm] ib test on ring 1 succeeded in 0 usecs
[815129.946301] [drm] ib test on ring 2 succeeded in 0 usecs
[815129.946345] [drm] ib test on ring 3 succeeded in 0 usecs
[815129.946380] [drm] ib test on ring 4 succeeded in 0 usecs
[815137.847012] SysRq : Emergency Sync
[815137.965713] Emergency Sync complete
[815139.742325] SysRq : Emergency Sync
[815139.864190] Emergency Sync complete
[815140.093163] radeon 0000:01:00.0: ring 5 stalled for more than 10000msec
[815140.093173] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000000004
last fence id 0x0000000000000002 on ring 5)
[815140.093179] [drm:uvd_v1_0_ib_test] *ERROR* radeon: fence wait failed (-35).
[815140.093188] [drm:radeon_ib_ring_tests] *ERROR* radeon: failed testing IB on
ring 5 (-35).
[815140.093217] [drm:radeon_pm_resume_dpm] *ERROR* radeon: dpm resume failed
===cut===


