[Bug 77001] New: Radeon R9 270X GPU lockup and resume failure after all night inactivity

bugzilla-daemon at bugzilla.kernel.org bugzilla-daemon at bugzilla.kernel.org
Wed May 28 05:50:29 PDT 2014


https://bugzilla.kernel.org/show_bug.cgi?id=77001

            Bug ID: 77001
           Summary: Radeon R9 270X GPU lockup and resume failure after all
                    night inactivity
           Product: Drivers
           Version: 2.5
    Kernel Version: 3.14.4
          Hardware: x86-64
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: Video(DRI - non Intel)
          Assignee: drivers_video-dri at kernel-bugs.osdl.org
          Reporter: custos.mentis at gmail.com
        Regression: No

Created attachment 137581
  --> https://bugzilla.kernel.org/attachment.cgi?id=137581&action=edit
kernel log with the lockup and following boot messages

After I left the computer on overnight, it hung in the morning when I tried to
use it again, with the following messages in the kernel log:

------
May 28 06:13:06 [kernel] [153149.666146] radeon 0000:01:00.0: GPU lockup CP
stall for more than 10033msec
May 28 06:13:06 [kernel] [153149.666150] radeon 0000:01:00.0: GPU lockup
(waiting for 0x000000000071debe last fence id 0x000000000071debd on ring 0)
May 28 06:13:06 [kernel] [153150.122657] radeon 0000:01:00.0: GPU lockup CP
stall for more than 10490msec
May 28 06:13:06 [kernel] [153150.122661] radeon 0000:01:00.0: GPU lockup
(waiting for 0x000000000071debe last fence id 0x000000000071debd on ring 0)
May 28 06:13:06 [kernel] [153150.122664] radeon 0000:01:00.0: failed to get a
new IB (-35)
May 28 06:13:06 [kernel] [153150.124014] radeon 0000:01:00.0: sa_manager is not
empty, clearing anyway
May 28 06:13:07 [kernel] [153150.927575] radeon 0000:01:00.0: Saved 3296 dwords
of commands on ring 0.
May 28 06:13:07 [kernel] [153150.927713] radeon 0000:01:00.0: GPU softreset:
0x0000004D
May 28 06:13:07 [kernel] [153150.927721] radeon 0000:01:00.0:   GRBM_STATUS    
          = 0xA3503028
May 28 06:13:07 [kernel] [153150.927726] radeon 0000:01:00.0:   GRBM_STATUS_SE0
          = 0x28000006
May 28 06:13:07 [kernel] [153150.927732] radeon 0000:01:00.0:   GRBM_STATUS_SE1
          = 0x2D000006
May 28 06:13:07 [kernel] [153150.927736] radeon 0000:01:00.0:   SRBM_STATUS    
          = 0x20000EC0
May 28 06:13:07 [kernel] [153150.927850] radeon 0000:01:00.0:   SRBM_STATUS2   
          = 0x00000000
May 28 06:13:07 [kernel] [153150.927854] radeon 0000:01:00.0:  
R_008674_CP_STALLED_STAT1 = 0x00000000
May 28 06:13:07 [kernel] [153150.927859] radeon 0000:01:00.0:  
R_008678_CP_STALLED_STAT2 = 0x00004100
May 28 06:13:07 [kernel] [153150.927863] radeon 0000:01:00.0:  
R_00867C_CP_BUSY_STAT     = 0x00028986
May 28 06:13:07 [kernel] [153150.927867] radeon 0000:01:00.0:  
R_008680_CP_STAT          = 0x800282E7
May 28 06:13:07 [kernel] [153150.927874] radeon 0000:01:00.0:  
R_00D034_DMA_STATUS_REG   = 0x44483146
May 28 06:13:07 [kernel] [153150.927878] radeon 0000:01:00.0:  
R_00D834_DMA_STATUS_REG   = 0x44C83D57
May 28 06:13:07 [kernel] [153150.927883] radeon 0000:01:00.0:  
VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
May 28 06:13:07 [kernel] [153150.927887] radeon 0000:01:00.0:  
VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
May 28 06:13:08 [kernel] [153152.248694] radeon 0000:01:00.0: Wait for MC idle
timedout !
May 28 06:13:08 [kernel] [153152.248703] radeon 0000:01:00.0:
GRBM_SOFT_RESET=0x0000DDFF
May 28 06:13:08 [kernel] [153152.248760] radeon 0000:01:00.0:
SRBM_SOFT_RESET=0x00100100
May 28 06:13:08 [kernel] [153152.249931] radeon 0000:01:00.0:   GRBM_STATUS    
          = 0x00003028
May 28 06:13:08 [kernel] [153152.249937] radeon 0000:01:00.0:   GRBM_STATUS_SE0
          = 0x00000006
May 28 06:13:08 [kernel] [153152.249941] radeon 0000:01:00.0:   GRBM_STATUS_SE1
          = 0x00000006
May 28 06:13:08 [kernel] [153152.249945] radeon 0000:01:00.0:   SRBM_STATUS    
          = 0x20000EC0
May 28 06:13:08 [kernel] [153152.250059] radeon 0000:01:00.0:   SRBM_STATUS2   
          = 0x00000000
May 28 06:13:08 [kernel] [153152.250088] radeon 0000:01:00.0:  
R_008674_CP_STALLED_STAT1 = 0x00000000
May 28 06:13:08 [kernel] [153152.250092] radeon 0000:01:00.0:  
R_008678_CP_STALLED_STAT2 = 0x00000000
May 28 06:13:08 [kernel] [153152.250099] radeon 0000:01:00.0:  
R_00867C_CP_BUSY_STAT     = 0x00000000
May 28 06:13:08 [kernel] [153152.250109] radeon 0000:01:00.0:  
R_008680_CP_STAT          = 0x00000000
May 28 06:13:08 [kernel] [153152.250114] radeon 0000:01:00.0:  
R_00D034_DMA_STATUS_REG   = 0x44C83D57
May 28 06:13:08 [kernel] [153152.250118] radeon 0000:01:00.0:  
R_00D834_DMA_STATUS_REG   = 0x44C83D57
May 28 06:13:08 [kernel] [153152.250372] radeon 0000:01:00.0: GPU reset
succeeded, trying to resume
May 28 06:13:13 [kernel] [153157.261944] [drm:atom_op_jump] *ERROR* atombios
stuck in loop for more than 5secs aborting
May 28 06:13:13 [kernel] [153157.261952] [drm:atom_execute_table_locked]
*ERROR* atombios stuck executing C008 (len 254, WS 0, PS 4) @ 0xC032
May 28 06:13:13 [kernel] [153157.261957] [drm:atom_execute_table_locked]
*ERROR* atombios stuck executing B67A (len 94, WS 12, PS 8) @ 0xB6C3
May 28 06:13:13 [kernel] [153157.276909] [drm] probing gen 2 caps for device
1002:5a16 = 33ed02/0
May 28 06:13:13 [kernel] [153157.276918] [drm] PCIE gen 2 link speeds already
enabled
May 28 06:13:14 [kernel] [153157.761994] radeon 0000:01:00.0: Wait for MC idle
timedout !
May 28 06:13:14 [kernel] [153157.982771] radeon 0000:01:00.0: Wait for MC idle
timedout !
May 28 06:13:14 [kernel] [153157.988649] [drm] PCIE GART of 1024M enabled
(table at 0x0000000000276000).
May 28 06:13:14 [kernel] [153157.988769] radeon 0000:01:00.0: WB enabled
May 28 06:13:14 [kernel] [153157.988775] radeon 0000:01:00.0: fence driver on
ring 0 use gpu addr 0x0000000080000c00 and cpu addr 0xffff8800ba368c00
May 28 06:13:14 [kernel] [153157.988780] radeon 0000:01:00.0: fence driver on
ring 1 use gpu addr 0x0000000080000c04 and cpu addr 0xffff8800ba368c04
May 28 06:13:14 [kernel] [153157.988785] radeon 0000:01:00.0: fence driver on
ring 2 use gpu addr 0x0000000080000c08 and cpu addr 0xffff8800ba368c08
May 28 06:13:14 [kernel] [153157.988792] radeon 0000:01:00.0: fence driver on
ring 3 use gpu addr 0x0000000080000c0c and cpu addr 0xffff8800ba368c0c
May 28 06:13:14 [kernel] [153157.988798] radeon 0000:01:00.0: fence driver on
ring 4 use gpu addr 0x0000000080000c10 and cpu addr 0xffff8800ba368c10
May 28 06:13:14 [kernel] [153157.989912] radeon 0000:01:00.0: fence driver on
ring 5 use gpu addr 0x0000000000075a18 and cpu addr 0xffffc90010335a18
May 28 06:13:15 [kernel] [153158.516551] [drm:r600_ring_test] *ERROR* radeon:
ring 0 test failed (scratch(0x850C)=0xCAFEDEAD)
May 28 06:13:15 [kernel] [153158.516557] [drm:si_resume] *ERROR* si startup
failed on resume
May 28 06:13:15 [kernel] [153158.516629] [drm:radeon_pm_resume_dpm] *ERROR*
radeon: dpm resume failed
May 28 06:13:16 [kernel] [153159.392111] radeon 0000:01:00.0: Saved 9824 dwords
of commands on ring 0.
May 28 06:13:16 [kernel] [153159.392244] radeon 0000:01:00.0: GPU softreset:
0x00000048
May 28 06:13:16 [kernel] [153159.392246] radeon 0000:01:00.0:   GRBM_STATUS    
          = 0xA0003028
May 28 06:13:16 [kernel] [153159.392249] radeon 0000:01:00.0:   GRBM_STATUS_SE0
          = 0x00000006
May 28 06:13:16 [kernel] [153159.392251] radeon 0000:01:00.0:   GRBM_STATUS_SE1
          = 0x00000006
May 28 06:13:16 [kernel] [153159.392253] radeon 0000:01:00.0:   SRBM_STATUS    
          = 0x20000EC0
May 28 06:13:16 [kernel] [153159.392363] radeon 0000:01:00.0:   SRBM_STATUS2   
          = 0x00000000
May 28 06:13:16 [kernel] [153159.392365] radeon 0000:01:00.0:  
R_008674_CP_STALLED_STAT1 = 0x00000000
May 28 06:13:16 [kernel] [153159.392367] radeon 0000:01:00.0:  
R_008678_CP_STALLED_STAT2 = 0x00010100
May 28 06:13:16 [kernel] [153159.392369] radeon 0000:01:00.0:  
R_00867C_CP_BUSY_STAT     = 0x00420182
May 28 06:13:16 [kernel] [153159.392371] radeon 0000:01:00.0:  
R_008680_CP_STAT          = 0x84038243
May 28 06:13:16 [kernel] [153159.392375] radeon 0000:01:00.0:  
R_00D034_DMA_STATUS_REG   = 0x44C83D57
May 28 06:13:16 [kernel] [153159.392377] radeon 0000:01:00.0:  
R_00D834_DMA_STATUS_REG   = 0x44C83D57
May 28 06:13:16 [kernel] [153159.392380] radeon 0000:01:00.0:  
VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
May 28 06:13:16 [kernel] [153159.392383] radeon 0000:01:00.0:  
VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
May 28 06:13:17 [kernel] [153160.661846] radeon 0000:01:00.0: Wait for MC idle
timedout !
May 28 06:13:17 [kernel] [153160.661850] radeon 0000:01:00.0:
GRBM_SOFT_RESET=0x0000DDFF
May 28 06:13:17 [kernel] [153160.661904] radeon 0000:01:00.0:
SRBM_SOFT_RESET=0x00000100
May 28 06:13:17 [kernel] [153160.663060] radeon 0000:01:00.0:   GRBM_STATUS    
          = 0x00003028
May 28 06:13:17 [kernel] [153160.663062] radeon 0000:01:00.0:   GRBM_STATUS_SE0
          = 0x00000006
May 28 06:13:17 [kernel] [153160.663067] radeon 0000:01:00.0:   GRBM_STATUS_SE1
          = 0x00000006
May 28 06:13:17 [kernel] [153160.663075] radeon 0000:01:00.0:   SRBM_STATUS    
          = 0x20000EC0
May 28 06:13:17 [kernel] [153160.663186] radeon 0000:01:00.0:   SRBM_STATUS2   
          = 0x00000000
May 28 06:13:17 [kernel] [153160.663193] radeon 0000:01:00.0:  
R_008674_CP_STALLED_STAT1 = 0x00000000
May 28 06:13:17 [kernel] [153160.663197] radeon 0000:01:00.0:  
R_008678_CP_STALLED_STAT2 = 0x00000000
May 28 06:13:17 [kernel] [153160.663199] radeon 0000:01:00.0:  
R_00867C_CP_BUSY_STAT     = 0x00000000
May 28 06:13:17 [kernel] [153160.663200] radeon 0000:01:00.0:  
R_008680_CP_STAT          = 0x00000000
May 28 06:13:17 [kernel] [153160.663207] radeon 0000:01:00.0:  
R_00D034_DMA_STATUS_REG   = 0x44C83D57
May 28 06:13:17 [kernel] [153160.663209] radeon 0000:01:00.0:  
R_00D834_DMA_STATUS_REG   = 0x44C83D57
May 28 06:13:17 [kernel] [153160.663478] radeon 0000:01:00.0: GPU reset
succeeded, trying to resume
------

After that it no longer responded, not even over ssh, so a hard reset was
required.

I've noticed that simply pressing the reset button is not enough; a hard reset
with the computer fully powered off is necessary.
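
For triage, the lockup messages can be pulled out of the attached kernel log
with a short script. The following is only a minimal sketch: the function name
summarize_lockups is hypothetical, the two regular expressions are written
against the message text quoted above, and the input is assumed to be the
unwrapped log (one message per line), not the line-wrapped copy in this mail.

```python
import re

# Patterns written against the radeon messages quoted in this report.
STALL_RE = re.compile(r"GPU lockup CP stall for more than (\d+)msec")
FENCE_RE = re.compile(
    r"waiting for (0x[0-9a-fA-F]+) last fence id (0x[0-9a-fA-F]+) on ring (\d+)"
)


def summarize_lockups(lines):
    """Collect stall durations (ms) and (waiting, last, ring) fence tuples.

    Hypothetical helper for illustration; it assumes one log message per line.
    """
    stalls, fences = [], []
    for line in lines:
        m = STALL_RE.search(line)
        if m:
            stalls.append(int(m.group(1)))
        m = FENCE_RE.search(line)
        if m:
            waiting = int(m.group(1), 16)
            last = int(m.group(2), 16)
            fences.append((waiting, last, int(m.group(3))))
    return stalls, fences


# The first two messages from the log above, rejoined onto single lines.
SAMPLE = [
    "May 28 06:13:06 [kernel] [153149.666146] radeon 0000:01:00.0: "
    "GPU lockup CP stall for more than 10033msec",
    "May 28 06:13:06 [kernel] [153149.666150] radeon 0000:01:00.0: "
    "GPU lockup (waiting for 0x000000000071debe last fence id "
    "0x000000000071debd on ring 0)",
]

if __name__ == "__main__":
    stalls, fences = summarize_lockups(SAMPLE)
    for ms in stalls:
        print(f"CP stall: {ms} ms")
    for waiting, last, ring in fences:
        print(f"ring {ring}: waiting {waiting:#x}, last {last:#x}, "
              f"gap {waiting - last}")
```

For the two messages above, the fence being waited on is exactly one ahead of
the last fence id reported on ring 0.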

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

