[Bug 63564] New: Radeon HD 5870 CP lockup / Stall with OpenGL load

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Mon Apr 15 10:25:59 PDT 2013


https://bugs.freedesktop.org/show_bug.cgi?id=63564

          Priority: medium
            Bug ID: 63564
          Assignee: dri-devel at lists.freedesktop.org
           Summary: Radeon HD 5870 CP lockup / Stall with OpenGL load
          Severity: major
    Classification: Unclassified
                OS: Linux (All)
          Reporter: kallisti5 at unixzen.com
          Hardware: All
            Status: NEW
           Version: git
         Component: Drivers/DRI/Radeon
           Product: Mesa

While playing Team Fortress 2 on Arch Linux with mainline git mesa and xorg
1.14, I get random GPU lockups and stalls. The driver does recover, but
recovery takes around 10 seconds. dmesg shows the following warning on every
stall:

[ 2377.378560] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
[ 2377.378568] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000051691 last fence id 0x000000000005168a)
[ 2377.379628] radeon 0000:01:00.0: Saved 279 dwords of commands on ring 0.
[ 2377.379635] radeon 0000:01:00.0: GPU softreset: 0x00000003
[ 2377.389578] radeon 0000:01:00.0:   GRBM_STATUS               = 0xF5700828
[ 2377.389581] radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x88000003
[ 2377.389583] radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0xFC000001
[ 2377.389585] radeon 0000:01:00.0:   SRBM_STATUS               = 0x200000C0
[ 2377.389586] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[ 2377.389588] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x400C0000
[ 2377.389590] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00048006
[ 2377.389592] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x80268647
[ 2377.389593] radeon 0000:01:00.0:   GRBM_SOFT_RESET=0x00007F6B
[ 2377.389645] radeon 0000:01:00.0:   GRBM_STATUS               = 0x00003828
[ 2377.389647] radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x00000007
[ 2377.389648] radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000007
[ 2377.389650] radeon 0000:01:00.0:   SRBM_STATUS               = 0x200000C0
[ 2377.389652] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[ 2377.389653] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
[ 2377.389655] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
[ 2377.389656] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x00000000
[ 2377.407070] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[ 2377.474159] [drm] probing gen 2 caps for device 1002:5a16 = 2/0
[ 2377.474161] [drm] PCIE gen 2 link speeds already enabled
[ 2377.477011] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
[ 2377.477093] radeon 0000:01:00.0: WB enabled
[ 2377.477095] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xffff8804270e7c00
[ 2377.477097] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0xffff8804270e7c0c
[ 2377.493165] [drm] ring test on 0 succeeded in 1 usecs
[ 2377.493240] [drm] ring test on 3 succeeded in 1 usecs
[ 2377.506450] [drm] ib test on ring 0 succeeded in 0 usecs
[ 2377.506483] [drm] ib test on ring 3 succeeded in 1 usecs
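
The register dump above is printed twice, once before and once after
GRBM_SOFT_RESET is written. A rough Python sketch for lining the two dumps up
side by side is given below. The helper name split_register_dumps and the
"a repeated register name starts a new dump" rule are assumptions made for
illustration; they are not part of the radeon driver or any existing tool, and
the sketch makes no attempt to decode individual status bits.

```python
import re

# Matches "NAME = 0xVALUE" or "NAME=0xVALUE" at the end of a dmesg line.
REG_LINE = re.compile(r"(\w+)\s*=\s*(0x[0-9A-Fa-f]+)\s*$")

# Register lines copied from the first lockup report.
LOG = """\
[ 2377.389578] radeon 0000:01:00.0:   GRBM_STATUS               = 0xF5700828
[ 2377.389581] radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x88000003
[ 2377.389583] radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0xFC000001
[ 2377.389585] radeon 0000:01:00.0:   SRBM_STATUS               = 0x200000C0
[ 2377.389586] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[ 2377.389588] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x400C0000
[ 2377.389590] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00048006
[ 2377.389592] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x80268647
[ 2377.389593] radeon 0000:01:00.0:   GRBM_SOFT_RESET=0x00007F6B
[ 2377.389645] radeon 0000:01:00.0:   GRBM_STATUS               = 0x00003828
[ 2377.389647] radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x00000007
[ 2377.389648] radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000007
[ 2377.389650] radeon 0000:01:00.0:   SRBM_STATUS               = 0x200000C0
[ 2377.389652] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[ 2377.389653] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
[ 2377.389655] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
[ 2377.389656] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x00000000
"""


def split_register_dumps(log):
    """Group register lines into successive dumps (hypothetical helper).

    Assumption: a new dump begins whenever a register name that is already
    present in the current dump shows up again.
    """
    dumps = [{}]
    for line in log.splitlines():
        match = REG_LINE.search(line)
        if not match:
            continue
        name, value = match.group(1), int(match.group(2), 16)
        if name in dumps[-1]:
            dumps.append({})
        dumps[-1][name] = value
    return dumps


before, after = split_register_dumps(LOG)
for name in after:
    marker = "" if before[name] == after[name] else "  (changed)"
    print(f"{name:28s} {before[name]:#010x} -> {after[name]:#010x}{marker}")
```

In this report SRBM_STATUS and R_008674_CP_STALLED_STAT1 read the same before
and after the reset, while the other registers change.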

[ 3733.413723] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
[ 3733.413731] radeon 0000:01:00.0: GPU lockup (waiting for 0x00000000000d3f80 last fence id 0x00000000000d3f7d)
[ 3733.414789] radeon 0000:01:00.0: Saved 151 dwords of commands on ring 0.
[ 3733.414796] radeon 0000:01:00.0: GPU softreset: 0x00000003
[ 3733.419144] radeon 0000:01:00.0:   GRBM_STATUS               = 0xF5500828
[ 3733.419148] radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x88000003
[ 3733.419152] radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0xEC000001
[ 3733.419155] radeon 0000:01:00.0:   SRBM_STATUS               = 0x200000C0
[ 3733.419159] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[ 3733.419162] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x400C0000
[ 3733.419166] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00048004
[ 3733.419169] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x80268647
[ 3733.419172] radeon 0000:01:00.0:   GRBM_SOFT_RESET=0x00007F6B
[ 3733.419227] radeon 0000:01:00.0:   GRBM_STATUS               = 0x00003828
[ 3733.419230] radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x00000007
[ 3733.419234] radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000007
[ 3733.419237] radeon 0000:01:00.0:   SRBM_STATUS               = 0x200000C0
[ 3733.419241] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[ 3733.419244] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
[ 3733.419247] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
[ 3733.419251] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x00000000
[ 3733.436635] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[ 3733.503442] [drm] probing gen 2 caps for device 1002:5a16 = 2/0
[ 3733.503444] [drm] PCIE gen 2 link speeds already enabled
[ 3733.505831] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
[ 3733.505909] radeon 0000:01:00.0: WB enabled
[ 3733.505911] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xffff8804270e7c00
[ 3733.505913] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0xffff8804270e7c0c
[ 3733.521972] [drm] ring test on 0 succeeded in 1 usecs
[ 3733.522028] [drm] ring test on 3 succeeded in 1 usecs
[ 3733.534027] [drm] ib test on ring 0 succeeded in 0 usecs
[ 3733.534049] [drm] ib test on ring 3 succeeded in 1 usecs
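
For anyone comparing several lockups, the "waiting for ... last fence id ..."
lines can be reduced to two numbers per event: the timestamp and the distance
between the fence being waited on and the last fence id reported. The Python
sketch below does that for the two events in this report. The helper name
fence_gaps is made up for illustration, and reading the difference as "fences
still outstanding" is my assumption about what the driver prints, not
something confirmed in this report.

```python
import re

# Timestamp, awaited fence and last fence id from a radeon lockup line.
# \s+ before "last fence id" tolerates the mail client's line wrapping.
FENCE = re.compile(
    r"\[\s*(\d+\.\d+)\].*waiting for (0x[0-9a-fA-F]+)\s+"
    r"last fence id (0x[0-9a-fA-F]+)"
)

# The two lockup lines from this report.
LOG = """\
[ 2377.378568] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000051691
last fence id 0x000000000005168a)
[ 3733.413731] radeon 0000:01:00.0: GPU lockup (waiting for 0x00000000000d3f80
last fence id 0x00000000000d3f7d)
"""


def fence_gaps(log):
    """Return (timestamp_seconds, awaited - last) per lockup (hypothetical helper)."""
    return [
        (float(ts), int(waiting, 16) - int(last, 16))
        for ts, waiting, last in FENCE.findall(log)
    ]


events = fence_gaps(LOG)
for ts, gap in events:
    print(f"lockup at {ts:.6f}s, fence gap {gap}")
print(f"seconds between lockups: {events[1][0] - events[0][0]:.3f}")
```

For the two events above the gaps come out as 7 and 3, and the lockups are
roughly 1356 seconds (about 22.6 minutes) apart.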


kallisti5 at eris ~ :) $ sudo lspci -vv -s 01:00.0
01:00.0 VGA compatible controller: Advanced Micro Devices [AMD] nee ATI Cypress LE [Radeon HD 5800 Series] (prog-if 00 [VGA controller])
    Subsystem: XFX Pine Group Inc. Device 3070
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 90
    Region 0: Memory at c0000000 (64-bit, prefetchable) [size=256M]
    Region 2: Memory at fea20000 (64-bit, non-prefetchable) [size=128K]
    Region 4: I/O ports at e000 [size=256]
    Expansion ROM at fea00000 [disabled] [size=128K]
    Capabilities: [50] Power Management version 3
        Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00
        DevCap:    MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
            ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
        DevCtl:    Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
            RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
            MaxPayload 128 bytes, MaxReadReq 512 bytes
        DevSta:    CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
        LnkCap:    Port #0, Speed 5GT/s, Width x16, ASPM L0s L1, Latency L0 <64ns, L1 <1us
            ClockPM- Surprise- LLActRep- BwNot-
        LnkCtl:    ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta:    Speed 5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
        LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
             Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
             Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
             EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
    Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Address: 00000000feeff00c  Data: 41e3
    Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
    Capabilities: [150 v1] Advanced Error Reporting
        UESta:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UEMsk:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UESvrt:    DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
        CESta:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
        CEMsk:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
        AERCap:    First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
    Kernel driver in use: radeon

-- 
You are receiving this mail because:
You are the assignee for the bug.