[Bug 71864] New: [RS690] GPU Lockup CP Stall and Resulting Kernel Oops (Kernel 3.2.0)

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Thu Nov 21 00:42:47 PST 2013


https://bugs.freedesktop.org/show_bug.cgi?id=71864

          Priority: medium
            Bug ID: 71864
          Assignee: dri-devel at lists.freedesktop.org
           Summary: [RS690] GPU Lockup CP Stall and Resulting Kernel Oops
                    (Kernel 3.2.0)
          Severity: critical
    Classification: Unclassified
                OS: Linux (All)
          Reporter: reimth at gmail.com
          Hardware: x86-64 (AMD64)
            Status: NEW
           Version: XOrg CVS
         Component: DRM/Radeon
           Product: DRI

[Problem]
Since upgrading from kernel 2.6.35 to kernel 3.2.0 (Ubuntu 12.04), we have
experienced numerous kernel freezes: no keyboard or mouse response, no kernel
logging, no Num Lock LED change, server applications (e.g. dhcpd, postfix,
bind9) stop, the magic SysRq keys do not work, and the serial console is dead.
A freeze can only be resolved by switching the power off (hard reset). There is
no reliable way to reproduce the bug. A freeze becomes more likely when mail
GUIs such as Evolution or Thunderbird, or Firefox, are open and when switching
between their windows; in that case the kernel freezes within 6 h at the
latest. With only the desktop and an xterm running, the system appears to be
stable (48 h and more). The bug occurs with lightdm/Unity as the desktop as
well as with mdm/Cinnamon. Log information can only be retrieved using
netconsole. The logs show a GPU lockup and CP stall from which the radeon
driver cannot recover.
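
Since netconsole is the only channel that still delivers messages, a minimal
receiver on a second machine can be sketched as follows. This is an
illustration only, not the setup used for this report: the function names are
hypothetical, and UDP port 6666 is assumed because it is netconsole's
documented default target port; the port actually configured here is not
stated.

```python
import socket


def open_receiver(host="0.0.0.0", port=6666):
    """Bind a UDP socket for netconsole datagrams (port 6666 is an assumption)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_messages(sock, max_datagrams, timeout=5.0):
    """Collect up to max_datagrams log messages, stopping early on timeout."""
    sock.settimeout(timeout)
    messages = []
    for _ in range(max_datagrams):
        try:
            data, _addr = sock.recvfrom(65535)
        except socket.timeout:
            break  # sender went quiet, e.g. because the machine froze
        # Kernel messages are plain text; replace undecodable bytes.
        messages.append(data.decode("utf-8", errors="replace").rstrip("\n"))
    return messages
```

Writing each received message to disk immediately would preserve the last
lines sent before a freeze.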

[Configuration Specifics]
We run two X servers: one controlled by the display manager (on vt7 or vt8) and
one controlled by xinit (on vt9). See the attached process list.

We use the DVI port of the integrated Radeon X1200 (RS690) display controller
on an ASUS MSA mainboard. One monitor (Samsung) is connected. The radeon driver
is used with KMS enabled.

[Netconsole Output]
The last kernel log messages that reach the netconsole receiver vary:
a) The shortest log
[212173.596044] radeon 0000:01:05.0: GPU lockup CP stall for more than 10000msec
[212175.370234] radeon 0000:01:05.0: failed to reset GPU
[212175.406899] [drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(3).
[212175.406912] [drm:radeon_cs_ioctl] *ERROR* Failed to schedule IB !

b) A longer one
[286295.708052] radeon 0000:01:05.0: GPU lockup CP stall for more than 10020msec
[286297.455900] radeon 0000:01:05.0: failed to reset GPU
[286297.929150] [drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(14).
[286297.929174] [drm:radeon_cs_ioctl] *ERROR* Failed to schedule IB !
[286297.937321] [drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(15).
[286297.937349] [drm:radeon_cs_ioctl] *ERROR* Failed to schedule IB !
[286297.943050] [drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(0).
[286297.943074] [drm:radeon_cs_ioctl] *ERROR* Failed to schedule IB !
[286297.947188] [drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(1).
[286297.947213] [drm:radeon_cs_ioctl] *ERROR* Failed to schedule IB !
[286297.949490] [drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(2).
[286297.949509] [drm:radeon_cs_ioctl] *ERROR* Failed to schedule IB !

c) GPU reset attempt
[179005.128038] radeon 0000:01:05.0: GPU lockup CP stall for more than 10032msec
[179005.128068] GPU lockup (waiting for 0x0004E1FF last fence id 0x0004E1F4)
[179005.268649] radeon: wait for empty RBBM fifo failed ! Bad things might happen.
[179005.409044] Failed to wait GUI idle while programming pipes. Bad things might happen.
[179005.410064] radeon 0000:01:05.0: (rs600_asic_reset:348) RBBM_STATUS=0x9401C100
[179005.908155] radeon 0000:01:05.0: (rs600_asic_reset:367) RBBM_STATUS=0x9401C100
[179006.405224] radeon 0000:01:05.0: (rs600_asic_reset:375) RBBM_STATUS=0x9400C100
[179006.902280] radeon 0000:01:05.0: (rs600_asic_reset:383) RBBM_STATUS=0x9400C100
[179006.902315] radeon 0000:01:05.0: restoring config space at offset 0x1 (was 0x100403, writing 0x100407)
[179006.902346] radeon 0000:01:05.0: failed to reset GPU
[179006.903346] radeon 0000:01:05.0: GPU reset failed

d) Successful GPU reset but inaccessible CP
[ 1775.356043] radeon 0000:01:05.0: GPU lockup CP stall for more than 10008msec
[ 1775.356067] GPU lockup (waiting for 0x000124ED last fence id 0x000124EA)
[ 1775.919383] radeon: wait for empty RBBM fifo failed ! Bad things might happen.
[ 1776.059845] Failed to wait GUI idle while programming pipes. Bad things might happen.
[ 1776.060872] radeon 0000:01:05.0: (rs600_asic_reset:348) RBBM_STATUS=0xB001C100
[ 1776.559021] radeon 0000:01:05.0: (rs600_asic_reset:367) RBBM_STATUS=0x90010140
[ 1777.056092] radeon 0000:01:05.0: (rs600_asic_reset:375) RBBM_STATUS=0x10000140
[ 1777.553160] radeon 0000:01:05.0: (rs600_asic_reset:383) RBBM_STATUS=0x10000140
[ 1777.553197] radeon 0000:01:05.0: restoring config space at offset 0x1 (was 0x100403, writing 0x100407)
[ 1777.553232] radeon 0000:01:05.0: GPU reset succeed
[ 1777.554232] radeon 0000:01:05.0: GPU reset succeed
[ 1777.554263] sched: RT throttling activated
[ 1777.749090] [drm] radeon: 1 quad pipes, 1 z pipes initialized.
[ 1777.754590] [drm] PCIE GART of 512M enabled (table at 0x0000000036700000).
[ 1777.754958] radeon 0000:01:05.0: WB enabled
[ 1777.754991] [drm] radeon: ring at 0x0000000080001000
[ 1777.892767] [drm:r100_ring_test] *ERROR* radeon: ring test failed (scratch(0x15E4)=0xCAFEDEAD)
[ 1777.892774] [drm:r100_cp_init] *ERROR* radeon: cp isn't working (-22).
[ 1777.892783] radeon 0000:01:05.0: failed initializing CP (-22).
[ 1786.390793] [drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(11).
[ 1786.390818] [drm:radeon_cs_ioctl] *ERROR* Failed to schedule IB !
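
To compare the reset attempts in c) and d), the fence ids and RBBM_STATUS
values can be pulled out of the log mechanically. The sketch below is an
illustration under stated assumptions: the helper names are hypothetical, the
excerpt is taken from case d) with the wrapped lines joined, and bit positions
are reported as plain numbers because the register layout is not part of this
report.

```python
import re

# Excerpt from case d), one kernel message per line.
LOG = """\
[ 1775.356067] GPU lockup (waiting for 0x000124ED last fence id 0x000124EA)
[ 1776.060872] radeon 0000:01:05.0: (rs600_asic_reset:348) RBBM_STATUS=0xB001C100
[ 1776.559021] radeon 0000:01:05.0: (rs600_asic_reset:367) RBBM_STATUS=0x90010140
[ 1777.056092] radeon 0000:01:05.0: (rs600_asic_reset:375) RBBM_STATUS=0x10000140
[ 1777.553160] radeon 0000:01:05.0: (rs600_asic_reset:383) RBBM_STATUS=0x10000140
"""

FENCE_RE = re.compile(
    r"waiting for (0x[0-9A-Fa-f]+) last fence id (0x[0-9A-Fa-f]+)")
STATUS_RE = re.compile(
    r"\(rs600_asic_reset:(\d+)\)\s+RBBM_STATUS=(0x[0-9A-Fa-f]+)")


def fence_gap(log):
    """Return (awaited fence id - last signalled fence id), or None."""
    match = FENCE_RE.search(log)
    if match is None:
        return None
    return int(match.group(1), 16) - int(match.group(2), 16)


def status_steps(log):
    """Return (source line number, RBBM_STATUS value) for each reset step."""
    return [(int(line), int(value, 16))
            for line, value in STATUS_RE.findall(log)]


def set_bits(value):
    """List the positions of the bits set in a 32-bit register value."""
    return [bit for bit in range(32) if (value >> bit) & 1]


if __name__ == "__main__":
    print("fence gap:", fence_gap(LOG))
    for line, value in status_steps(LOG):
        print(f"rs600_asic_reset:{line} 0x{value:08X} bits {set_bits(value)}")
```

For case d) the fence gap is 3 and the final status 0x10000140 has bits 6, 8
and 28 set. Applied to case c), the gap is 11 and the final status 0x9400C100
has bits 8, 14, 15, 26, 28 and 31 set. Mapping those positions to hardware
blocks requires the register definitions in the driver source.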

A very verbose log with drm.debug set to 0xf has been attached, along with the
usually required information.

-- 
You are receiving this mail because:
You are the assignee for the bug.