[Bug 111920] New: NON-GuC constant i915 0000:00:02.0: GPU HANG: ecode 9:1:0x00000000, hang on rcs0

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Tue Oct 8 00:26:16 UTC 2019


https://bugs.freedesktop.org/show_bug.cgi?id=111920

            Bug ID: 111920
           Summary: NON-GuC constant i915 0000:00:02.0: GPU HANG: ecode
                    9:1:0x00000000, hang on rcs0
           Product: DRI
           Version: DRI git
          Hardware: x86-64 (AMD64)
                OS: Linux (All)
            Status: NEW
          Severity: major
          Priority: high
         Component: DRM/Intel
          Assignee: intel-gfx-bugs at lists.freedesktop.org
          Reporter: kenny at panix.com
        QA Contact: intel-gfx-bugs at lists.freedesktop.org
                CC: intel-gfx-bugs at lists.freedesktop.org
     i915 platform: CFL

Created attachment 145678
  --> https://bugs.freedesktop.org/attachment.cgi?id=145678&action=edit
/sys/class/drm/card0/error

In bug 111805 (https://bugs.freedesktop.org/show_bug.cgi?id=111805),
lakshminarayana.vudum at intel.com asked me to try running without the GuC
enabled.

I did that, and it still hangs. This is DRM-tip as of the commit right before
c1132367, since that commit prevents my box from going into S0/s2idle suspend
(see bug https://bugs.freedesktop.org/show_bug.cgi?id=111909).

Here's the worst part: if I can wrench control to a VT, I can usually run "sudo
systemctl hibernate" to force a power cycle that unwedges the i915. But THIS
time, right after the resume:

----
Oct  7 17:03:36 hp-x360n systemd-sleep[16719]: System resumed.
Oct  7 17:03:36 hp-x360n systemd[1]: Stopping TLP suspend/resume...
Oct  7 17:03:36 hp-x360n systemd[1]: Stopped TLP suspend/resume.
Oct  7 17:04:40 hp-x360n kernel: [20868.899672] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Oct  7 17:05:16 hp-x360n kernel: [20904.931581] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Oct  7 17:07:04 hp-x360n kernel: [21012.899361] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
----

<facepalm>

The latest i915 changes on Sept 26th are really killing my workflow, as I can
never tell when my laptop will just decide to hang up. It happens during tasks
as mundane as viewing a webpage or building some software in a konsole (I
don't game, and this time I wasn't even watching video).

Is there ANYTHING I can do to help you guys diagnose or mitigate this, or to
get a warning when it's likely to occur? I've posted some 7 .../card0/error
files and apparently there's not enough info in these to help figure out
what's going on. Are there any debug flags (that won't ruin daily-driver
performance) that I can try so that there's more info when this happens again?
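
As a rough illustration of the kind of after-the-fact monitoring that is
possible, the sketch below scans kernel log text for the "Resetting rcs0 for
hang on rcs0" message quoted above and lists each reset with its uptime
timestamp. The function name, the regular expression, and the script name in
the usage note are assumptions made for this example, not anything the i915
driver provides. It only reports resets that have already happened and does
not predict them.

```python
import re
import sys

# Matches i915 engine-reset messages of the form quoted in this report, e.g.
# "kernel: [20868.899672] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0".
# \s+ between words tolerates lines that were wrapped by a mail client.
RESET_RE = re.compile(
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+i915\s+(?P<dev>\S+):\s+"
    r"Resetting\s+(?P<engine>\w+)\s+for\s+hang\s+on\s+\w+"
)


def find_resets(log_text):
    """Return a list of (uptime_seconds, pci_device, engine) tuples."""
    return [
        (float(m.group("uptime")), m.group("dev"), m.group("engine"))
        for m in RESET_RE.finditer(log_text)
    ]


if __name__ == "__main__":
    # Read log text from stdin and print one line per reset found.
    for uptime, dev, engine in find_resets(sys.stdin.read()):
        print(f"{uptime:.3f}s: {dev} reset {engine}")
```

Saved under a hypothetical name such as find_resets.py, it could be fed with
"dmesg | python3 find_resets.py" or with the contents of a syslog file that
keeps the bracketed kernel timestamps.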

(Is there any way to just hack out a merge from a GIT tree?)

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are on the CC list for the bug.
You are the assignee for the bug.