[Bug 106342] New: [drm] HANG: ecode 9:0:0x9cba0f27, in kscreenlocker_g [103585], reason: Hang on rcs0, action: reset

Wed May 2 02:36:25 UTC 2018

https://bugs.freedesktop.org/show_bug.cgi?id=106342

            Bug ID: 106342
           Summary: [drm] HANG: ecode 9:0:0x9cba0f27, in kscreenlocker_g
                    [103585], reason: Hang on rcs0, action: reset
           Product: DRI
           Version: XOrg git
          Hardware: x86-64 (AMD64)
                OS: All
            Status: NEW
          Severity: major
          Priority: medium
         Component: DRM/Intel
          Assignee: intel-gfx-bugs at lists.freedesktop.org
          Reporter: thiago at kde.org
        QA Contact: intel-gfx-bugs at lists.freedesktop.org
                CC: intel-gfx-bugs at lists.freedesktop.org

Created attachment 139259
  --> https://bugs.freedesktop.org/attachment.cgi?id=139259&action=edit
card0_error 2018-05-02

Possibly related to Bug 101991 (which I reported) and Bug 104545 (which says it
was fixed by the same commit).

Bug 101991 was about a GPU hang after resuming from hibernation. That is still
the problem I am having: after a few cycles of suspend-to-disk (hibernate) and
resume, I get a GPU hang soon after resuming, if not immediately after.

Bug 101991 was reportedly fixed by SKL DMC 1.27, which is what I am now using
(kernel 4.16.3):

[    4.106911] [drm] Finished loading DMC firmware i915/skl_dmc_ver1_27.bin (v1.27)

Unlike Bug 101991, the screen is still responsive after the hang, not frozen.
But many OpenGL workloads stop working, to the point that the desktop is
unusable due to EIO errors. It's just good enough for me to reboot cleanly, as
opposed to forcing it via Alt+SysRq. Applications are not actually crashing (no
coredump is created), but appear to be exiting with an error raised by
something inside Mesa.

dmesg log:
[217047.398083] [drm] GPU HANG: ecode 9:0:0x9cba0f27, in kscreenlocker_g [103585], reason: Hang on rcs0, action: reset
[217047.398085] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[217047.398085] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[217047.398086] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[217047.398086] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[217047.398087] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[217047.398104] i915 0000:00:02.0: Resetting rcs0 after gpu hang
[217048.617889] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[217048.617933] i915 0000:00:02.0: Resetting chip after gpu hang
[217049.833883] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[217051.160111] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[217052.482897] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[217052.589836] i915 0000:00:02.0: Failed to reset chip

Attached the card0/error file.
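
For anyone comparing this hang against the ones in the related bugs, here is a
minimal sketch of pulling the ecode, process name and PID out of a dmesg
excerpt like the one above. It is not part of any i915 tooling: the function
name find_gpu_hangs and the regular expression are assumptions based only on
the message format shown in this report, and other kernel versions may word
the line differently. It also rejoins records that a mail client has wrapped
onto a second line.

```python
import re

# Assumed format, taken from the "GPU HANG" line quoted in this report.
HANG_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+\[drm\] GPU HANG: ecode (?P<ecode>\S+), "
    r"in (?P<comm>\S+)\s+\[(?P<pid>\d+)\], reason: (?P<reason>.+?), "
    r"action: (?P<action>\w+)"
)

# A new dmesg record starts with a "[seconds.microseconds]" timestamp.
TIMESTAMP_RE = re.compile(r"\[\s*\d+\.\d+\]")


def find_gpu_hangs(dmesg_text):
    """Return one dict per GPU HANG record found in dmesg_text (hypothetical helper)."""
    records = []
    for line in dmesg_text.splitlines():
        if TIMESTAMP_RE.match(line) or not records:
            records.append(line)
        else:
            # Continuation of a wrapped record: glue it back onto the previous line.
            records[-1] += " " + line
    return [m.groupdict() for m in map(HANG_RE.search, records) if m]


if __name__ == "__main__":
    sample = (
        "[217047.398083] [drm] GPU HANG: ecode 9:0:0x9cba0f27, in kscreenlocker_g\n"
        "[103585], reason: Hang on rcs0, action: reset\n"
        "[217047.398104] i915 0000:00:02.0: Resetting rcs0 after gpu hang\n"
    )
    for hang in find_gpu_hangs(sample):
        print(hang["ts"], hang["ecode"], hang["comm"], hang["pid"], hang["reason"])
```

The ecode is kept as an opaque string, since this report does not say how its
fields should be interpreted.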
