[Bug 110131] [GEN9] random/rare GPU hangs in tessellation tests

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Fri Mar 15 12:02:28 UTC 2019


https://bugs.freedesktop.org/show_bug.cgi?id=110131

            Bug ID: 110131
           Summary: [GEN9] random/rare GPU hangs in tessellation tests
           Product: Mesa
           Version: git
          Hardware: Other
                OS: All
            Status: NEW
          Severity: normal
          Priority: medium
         Component: Drivers/DRI/i965
          Assignee: intel-3d-bugs at lists.freedesktop.org
          Reporter: eero.t.tamminen at intel.com
        QA Contact: intel-3d-bugs at lists.freedesktop.org

Created attachment 143679
  --> https://bugs.freedesktop.org/attachment.cgi?id=143679&action=edit
SKL GT4e SynMark OglTerrainFlyTess hang error state

Setup:
* Ubuntu 18.04
* v5.0+ drm-tip kernel & git version of Xserver
* Mesa git version

In the last few days I've seen a couple of random GPU hangs in
tessellation-related tests:

- on SKL GT4e, a recoverable hang once in SynMark2 v7 OglTerrainFlyTess, and
once in GfxBench v5 GL Aztec Ruins normal (which also does a lot of other
things besides tessellation)
- one system hang in the GfxBench tessellation test on KBL GT2 the day before

It's possible that the first item is related to starting to use Weston/XWayland
instead of normal X:
----------------------------------------------------
[ 8231.866172] i915 0000:00:02.0: GPU HANG: ecode 9:1:0xfffffffe, in  [0], hang on rcs0
[ 8231.866174] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 8231.866174] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 8231.866175] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 8231.866175] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 8231.866175] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 8231.867183] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
[ 8239.858844] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
[ 8243.313841] Asynchronous wait on fence i915:weston[643]/1:5eb9e timed out (hint:intel_atomic_commit_ready+0x0/0x54 [i915])
[ 8247.858844] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
----------------------------------------------------

See the attachment for the error state.

(Another possibility is that the hangs started, or became more visible, after
the "intel/nir: Vectorize all IO" fix to bug 107510, as that improved
tessellation tests.)

Note: I don't actively read dmesg output, so I may have missed most of the
recoverable GPU hangs unless they've been serious enough to hang the system,
fail the test, or at least slow it down enough to significantly impact
performance.  I'll add some better tracking for that.
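
As an illustration of what such tracking could look like (a minimal sketch, not
the tooling actually used for these test runs), the following Python function
scans kernel log text for the "GPU HANG" and "Resetting ... for hang" lines
shown in the dmesg excerpt above. The function name scan_kernel_log and both
regular expressions are assumptions modelled on that excerpt; other kernel
versions may word these messages differently.

```python
import re

# Patterns modelled on the dmesg excerpt in this report (assumption:
# other kernel versions may format these messages differently).
HANG_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+i915 (?P<dev>\S+): GPU HANG: ecode (?P<ecode>\S+),"
)
RESET_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+i915 (?P<dev>\S+): Resetting (?P<engine>\w+) for hang"
)


def scan_kernel_log(text):
    """Return (hangs, resets) found in kernel log text.

    hangs:  list of (timestamp, pci_device, ecode)
    resets: list of (timestamp, pci_device, engine)
    """
    hangs, resets = [], []
    for line in text.splitlines():
        m = HANG_RE.search(line)
        if m:
            hangs.append((float(m["ts"]), m["dev"], m["ecode"]))
            continue
        m = RESET_RE.search(line)
        if m:
            resets.append((float(m["ts"]), m["dev"], m["engine"]))
    return hangs, resets
```

Running this on the dmesg output captured after each test would make it
possible to flag runs that had recoverable hangs even when the test itself
passed.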

No idea whether these are related to the compute hangs in bug 108820 or the
Heaven hang in bug 103556.
