[Bug 111385] (Only partly recoverable) GPU hangs in (multi-context) SynMark HDRBloom with Iris driver

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Fri Sep 6 11:54:45 UTC 2019


https://bugs.freedesktop.org/show_bug.cgi?id=111385

Eero Tamminen <eero.t.tamminen at intel.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Product|Mesa                        |DRI
             Blocks|111444                      |
           Assignee|intel-3d-bugs at lists.freedes |intel-gfx-bugs at lists.freede
                   |ktop.org                    |sktop.org
            Summary|[GEN9] (partly recoverable) |(Only partly recoverable)
                   |GPU hang in (multi-context) |GPU hangs in
                   |SynMark HDRBloom            |(multi-context) SynMark
                   |                            |HDRBloom with Iris driver
          Component|Drivers/Gallium/Iris        |DRM/Intel
         QA Contact|intel-3d-bugs at lists.freedes |intel-gfx-bugs at lists.freede
                   |ktop.org                    |sktop.org
                 CC|                            |intel-gfx-bugs at lists.freede
                   |                            |sktop.org
            Version|git                         |DRI git
             Status|NEEDINFO                    |NEW

--- Comment #12 from Eero Tamminen <eero.t.tamminen at intel.com> ---
Last night there was a system hang in HdrBloom with Iris on BDW GT2 when using
the latest Mesa and drm-tip kernel git versions -> the issue isn't GEN9(+)
specific.


(In reply to Mark Janes from comment #10)
> I tried to reproduce this, and failed.
> Can you see if it reproduces with a stock kernel?

I'm not sure what you mean by "stock" kernel.

I tested with the Ubuntu 18.04 HWE kernel 5.0.0 on SkullCanyon, and wasn't able
to reproduce the GPU hang with 40x repeats, so it seems to require a newer
kernel.


drm-tip kernel bisect:

* v5.1 (from early May), with rest being latest: not able to reproduce within
20x rounds

* v5.2-rc3 (from early June): not able to reproduce within 20x rounds

* v5.2 (1804 from early July): hang within 5x rounds

* v5.2-rc6 (1790): hang within 5x rounds

* v5.2-rc4 (1780): not able to reproduce within 20x rounds

* v5.2-rc5 (1785): not able to reproduce within 20x rounds

* v5.2-rc5 (1788): hang within 15x rounds

* v5.2-rc5 (1786): not able to reproduce within 20x rounds (+ 5x Multithread)

-> drm-tip v5.2-rc5 or newer kernel needed to reproduce

(Numbers in parentheses above are our build IDs. Build 1787 was for some reason
rc4, with a commit date a week earlier than expected.)
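
The window implied by the build results above can be cross-checked with a few
lines of Python. This is only a sketch: the RESULTS table is transcribed from
the list above, regression_window is a hypothetical helper name, and it assumes
that build IDs increase with commit date. Build 1787 is left out because its
commit date did not follow that order, and the v5.1 and v5.2-rc3 runs are left
out because no build ID is given for them.

```python
# Build results transcribed from the drm-tip bisect list above:
# build ID -> (kernel tag, whether the hang was reproduced)
RESULTS = {
    1780: ("v5.2-rc4", False),
    1785: ("v5.2-rc5", False),
    1786: ("v5.2-rc5", False),
    1788: ("v5.2-rc5", True),
    1790: ("v5.2-rc6", True),
    1804: ("v5.2", True),
}


def regression_window(results):
    """Return (last_good, first_bad) build IDs.

    Assumes build IDs are chronological; raises ValueError when a good
    build is newer than a bad one, since no single window exists then.
    """
    good = [build for build, (_, hang) in results.items() if not hang]
    bad = [build for build, (_, hang) in results.items() if hang]
    if not good or not bad:
        raise ValueError("need at least one good and one bad build")
    last_good, first_bad = max(good), min(bad)
    if last_good > first_bad:
        raise ValueError("results are not monotonic; cannot narrow a window")
    return last_good, first_bad


if __name__ == "__main__":
    last_good, first_bad = regression_window(RESULTS)
    print(f"last good build: {last_good}, first bad build: {first_bad}")
```

With the results listed above this gives 1786 as the last good build and 1788
as the first bad one. A "good" result here only means that no hang was seen
within the stated number of rounds.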


Bisect rest:

* On drm-tip v5.2 & Iris from early July: hang within 5x rounds

-> modifier support & newer Iris changes not needed for triggering

* On whole gfx stack from early July: hang within 10x rounds

* On whole gfx stack from early July, but Iris from early May: hang within 10x
rounds

* On drm-tip v5.2, with rest of stack from early May: hang within 10x rounds

-> I.e. this is actually a drm-tip/kernel regression, not a Mesa one (as already
hinted by recovery sometimes failing). Moving to the DRM component.

Hangs seem to have started somewhere between the following drm-tip commits:
* 1180972dbd2a00f60a4d707772bd7e7ae6732ed5 drm-tip: 2019y-06m-20d-15h-39m-16s
UTC integration manifest
* 7ff7b7a9d09acaa647921780fa5ed3525ab8f278 drm-tip: 2019y-06m-21d-23h-53m-21s
UTC integration manifest


With the latest gfx stack, the hang seems somewhat more likely than with a
user-space gfx stack from a few months ago.


(In reply to Eero Tamminen from comment #2)
> Got a (non-recoverable) HdrBloom hang also with i965 when using latest Git
> gfx stack, on SKL GT4e (SkullCanyon), so this might not be Iris specific
> issue.

This was the only time it happened with HdrBloom using the i965 driver, and the
only time the i915 error state specified in which process (X) the hang happened
while HdrBloom was running.

I'm not able to reproduce the hang on SkullCanyon with v5.2 drm-tip using i965,
with 25x rounds.  No idea why Iris triggers this bug so easily, but not i965.
Does i915 rely on i965 re/setting some extra state?


Referenced Bugs:

https://bugs.freedesktop.org/show_bug.cgi?id=111444
[Bug 111444] [TRACKER] Mesa 19.2 release tracker