<html>
<head>
<base href="https://bugs.freedesktop.org/">
</head>
<body>
<p>
<div>
<b><a class="bz_bug_link
bz_status_NEW "
title="NEW - GPU HANG: ecode 9:0:0x86dfbff9, in Map-GL [6016], reason: Hang on render ring, action: reset"
href="https://bugs.freedesktop.org/show_bug.cgi?id=111395#c27">Comment # 27</a>
on <a class="bz_bug_link
bz_status_NEW "
title="NEW - GPU HANG: ecode 9:0:0x86dfbff9, in Map-GL [6016], reason: Hang on render ring, action: reset"
href="https://bugs.freedesktop.org/show_bug.cgi?id=111395">bug 111395</a>
from <span class="vcard"><a class="email" href="mailto:kenneth@whitecape.org" title="Kenneth Graunke <kenneth@whitecape.org>"> <span class="fn">Kenneth Graunke</span></a>
</span></b>
<pre>(In reply to yugang from <a href="show_bug.cgi?id=111395#c25">comment #25</a>)
<span class="quote">> Hi Kenneth,
>
> could you help check the latest two hang error codes (I also attached two
> decode files) to see whether they have the same kind of issue in the batch
> buffer/ring buffer as before (e.g. underallocation with random content)?
>
> This seriously impacts the customer's production, so we urgently need to
> give feedback to the customer. Thank you</span >
0xefdfffff.i915_error_state.txt is total garbage once again: the batch is
simply obliterated by something scribbling all over memory. Random ecodes make
sense, because that value is computed from the INSTDONE bits and IPEHR (the
hanging instruction). When the hanging instruction is random garbage, the
resulting ecode tends to be random as well.
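A simplified sketch of why garbage IPEHR values give random ecodes. It assumes
the ecode is just IPEHR XOR INSTDONE of the hung engine, which as far as I know
is how i915 computed it in kernels of that era (not verified against this 4.9.x
backport). In a message such as "ecode 9:0:0x86dfbff9" this would be the final
hex field. The struct and function names are hypothetical, and unsigned int is
assumed to be 32 bits wide.

```c
/* Hypothetical snapshot of the hung engine's registers from an error state.
 * Assumption: unsigned int is 32 bits, as on x86 and ARM Linux. */
struct hung_engine {
    unsigned int ipehr;    /* header of the instruction parsed at hang time */
    unsigned int instdone; /* bitmask of units reporting "done" */
};

/* Simplified ecode: IPEHR XOR INSTDONE (assumed to match old i915 kernels).
 * Any garbage in IPEHR flows straight through into the ecode. */
static unsigned int hang_ecode(struct hung_engine e)
{
    return e.ipehr ^ e.instdone;
}
```

With the same INSTDONE, two different junk IPEHR values produce two unrelated
ecodes, which is consistent with the scatter of ecodes in these reports.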
0x84df7cfc.i915_error_state.02.txt looks more promising: it appears to contain
an actual batch. However, I'm not seeing anything amiss right away. The
hardware context is again all zeroes, but that may just be an error capture bug
in that old kernel (perhaps
<a class="bz_bug_link
bz_status_NEW "
title="NEW - Invalid data in error state"
href="show_bug.cgi?id=107691">https://bugs.freedesktop.org/show_bug.cgi?id=107691</a>), and not part of the
actual problem. It's really difficult to work with these old logs; there is
just a ton of information missing. We started capturing a lot more information
with Linux v4.14 and newer Mesa, but that isn't really an option here...
One random idea: in that log, 3DSTATE_CONSTANT_VS has a pointer of 0xfdd5b900,
which looks like a real memory address rather than an offset from Dynamic
State Base Address. That means Mesa must be setting CS_DEBUG_MODE2 to make the
pointer an absolute address instead of an offset. At one point,
vaapi-intel-driver didn't program CS_DEBUG_MODE2 and expected the pointer to be
an offset. The kernel also didn't isolate contexts from each other until commit
d2b4b97933f5adacfba42dc3b9200d0e21fbe2c4, so a media process would sometimes
inherit state from another context in which that mode was flipped, and would
then hang repeatedly. The kernel getparam I915_PARAM_HAS_CONTEXT_ISOLATION is
supposed to report whether the kernel provides that isolation. I guess you must
have that in your 4.9.x backport, though?</pre>
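<pre>The address-mode problem described above can be sketched as a toy model.
Everything here is hypothetical: the single const_addr_absolute flag stands in
for the CS_DEBUG_MODE2 setting, and has_isolation stands in for what the kernel
reports through I915_PARAM_HAS_CONTEXT_ISOLATION. Real hardware context state
is far larger, and the base address and offsets below are made up.

```c
/* Toy model of one per-context mode bit. Hypothetical: const_addr_absolute
 * stands in for the CS_DEBUG_MODE2 setting Mesa appears to program. */
struct ctx_state {
    int const_addr_absolute; /* 1: 3DSTATE_CONSTANT_* pointers are absolute */
};

/* Resolve a 3DSTATE_CONSTANT_VS pointer under this model: either an
 * absolute address or an offset from Dynamic State Base Address. */
static unsigned long long resolve_constant_ptr(struct ctx_state s,
                                               unsigned long long dyn_state_base,
                                               unsigned long long ptr)
{
    return s.const_addr_absolute ? ptr : dyn_state_base + ptr;
}

/* State a newly created context starts with. Without isolation it inherits
 * whatever the previously running context left behind; with isolation it
 * starts from clean defaults (offset mode). */
static struct ctx_state new_context_state(struct ctx_state previous,
                                          int has_isolation)
{
    struct ctx_state defaults = { 0 };
    return has_isolation ? defaults : previous;
}
```

In this model, a media driver that never programs the bit and expects offsets
only resolves its pointers correctly when its context starts from the defaults.
If it inherits the absolute mode, its small offsets are treated as absolute
addresses and the GPU fetches from the wrong place.</pre>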
</div>
</p>
</body>
</html>