<html>
<head>
<base href="https://bugs.freedesktop.org/">
</head>
<body>
<p>
<div>
<b><a class="bz_bug_link
bz_status_NEW "
title="NEW - [BAT][BDW] WARN_ON(!intel_engines_are_idle(dev_priv)) in i915_gem_suspend+0x123/0x140"
href="https://bugs.freedesktop.org/show_bug.cgi?id=101891#c1">Comment # 1</a>
on <a class="bz_bug_link
bz_status_NEW "
title="NEW - [BAT][BDW] WARN_ON(!intel_engines_are_idle(dev_priv)) in i915_gem_suspend+0x123/0x140"
href="https://bugs.freedesktop.org/show_bug.cgi?id=101891">bug 101891</a>
from <span class="vcard"><a class="email" href="mailto:chris@chris-wilson.co.uk" title="Chris Wilson <chris@chris-wilson.co.uk>"> <span class="fn">Chris Wilson</span></a>
</span></b>
<pre>It's just one of those impossible conditions that should never fire. The
sequence is this
/* As the idle_work is rearming if it detects a race, play safe and
* repeat the flush until it is definitely idle.
*/
while (flush_delayed_work(&dev_priv->gt.idle_work))
;
/* Assert that we sucessfully flushed all the work and
* reset the GPU back to its idle, low power state.
*/
WARN_ON(dev_priv->gt.awake);
WARN_ON(!intel_engines_are_idle(dev_priv));
The idle work waits for idle engines and sets gt.awake=false. Then before
engines can be awoken, gt.awake=true. So we either have a race despite being in
a single threaded suspend context, or... I have no idea.
bool intel_engines_are_idle(struct drm_i915_private *dev_priv)
{
struct intel_engine_cs *engine;
enum intel_engine_id id;
if (READ_ONCE(dev_priv->gt.active_requests))
return false;
/* If the driver is wedged, HW state may be very inconsistent and
* report that it is still busy, even though we have stopped using it.
*/
if (i915_terminally_wedged(&dev_priv->gpu_error))
return true;
for_each_engine(engine, dev_priv, id) {
if (!intel_engine_is_idle(engine))
return false;
}
return true;
}
bool intel_engine_is_idle(struct intel_engine_cs *engine)
{
struct drm_i915_private *dev_priv = engine->i915;
/* More white lies, if wedged, hw state is inconsistent */
if (i915_terminally_wedged(&dev_priv->gpu_error))
return true;
/* Any inflight/incomplete requests? */
if (!i915_seqno_passed(intel_engine_get_seqno(engine),
intel_engine_last_submit(engine)))
return false;
if (I915_SELFTEST_ONLY(engine->breadcrumbs.mock))
return true;
/* Interrupt/tasklet pending? */
if (test_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted))
return false;
/* Both ports drained, no more ELSP submission? */
if (port_request(&engine->execlist_port[0]))
return false;
/* ELSP is empty, but there are ready requests? */
if (READ_ONCE(engine->execlist_first))
return false;
/* Ring stopped? */
if (!ring_is_idle(engine))
return false;
return true;
}
It might be possible for an interrupt to kick in and dirty irq_posted, a very
late active->idle notification. Or the ring_is_idle() check on RING_MODE may be
garbage.
I'm going to go back and play the waiting game. Note for future self, consider
adding a WARN_ON(test_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted));</pre>
</div>
</p>
<hr>
<span>You are receiving this mail because:</span>
<ul>
<li>You are the QA Contact for the bug.</li>
<li>You are on the CC list for the bug.</li>
<li>You are the assignee for the bug.</li>
</ul>
</body>
</html>