[Bug 101891] [BAT][BDW] WARN_ON(!intel_engines_are_idle(dev_priv)) in i915_gem_suspend+0x123/0x140
bugzilla-daemon at freedesktop.org
Fri Jul 28 16:18:34 UTC 2017
https://bugs.freedesktop.org/show_bug.cgi?id=101891
--- Comment #1 from Chris Wilson <chris at chris-wilson.co.uk> ---
It's just one of those impossible conditions that should never fire. The
sequence is this:

	/* As the idle_work is rearming if it detects a race, play safe and
	 * repeat the flush until it is definitely idle.
	 */
	while (flush_delayed_work(&dev_priv->gt.idle_work))
		;

	/* Assert that we successfully flushed all the work and
	 * reset the GPU back to its idle, low power state.
	 */
	WARN_ON(dev_priv->gt.awake);
	WARN_ON(!intel_engines_are_idle(dev_priv));

The idle work waits for the engines to be idle and then sets gt.awake=false.
Conversely, before any engine can be woken again, gt.awake must be set to true.
So we either have a race despite being in a single-threaded suspend context,
or... I have no idea.
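
To make the expected behaviour concrete, here is a minimal user-space sketch of
the "flush until definitely idle" loop. It is only a toy model: struct toy_gt,
toy_idle_work(), toy_flush() and toy_suspend_flush() are hypothetical names,
and the late_wakeups counter is an assumed stand-in for whatever race makes the
real idle_work rearm itself. It is not the driver code.

```c
#include <stdbool.h>

/* Toy stand-in for dev_priv->gt; every name here is hypothetical. */
struct toy_gt {
	bool awake;        /* mirrors gt.awake */
	bool work_pending; /* is the idle work currently queued? */
	int late_wakeups;  /* races the idle work will still detect */
};

/* Idle work: rearm if a race is detected, otherwise park the GT. */
static void toy_idle_work(struct toy_gt *gt)
{
	if (gt->late_wakeups > 0) {
		gt->late_wakeups--;
		gt->work_pending = true; /* rearm and retry later */
		return;
	}
	gt->awake = false;
}

/* Like flush_delayed_work(): true if pending work was run, else false. */
static bool toy_flush(struct toy_gt *gt)
{
	if (!gt->work_pending)
		return false;
	gt->work_pending = false;
	toy_idle_work(gt);
	return true;
}

/* Repeat the flush until nothing is pending; returns how many passes ran. */
static int toy_suspend_flush(struct toy_gt *gt)
{
	int passes = 0;

	while (toy_flush(gt))
		passes++;
	return passes;
}
```

In this model the loop can only exit with awake == false and nothing pending,
which is why both WARN_ONs look impossible from a single-threaded context.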

bool intel_engines_are_idle(struct drm_i915_private *dev_priv)
{
	struct intel_engine_cs *engine;
	enum intel_engine_id id;

	if (READ_ONCE(dev_priv->gt.active_requests))
		return false;

	/* If the driver is wedged, HW state may be very inconsistent and
	 * report that it is still busy, even though we have stopped using it.
	 */
	if (i915_terminally_wedged(&dev_priv->gpu_error))
		return true;

	for_each_engine(engine, dev_priv, id) {
		if (!intel_engine_is_idle(engine))
			return false;
	}

	return true;
}

bool intel_engine_is_idle(struct intel_engine_cs *engine)
{
	struct drm_i915_private *dev_priv = engine->i915;

	/* More white lies, if wedged, hw state is inconsistent */
	if (i915_terminally_wedged(&dev_priv->gpu_error))
		return true;

	/* Any inflight/incomplete requests? */
	if (!i915_seqno_passed(intel_engine_get_seqno(engine),
			       intel_engine_last_submit(engine)))
		return false;

	if (I915_SELFTEST_ONLY(engine->breadcrumbs.mock))
		return true;

	/* Interrupt/tasklet pending? */
	if (test_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted))
		return false;

	/* Both ports drained, no more ELSP submission? */
	if (port_request(&engine->execlist_port[0]))
		return false;

	/* ELSP is empty, but there are ready requests? */
	if (READ_ONCE(engine->execlist_first))
		return false;

	/* Ring stopped? */
	if (!ring_is_idle(engine))
		return false;

	return true;
}

It might be possible for an interrupt to kick in and dirty irq_posted, a very
late active->idle notification. Or the ring_is_idle() check on RING_MODE may be
garbage.
I'm going to go back and play the waiting game. Note for future self, consider
adding a WARN_ON(test_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted));
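
As a rough illustration of why a per-check warning would help, the sketch below
mirrors the order of the checks in intel_engine_is_idle() but reports which one
failed instead of a bare false. It is a user-space toy under stated
assumptions: struct toy_engine, enum toy_busy_reason and both functions are
hypothetical, the wedged and selftest shortcuts are left out, and the booleans
stand in for the real seqno, irq_posted, port, queue and ring queries.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the state intel_engine_is_idle() inspects. */
struct toy_engine {
	bool seqno_passed;   /* last submitted request has completed */
	bool irq_posted;     /* ENGINE_IRQ_EXECLIST bit is set */
	bool port0_busy;     /* execlist_port[0] still holds a request */
	bool queue_nonempty; /* execlist_first is non-NULL */
	bool ring_idle;      /* what ring_is_idle() would return */
};

enum toy_busy_reason {
	TOY_IDLE = 0,
	TOY_INFLIGHT,
	TOY_IRQ_POSTED,
	TOY_PORT_BUSY,
	TOY_QUEUE_READY,
	TOY_RING_BUSY,
};

/* Same order as the checks in the real function, but keeps the reason. */
static enum toy_busy_reason toy_engine_busy_reason(const struct toy_engine *e)
{
	if (!e->seqno_passed)
		return TOY_INFLIGHT;
	if (e->irq_posted)
		return TOY_IRQ_POSTED;
	if (e->port0_busy)
		return TOY_PORT_BUSY;
	if (e->queue_nonempty)
		return TOY_QUEUE_READY;
	if (!e->ring_idle)
		return TOY_RING_BUSY;
	return TOY_IDLE;
}

/* Human-readable name for a log line. */
static const char *toy_busy_reason_name(enum toy_busy_reason r)
{
	switch (r) {
	case TOY_IDLE:        return "idle";
	case TOY_INFLIGHT:    return "inflight request";
	case TOY_IRQ_POSTED:  return "irq_posted set";
	case TOY_PORT_BUSY:   return "ELSP port busy";
	case TOY_QUEUE_READY: return "ready requests queued";
	case TOY_RING_BUSY:   return "ring not idle";
	}
	return "unknown";
}
```

With something along these lines in the suspend path, a late irq_posted could
be told apart from a bogus ring_is_idle() result the next time this fires.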