<html> <head> <base href="https://bugs.freedesktop.org/"> </head> <body> <div> <a class="bz_bug_link bz_status_NEW " title="NEW - [BAT][BDW] WARN_ON(!intel_engines_are_idle(dev_priv)) in i915_gem_suspend+0x123/0x140" href="https://bugs.freedesktop.org/show_bug.cgi?id=101891#c1">Comment # 1</a> on <a class="bz_bug_link bz_status_NEW " title="NEW - [BAT][BDW] WARN_ON(!intel_engines_are_idle(dev_priv)) in i915_gem_suspend+0x123/0x140" href="https://bugs.freedesktop.org/show_bug.cgi?id=101891">bug 101891</a> from <a class="email" href="mailto:chris@chris-wilson.co.uk" title="Chris Wilson <chris@chris-wilson.co.uk>"> Chris Wilson</a> <pre>It's just one of those impossible conditions that should never fire. The
sequence is this:

	/* As the idle_work is rearming if it detects a race, play safe and
	 * repeat the flush until it is definitely idle.
	 */
	while (flush_delayed_work(&dev_priv->gt.idle_work))
		;

	/* Assert that we successfully flushed all the work and
	 * reset the GPU back to its idle, low power state.
	 */
	WARN_ON(dev_priv->gt.awake);
	WARN_ON(!intel_engines_are_idle(dev_priv));

The idle work waits for idle engines and sets gt.awake=false. Before the
engines can be woken again, gt.awake must first be set back to true. So we
either have a race despite being in a single threaded suspend context,
or... I have no idea.

bool intel_engines_are_idle(struct drm_i915_private *dev_priv)
{
	struct intel_engine_cs *engine;
	enum intel_engine_id id;

	if (READ_ONCE(dev_priv->gt.active_requests))
		return false;

	/* If the driver is wedged, HW state may be very inconsistent and
	 * report that it is still busy, even though we have stopped using it.
	 */
	if (i915_terminally_wedged(&dev_priv->gpu_error))
		return true;

	for_each_engine(engine, dev_priv, id) {
		if (!intel_engine_is_idle(engine))
			return false;
	}

	return true;
}

bool intel_engine_is_idle(struct intel_engine_cs *engine)
{
	struct drm_i915_private *dev_priv = engine->i915;

	/* More white lies, if wedged, hw state is inconsistent */
	if (i915_terminally_wedged(&dev_priv->gpu_error))
		return true;

	/* Any inflight/incomplete requests? */
	if (!i915_seqno_passed(intel_engine_get_seqno(engine),
			       intel_engine_last_submit(engine)))
		return false;

	if (I915_SELFTEST_ONLY(engine->breadcrumbs.mock))
		return true;

	/* Interrupt/tasklet pending? */
	if (test_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted))
		return false;

	/* Both ports drained, no more ELSP submission? */
	if (port_request(&engine->execlist_port[0]))
		return false;

	/* ELSP is empty, but there are ready requests? */
	if (READ_ONCE(engine->execlist_first))
		return false;

	/* Ring stopped? */
	if (!ring_is_idle(engine))
		return false;

	return true;
}

It might be possible for an interrupt to kick in and dirty irq_posted, a
very late active->idle notification. Or the ring_is_idle() check on
RING_MODE may be garbage. I'm going to go back and play the waiting game.

Note for future self, consider adding a
WARN_ON(test_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted));</pre> </div> <hr> You are receiving this mail because: <ul> <li>You are the QA Contact for the bug.</li> <li>You are on the CC list for the bug.</li> <li>You are the assignee for the bug.</li> </ul> </body> </html>