[Intel-gfx] [PATCH] drm/i915: Fix waiting for engines to idle

Chris Wilson chris at chris-wilson.co.uk
Tue May 23 11:07:02 UTC 2017


On Tue, May 23, 2017 at 11:51:46AM +0100, Tvrtko Ursulin wrote:
> 
> On 23/05/2017 11:30, Chris Wilson wrote:
> >On Tue, May 23, 2017 at 11:09:17AM +0100, Chris Wilson wrote:
> >>On Tue, May 23, 2017 at 10:36:34AM +0100, Chris Wilson wrote:
> >>>On Tue, May 23, 2017 at 10:19:31AM +0100, Tvrtko Ursulin wrote:
> >>>>From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> >>>>
> >>>>Waiting for engines needs to happen even in the non-debug builds
> >>>>so it is incorrect to wrap it in a GEM_WARN_ON.
> >>>>
> >>>>Call it unconditionally and add GEM_WARN so that the debug
> >>>>warning can still be emitted when things go bad.
> >>>>
> >>>>Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> >>>>Fixes: 25112b64b3d2 ("drm/i915: Wait for all engines to be idle as part of i915_gem_wait_for_idle()")
> >>>
> >>>I would also say
> >>>Fixes: 7a453fb82f86 ("drm/i915: Remove redudant wait for each engine to idle from seqno wrap")
> >>>as that's the patch that assumes wait_for_idle included the wait on
> >>>engines.
> >>
> >>Hmm, actually that was only to prevent a DEBUG_GEM failure so not a
> >>relevant citation for fixes?
> >
> >And actually, this if for debug only code so the Fixes 25112b64b3d2 is
> >wrong as well as we as introducing a change in behaviour (making the
> >debug only code run all the time) with no urgent need for backporting.
> 
> Which part is for debug code only? Without it i915_gem_wait_for_idle
> is left with no checking for irq idleness, but it only seqno based.
> If that is what you say is debug only I think we need to mark it
> better in the code.

The code for waiting on engines was added purely to solve an issue
created by GEM_BUG_ONs presuming a strict order between context-switch,
retirement and seqno update on wrap. It is not observable, i.e. has no
effect, outside of DRM_I915_DEBUG_GEM. This patch is introducing an
observable effect for production systems by declaring the machine wedged,
which is far more severe than the normal course of events to first try a
reset. I'm suggesting that we should be in no hurry to call set-wedged
here without first allowing it to percolate through linux-next.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


More information about the Intel-gfx mailing list