[Intel-gfx] [PATCH] drm/i915: Fix waiting for engines to idle

Tvrtko Ursulin tvrtko.ursulin at linux.intel.com
Tue May 23 11:14:10 UTC 2017


On 23/05/2017 12:07, Chris Wilson wrote:
> On Tue, May 23, 2017 at 11:51:46AM +0100, Tvrtko Ursulin wrote:
>>
>> On 23/05/2017 11:30, Chris Wilson wrote:
>>> On Tue, May 23, 2017 at 11:09:17AM +0100, Chris Wilson wrote:
>>>> On Tue, May 23, 2017 at 10:36:34AM +0100, Chris Wilson wrote:
>>>>> On Tue, May 23, 2017 at 10:19:31AM +0100, Tvrtko Ursulin wrote:
>>>>>> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>>>>>
>>>>>> Waiting for engines needs to happen even in the non-debug builds
>>>>>> so it is incorrect to wrap it in a GEM_WARN_ON.
>>>>>>
>>>>>> Call it unconditionally and add GEM_WARN so that the debug
>>>>>> warning can still be emitted when things go bad.
>>>>>>
>>>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>>>>> Fixes: 25112b64b3d2 ("drm/i915: Wait for all engines to be idle as part of i915_gem_wait_for_idle()")
>>>>>
>>>>> I would also say
>>>>> Fixes: 7a453fb82f86 ("drm/i915: Remove redudant wait for each engine to idle from seqno wrap")
>>>>> as that's the patch that assumes wait_for_idle included the wait on
>>>>> engines.
>>>>
>>>> Hmm, actually that was only to prevent a DEBUG_GEM failure so not a
>>>> relevant citation for fixes?
>>>
>>> And actually, this if for debug only code so the Fixes 25112b64b3d2 is
>>> wrong as well as we as introducing a change in behaviour (making the
>>> debug only code run all the time) with no urgent need for backporting.
>>
>> Which part is for debug code only? Without it i915_gem_wait_for_idle
>> is left with no checking for irq idleness, but it only seqno based.
>> If that is what you say is debug only I think we need to mark it
>> better in the code.
>
> The code for waiting on engines was added purely to solve an issue
> created by GEM_BUG_ONs presuming a strict order between context-switch,
> retirement and seqno update on wrap. It is not observable, i.e. has no
> effect, outside of DRM_I915_DEBUG_GEM. This patch is introducing an
> observable effect for production systems by declaring the machine wedged,
> which is far more severe than the normal course of events to first try a
> reset. I'm suggesting that we should be in no hurry to call set-wedged
> here without first allowing it to percolate through linux-next.

But that means broken GEM_WARN_ON semantics were actually correct in 
this case, even if non-obviously so. Which is why I said which should 
mark it better in the code. And by that I mean the call site of 
wait_engines in i915_gem_wait_for_idle should be #ifdef GEM_DEBUG (I 
can't suggest GEM_WARN_ON!).

Regards,

Tvrtko


More information about the Intel-gfx mailing list