[PATCH 3/9] drm/i915: Avoid the gpu reset vs. modeset deadlock

Daniel Vetter daniel.vetter at ffwll.ch
Wed Jul 19 18:44:43 UTC 2017


On Wed, Jul 19, 2017 at 3:44 PM, Daniel Vetter <daniel.vetter at ffwll.ch> wrote:
> On Wed, Jul 19, 2017 at 3:32 PM, Chris Wilson <chris at chris-wilson.co.uk> wrote:
>> Quoting Daniel Vetter (2017-07-19 13:54:56)
>>> ... using the biggest hammer we have. This is essentially a weaponized
>>> version of the timeout-based wedging Chris added in
>>>
>>> commit 36703e79a982c8ce5a8e43833291f2719e92d0d1
>>> Author: Chris Wilson <chris at chris-wilson.co.uk>
>>> Date:   Thu Jun 22 11:56:25 2017 +0100
>>>
>>>     drm/i915: Break modeset deadlocks on reset
>>>
>>> Because defense-in-depth is good it's good to still have both. Also
>>> note that with the locking change we can now restrict this a lot (old
>>> gpus and special testing only), so this doesn't kill the TDR benefits
>>> on at least anything remotely modern.
>>>
>>> And futuremore with a few tricks it should be possible to make a much
>>> more educated guess about whether an atomic commit is stuck waiting on
>>> the gpu (atomic_t counting the pending i915_sw_fence used by the
>>> atomic modeset code should do it), so we can improve this.
>>>
>>> But for now just start with something that is guaranteed to recover
>>> faster, for much better CI througput.
>>>
>>> This defacto reverts TDR on these platforms, but there's not really a
>>> single commit to specify as the sole offender.
>>>
>>> Fixes: 4680816be336 ("drm/i915: Wait first for submission, before waiting for request completion")
>>> Fixes: 221fe7994554 ("drm/i915: Perform a direct reset of the GPU from the waiter")
>>> Cc: Chris Wilson <chris at chris-wilson.co.uk>
>>> Cc: Mika Kuoppala <mika.kuoppala at intel.com>
>>> Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
>>> Signed-off-by: Daniel Vetter <daniel.vetter at intel.com>
>>> ---
>>>  drivers/gpu/drm/i915/intel_display.c | 5 +++++
>>>  1 file changed, 5 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
>>> index 97777ffa1566..010a1f3e000c 100644
>>> --- a/drivers/gpu/drm/i915/intel_display.c
>>> +++ b/drivers/gpu/drm/i915/intel_display.c
>>> @@ -3471,6 +3471,11 @@ void intel_prepare_reset(struct drm_i915_private *dev_priv)
>>>             !gpu_reset_clobbers_display(dev_priv))
>>>                 return;
>>>
>>> +       /* We have a modeset vs reset deadlock, defensively unbreak it.
>>> +        *
>>> +        * FIXME: We can do a _lot_ better, this is just a first iteration.*/
>>
>> You should keep the error message for wedging the device. If all goes
>> well it is removed in a few patches, so shouldn't be an eyesore for
>> long.
>
> Yeah makes sense, just in case the next patches need to be reverted
> for some reasons. That's why I split it ou. Something like
>
> DRM_ERROR("Wedging gpu to unblock modesets\n");
>
> and then remove that again 2 patches later?

After thinking about this a bit more, s/DRM_ERROR/DRM_DEBUG_KMS/ or
so. We know we can deadlock here, complaining about that only spams
dmesg and results in noise in CI, hiding worse stuff. The timeout
based wedging as fallback should have a DRM_ERROR since it's
unexpected, this one here imo sholdn't. And with the follow-up patches
it won't be unconditional anymore (if we don't have to revert them
again).
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


More information about the dri-devel mailing list