[PATCH 3/9] drm/i915: Avoid the gpu reset vs. modeset deadlock

Daniel Vetter daniel.vetter at ffwll.ch
Wed Jul 19 13:44:14 UTC 2017


On Wed, Jul 19, 2017 at 3:32 PM, Chris Wilson <chris at chris-wilson.co.uk> wrote:
> Quoting Daniel Vetter (2017-07-19 13:54:56)
>> ... using the biggest hammer we have. This is essentially a weaponized
>> version of the timeout-based wedging Chris added in
>>
>> commit 36703e79a982c8ce5a8e43833291f2719e92d0d1
>> Author: Chris Wilson <chris at chris-wilson.co.uk>
>> Date:   Thu Jun 22 11:56:25 2017 +0100
>>
>>     drm/i915: Break modeset deadlocks on reset
>>
>> Because defense-in-depth is good it's good to still have both. Also
>> note that with the locking change we can now restrict this a lot (old
>> gpus and special testing only), so this doesn't kill the TDR benefits
>> on at least anything remotely modern.
>>
>> And futuremore with a few tricks it should be possible to make a much
>> more educated guess about whether an atomic commit is stuck waiting on
>> the gpu (atomic_t counting the pending i915_sw_fence used by the
>> atomic modeset code should do it), so we can improve this.
>>
>> But for now just start with something that is guaranteed to recover
>> faster, for much better CI througput.
>>
>> This defacto reverts TDR on these platforms, but there's not really a
>> single commit to specify as the sole offender.
>>
>> Fixes: 4680816be336 ("drm/i915: Wait first for submission, before waiting for request completion")
>> Fixes: 221fe7994554 ("drm/i915: Perform a direct reset of the GPU from the waiter")
>> Cc: Chris Wilson <chris at chris-wilson.co.uk>
>> Cc: Mika Kuoppala <mika.kuoppala at intel.com>
>> Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
>> Signed-off-by: Daniel Vetter <daniel.vetter at intel.com>
>> ---
>>  drivers/gpu/drm/i915/intel_display.c | 5 +++++
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
>> index 97777ffa1566..010a1f3e000c 100644
>> --- a/drivers/gpu/drm/i915/intel_display.c
>> +++ b/drivers/gpu/drm/i915/intel_display.c
>> @@ -3471,6 +3471,11 @@ void intel_prepare_reset(struct drm_i915_private *dev_priv)
>>             !gpu_reset_clobbers_display(dev_priv))
>>                 return;
>>
>> +       /* We have a modeset vs reset deadlock, defensively unbreak it.
>> +        *
>> +        * FIXME: We can do a _lot_ better, this is just a first iteration.*/
>
> You should keep the error message for wedging the device. If all goes
> well it is removed in a few patches, so shouldn't be an eyesore for
> long.

Yeah makes sense, just in case the next patches need to be reverted
for some reasons. That's why I split it ou. Something like

DRM_ERROR("Wedging gpu to unblock modesets\n");

and then remove that again 2 patches later?
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


More information about the dri-devel mailing list