[Intel-gfx] [PATCH] drm/i915: Avoid the gpu reset vs. modeset deadlock

Michel Thierry michel.thierry at intel.com
Thu Aug 3 19:31:44 UTC 2017


On 7/27/2017 5:13 AM, Tvrtko Ursulin wrote:
> 
> On 20/07/2017 21:23, Daniel Vetter wrote:
>> ... using the biggest hammer we have. This is essentially a weaponized
>> version of the timeout-based wedging Chris added in
>>
>> commit 36703e79a982c8ce5a8e43833291f2719e92d0d1
>> Author: Chris Wilson <chris at chris-wilson.co.uk>
>> Date:   Thu Jun 22 11:56:25 2017 +0100
>>
>>       drm/i915: Break modeset deadlocks on reset
>>
>> Because defense-in-depth is good it's good to still have both. Also
>> note that with the locking change we can now restrict this a lot (old
>> gpus and special testing only), so this doesn't kill the TDR benefits
>> on at least anything remotely modern.
>>
>> And futuremore with a few tricks it should be possible to make a much
>> more educated guess about whether an atomic commit is stuck waiting on
>> the gpu (atomic_t counting the pending i915_sw_fence used by the
>> atomic modeset code should do it), so we can improve this.
>>
>> But for now just start with something that is guaranteed to recover
>> faster, for much better CI througput.
>>
>> This defacto reverts TDR on these platforms, but there's not really a
>> single commit to specify as the sole offender.
>>
>> v2: Add a debug message to explain what's going on. We can't DRM_ERROR
>> because that spams CI. And the timeout based fallback still prints a
>> DRM_ERROR, in case something goes wrong.
>>
>> Fixes: 4680816be336 ("drm/i915: Wait first for submission, before waiting for request completion")
>> Fixes: 221fe7994554 ("drm/i915: Perform a direct reset of the GPU from the waiter")
>> Cc: Chris Wilson <chris at chris-wilson.co.uk>
>> Cc: Mika Kuoppala <mika.kuoppala at intel.com>
>> Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
>> Signed-off-by: Daniel Vetter <daniel.vetter at intel.com>
>> ---
>>    drivers/gpu/drm/i915/intel_display.c | 6 ++++++
>>    1 file changed, 6 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
>> index 02b1f4966049..995522e40ec1 100644
>> --- a/drivers/gpu/drm/i915/intel_display.c
>> +++ b/drivers/gpu/drm/i915/intel_display.c
>> @@ -3471,6 +3471,12 @@ void intel_prepare_reset(struct drm_i915_private *dev_priv)
>>            !gpu_reset_clobbers_display(dev_priv))
>>                return;
>>
>> +     /* We have a modeset vs reset deadlock, defensively unbreak it.
>> +      *
>> +      * FIXME: We can do a _lot_ better, this is just a first iteration.*/

I think the standard is to move '*/' to a new line.

>> +     i915_gem_set_wedged(dev_priv);
>> +     DRM_DEBUG_DRIVER("Wedging GPU to avoid deadlocks with pending modeset updates\n");
>> +
>>        /*
>>         * Need mode_config.mutex so that we don't
>>         * trample ongoing ->detect() and whatnot.
>>
> 
> Looks fine to me since it only affects very old platforms, and is a
> temporary measure.
> 
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> 

Also,

Reviewed-by: Michel Thierry <michel.thierry at intel.com>


More information about the Intel-gfx mailing list