[Intel-gfx] [PATCH 2/2] drm/i915/gt: Only unwedge if we can reset first

Daniele Ceraolo Spurio daniele.ceraolospurio at intel.com
Tue Sep 10 21:20:06 UTC 2019



On 9/9/19 11:06 PM, Chris Wilson wrote:
> Quoting Daniele Ceraolo Spurio (2019-09-10 01:59:38)
>>
>>
>> On 9/9/19 3:55 PM, Chris Wilson wrote:
>>> Unwedging the GPU requires a successful GPU reset before we restore the
>>> default submission, or else we may see residual context switch events
>>> that we were not expecting.
>>>
>>> Reported-by: Janusz Krzysztofik <janusz.krzysztofik at linux.intel.com>
>>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>>> Cc: Janusz Krzysztofik <janusz.krzysztofik at linux.intel.com>
>>> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/gt/intel_reset.c | 7 ++++++-
>>>    1 file changed, 6 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
>>> index fe57296b790c..5242496a893a 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_reset.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
>>> @@ -809,6 +809,7 @@ static bool __intel_gt_unset_wedged(struct intel_gt *gt)
>>>        struct intel_gt_timelines *timelines = &gt->timelines;
>>>        struct intel_timeline *tl;
>>>        unsigned long flags;
>>> +     bool ok;
>>>    
>>>        if (!test_bit(I915_WEDGED, &gt->reset.flags))
>>>                return true;
>>> @@ -854,7 +855,11 @@ static bool __intel_gt_unset_wedged(struct intel_gt *gt)
>>>        }
>>>        spin_unlock_irqrestore(&timelines->lock, flags);
>>>    
>>> -     intel_gt_sanitize(gt, false);
>>> +     ok = false;
>>> +     if (!reset_clobbers_display(gt->i915))
>>> +             ok = __intel_gt_reset(gt, ALL_ENGINES) == 0;
>>
>> Of the thing we had in the gt_sanitize, we're ok skipping the
>> uc_sanitize() because we take care of that during wedge (from
>> intel_uc_reset_prepare), but what about the loop of
>> __intel_engine_reset()? Is that safe to skip here?
> 
> I think yes, because we always follow the unwedge with a GT restart. That
> is either via the full reset or the sanitize+restart on resume. Both call
> paths will also set the wedged bit if they fail. gem_eio/suspend should
> be testing the recovery upon resume path, and even gem_eio/*-stress
> should give responsible coverage of the normal recovery via full reset.
>   
>> Apart from that, the patch LGTM. Worth noting that with this change a
>> successful reset is required to unwedge even after a suspend/resume
>> cycle (in gem_sanitize), which is a good thing IMO.
> 
> Hence why relaxing the gpu_clobbers_display is important to retain the
> ability to clear wedged across suspend on older devices.
> -Chris
> 

Sold!

Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>

Daniele


More information about the Intel-gfx mailing list