[Intel-gfx] [PATCH] drm/i915: Skip an engine reset if it recovered before our preparations

Michel Thierry michel.thierry at intel.com
Sat Dec 16 00:20:56 UTC 2017


On 12/15/2017 4:16 PM, Chris Wilson wrote:
> Quoting Michel Thierry (2017-12-16 00:02:47)
>> Hi,
>>
>> On 12/15/2017 3:52 PM, Chris Wilson wrote:
>>> At the beginning of a reset, we disable the submission method and find
>>> the stuck request. We expect to find a stuck request for we have
>>> declared the engine stalled. However, if we find no active request, the
>>> engine must have recovered from its stall before we could issue a reset,
>>> so let the engine continue on without a reset. If the engine is truly
>>> stuck, we will be back soon enough with the next reset attempt.
>>>
>>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>>> Cc: Michel Thierry <michel.thierry at intel.com>
>>> Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/i915_drv.c | 14 +++++++-------
>>>    1 file changed, 7 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
>>> index ca9f4b2862eb..6f24435ddffe 100644
>>> --- a/drivers/gpu/drm/i915/i915_drv.c
>>> +++ b/drivers/gpu/drm/i915/i915_drv.c
>>> @@ -2011,19 +2011,19 @@ int i915_reset_engine(struct intel_engine_cs *engine, unsigned int flags)
>>>    
>>>        GEM_BUG_ON(!test_bit(I915_RESET_ENGINE + engine->id, &error->flags));
>>>    
>>> -     if (!(flags & I915_RESET_QUIET)) {
>>> -             dev_notice(engine->i915->drm.dev,
>>> -                        "Resetting %s after gpu hang\n", engine->name);
>>> -     }
>>> -     error->reset_engine_count[engine->id]++;
>>> -
>>>        active_request = i915_gem_reset_prepare_engine(engine);
>>> -     if (IS_ERR(active_request)) {
>>> +     if (IS_ERR_OR_NULL(active_request)) {
>>>                DRM_DEBUG_DRIVER("Previous reset failed, promote to full reset\n");
>>>                ret = PTR_ERR(active_request);
>>
>> Will a static checker complain about PTR_ERR(NULL)?
> 
> It shouldn't. PTR_ERR(NULL) -> 0 is one of the valid tricks of PTR_ERR.
> 
>> And the DRM_DEBUG_DRIVER message also isn't correct in that case.
> 
> Bah, I was betting that those who read this would know that the full chip
> reset was pardoned. If you want, we can just remove the debug.

Yes, the problem is that sometimes we only get logs without knowing the code.
I would vote to either remove it or change it to just say 'reset skipped'.
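
To make the PTR_ERR(NULL) point concrete, here is a minimal userspace
sketch. It assumes simplified stand-ins for the helpers in the kernel's
include/linux/err.h so that it compiles outside the kernel; the enum and
classify_prepare() are hypothetical names used only for illustration and
are not part of the patch.

```c
/* Userspace sketch only: simplified stand-ins for the kernel's
 * include/linux/err.h helpers, not the kernel definitions themselves. */
#include <errno.h>
#include <stddef.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static inline void *ERR_PTR(long error) { return (void *)error; }

/* Decode a pointer back to a long; a NULL pointer decodes to 0. */
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }

/* True only for pointers in the top MAX_ERRNO addresses; NULL is not an error. */
static inline int IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

static inline int IS_ERR_OR_NULL(const void *ptr)
{
	return !ptr || IS_ERR(ptr);
}

/* Hypothetical outcomes, named here for illustration only. */
enum reset_outcome { RESET_PROCEED, RESET_SKIPPED, RESET_PROMOTED };

/*
 * Hypothetical helper: classify the value returned by the prepare step.
 * *ret mirrors "ret = PTR_ERR(active_request)" from the patch:
 * 0 when the request is NULL, -errno when it is an error pointer.
 */
static enum reset_outcome classify_prepare(const void *active_request, long *ret)
{
	if (IS_ERR_OR_NULL(active_request)) {
		*ret = PTR_ERR(active_request);
		/* ret == 0 means no stuck request was found: nothing to reset. */
		return *ret ? RESET_PROMOTED : RESET_SKIPPED;
	}
	*ret = 0;
	return RESET_PROCEED;
}
```

With this split, the NULL case could log something like 'reset skipped'
while only a real error pointer would report a promotion to full reset.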

-Michel

