[Intel-gfx] [PATCH] drm/i915: Allow unready gpu to be reset on gen8

Mika Kuoppala mika.kuoppala at linux.intel.com
Fri Oct 30 08:18:18 PDT 2015


Chris Wilson <chris at chris-wilson.co.uk> writes:

> On Fri, Oct 30, 2015 at 04:43:49PM +0200, Mika Kuoppala wrote:
>> Gen9 has demonstrated cases where forcing a not-ready gpu
>> into reset has caused a system hang [1].
>> 
>> To date, gen8 has never demonstrated such behaviour.
>> 
>> In our CI tests, bsw (Braswell, a gen8 part) sometimes ends up,
>> after a gpu hang, in a state where it claims not to be ready for
>> reset in response to a reset request.
>> 
>> Allow gen8 to reset even after it claims not to be ready, in order
>> to keep the gpu accessible. Enhance logging so that it is clear
>> which conditions led to the decision to proceed or bail out, so
>> that we will notice if forcing the gpu this way turns out to be
>> foolhardy.
>> 
>> References [1]: https://bugs.freedesktop.org/show_bug.cgi?id=89959
>> Cc: Daniel Vetter <daniel.vetter at ffwll.ch>
>> Cc: Tomi Sarvela <tomix.p.sarvela at intel.com>
>> Signed-off-by: Mika Kuoppala <mika.kuoppala at intel.com>
>> ---
>>  drivers/gpu/drm/i915/intel_uncore.c | 9 ++++++++-
>>  1 file changed, 8 insertions(+), 1 deletion(-)
>> 
>> diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
>> index f0f97b2..47c17f2 100644
>> --- a/drivers/gpu/drm/i915/intel_uncore.c
>> +++ b/drivers/gpu/drm/i915/intel_uncore.c
>> @@ -1504,7 +1504,14 @@ not_ready:
>>  		I915_WRITE(RING_RESET_CTL(engine->mmio_base),
>>  			   _MASKED_BIT_DISABLE(RESET_CTL_REQUEST_RESET));
>>  
>> -	return -EIO;
>
> Where's the reference for where we hit this EIO on gen8?
>

Internal CI logs; the relevant part is pasted below. If you want
the full log, holler at me on irc.

[  119.147727] kms_pipe_crc_basic: starting subtest hang-read-crc-pipe-A
[  124.785063] [drm] stuck on render ring
[  124.800850] [drm] GPU HANG: ecode 8:0:0xfffffffe, in kms_pipe_crc_ba [5590], reason: Ring hung, action: reset
[  124.801154] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[  124.801161] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[  124.801167] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[  124.801173] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[  124.801179] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[  124.801785] kobject: 'card0' (ffff880174ad92a0): kobject_uevent_env
[  124.801940] kobject: 'card0' (ffff880174ad92a0): fill_kobj_path: path = '/devices/pci0000:00/0000:00:02.0/drm/card0'
[  124.805032] kobject: 'card0' (ffff880174ad92a0): kobject_uevent_env
[  124.805089] kobject: 'card0' (ffff880174ad92a0): fill_kobj_path: path = '/devices/pci0000:00/0000:00:02.0/drm/card0'
[  125.511744] [drm:gen8_do_reset [i915]] *ERROR* render ring: reset request timeout
[  125.511922] [drm] Simulated gpu hang, resetting stop_rings
[  125.511927] drm/i915: Resetting chip after gpu hang
[  125.511954] [drm:i915_reset [i915]] *ERROR* Failed to reset chip: -5
[  125.637612] kms_pipe_crc_basic: exiting, ret=0
[  125.653608] [drm:intel_lr_context_deferred_alloc [i915]] *ERROR* ring create req: -5
[  125.847695] gem_ctx_param_basic: executing
[  125.850086] [drm:intel_lr_context_deferred_alloc [i915]] *ERROR* ring create req: -5
[  125.854482] gem_ctx_param_basic: exiting, ret=99
[  126.038693] kms_addfb_basic: executing
[  126.041754] [drm:intel_lr_context_deferred_alloc [i915]] *ERROR* ring create req: -5

-Mika

>> +	if (INTEL_INFO(dev)->gen == 9) {
>> +		DRM_ERROR("Reset would risk system stability, bailing out\n");
>> +		return -EIO;
>> +	}
>> +
>> +	DRM_ERROR("Forcing non ready gpu into reset\n");
>> +
>> +	return gen6_do_reset(dev);
>>  }
>>  
>>  static int (*intel_get_gpu_reset(struct drm_device *dev))(struct drm_device *)
>> -- 
>> 2.5.0
>> 
>
> -- 
> Chris Wilson, Intel Open Source Technology Centre

