[Intel-gfx] [PATCH 1/1] drm/i915: Reset request handling for gen9+

Tue Jun 16 13:15:56 PDT 2015

On 16/06/2015 18:10, Chris Wilson wrote:
> On Tue, Jun 16, 2015 at 04:39:23PM +0300, Mika Kuoppala wrote:
>> In order for skl+ hardware to guarantee that no context switch
>> takes place during reset and that current context is properly
>> saved, the driver needs to notify and query hw before commencing
>> with reset.
>>
>> We will only proceed with reset if all engines report that they
>> are ready for reset.
>>
>> As we skip the reset if any single engine reports not ready, this
>> commit prevents a system hang on skl in some situations where the
>> gpu/blitter is hanged and in such state that any write to generic
>
> s/is hanged/is wedged/ reads better
>
>> reset register (GEN6_GDRST) causes immediate system hang.
>>
>> References: https://bugs.freedesktop.org/show_bug.cgi?id=89959
>> References: https://bugs.freedesktop.org/show_bug.cgi?id=90854
>> Signed-off-by: Mika Kuoppala <mika.kuoppala at intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_reg.h     |  3 +++
>>   drivers/gpu/drm/i915/intel_uncore.c | 32 +++++++++++++++++++++++++++++++-
>>   2 files changed, 34 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
>> index 0b979ad..3684f92 100644
>> --- a/drivers/gpu/drm/i915/i915_reg.h
>> +++ b/drivers/gpu/drm/i915/i915_reg.h
>> @@ -1461,6 +1461,9 @@ enum skl_disp_power_wells {
>>   #define RING_MAX_IDLE(base)	((base)+0x54)
>>   #define RING_HWS_PGA(base)	((base)+0x80)
>>   #define RING_HWS_PGA_GEN6(base)	((base)+0x2080)
>> +#define RING_RESET_CTL(base)	((base)+0xd0)
>> +#define   RESET_CTL_REQUEST_RESET  (1 << 0)
>> +#define   RESET_CTL_READY_TO_RESET (1 << 1)
>>
>>   #define HSW_GTT_CACHE_EN	0x4024
>>   #define   GTT_CACHE_EN_ALL	0xF0007FFF
>> diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
>> index 4a86cf0..404bce2 100644
>> --- a/drivers/gpu/drm/i915/intel_uncore.c
>> +++ b/drivers/gpu/drm/i915/intel_uncore.c
>> @@ -1455,9 +1455,39 @@ static int gen6_do_reset(struct drm_device *dev)
>>   	return ret;
>>   }
>>
>> +static int wait_for_bits_set(struct drm_i915_private *dev_priv,
>> +			     const u32 reg, const u32 mask, const int timeout)
>> +{
>> +	return wait_for((I915_READ(reg) & mask) == mask, timeout);
>> +}
>> +
>> +static int gen9_do_reset(struct drm_device *dev)
>> +{
>> +	struct drm_i915_private *dev_priv = dev->dev_private;
>> +	struct intel_engine_cs *engine;
>> +	int ret, i;
>> +
>> +	for_each_ring(engine, dev_priv, i) {
>> +		I915_WRITE(RING_RESET_CTL(engine->mmio_base),
>> +			   _MASKED_BIT_ENABLE(RESET_CTL_REQUEST_RESET));
>> +
>> +		ret = wait_for_bits_set(dev_priv,
>> +					RING_RESET_CTL(engine->mmio_base),
>> +					RESET_CTL_READY_TO_RESET, 700);
>> +		if (ret) {
>> +			DRM_ERROR("%s: reset request timeout\n", engine->name);
>> +			return -ENODEV;
>
> return -EIO; since the reset didn't happen due to hardware issues
> (ENODEV is that we don't have the implementation for the GPU rather than
> it failed).
>
> Do we need any recovery? Do you guarantee that the GPU reset resets the
> CTL register?
> -Chris

According to the bspec (if I remember correctly from the last time I had 
to deal with it - Mika, correct me if I'm way off here):

If the reset request succeeds, the reset_request bit is cleared and
ready_to_reset is set. Following the engine reset, both the
ready_to_reset and reset_request bits are cleared to 0. If the reset
request fails, the reset_request bit is still set.
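
To make the bit transitions above concrete, here is a minimal sketch in
plain C of the handshake as described. It is a software model only, not
driver code: struct mock_engine, mock_hw_step(), mock_engine_reset(),
classify() and request_and_wait() are hypothetical names made up for
illustration, the poll budget stands in for the 700 ms timeout in the
patch, and the bit behaviour follows the recollection of the bspec given
above rather than verified hardware behaviour. Only the two bit
definitions come from the patch. The model also sets the request bit
directly, whereas the patch uses a masked write (_MASKED_BIT_ENABLE).

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Bit definitions as in the patch. */
#define RESET_CTL_REQUEST_RESET  (1u << 0)
#define RESET_CTL_READY_TO_RESET (1u << 1)

/* Hypothetical software model of one engine's RESET_CTL register. */
struct mock_engine {
	uint32_t reset_ctl;
	int can_ack;	/* non-zero if the engine is able to acknowledge */
};

enum handshake_state {
	HANDSHAKE_IDLE,		/* neither bit set */
	HANDSHAKE_PENDING,	/* request set, not (yet) acknowledged */
	HANDSHAKE_READY,	/* request cleared, ready_to_reset set */
};

/* Assumed hardware behaviour: a successful request clears the request
 * bit and sets ready_to_reset; a failed one leaves the request bit set. */
static void mock_hw_step(struct mock_engine *e)
{
	if ((e->reset_ctl & RESET_CTL_REQUEST_RESET) && e->can_ack) {
		e->reset_ctl &= ~RESET_CTL_REQUEST_RESET;
		e->reset_ctl |= RESET_CTL_READY_TO_RESET;
	}
}

/* Assumed behaviour of a per-engine reset: both bits go back to 0. */
static void mock_engine_reset(struct mock_engine *e)
{
	e->reset_ctl &= ~(RESET_CTL_REQUEST_RESET | RESET_CTL_READY_TO_RESET);
}

static enum handshake_state classify(uint32_t ctl)
{
	if (ctl & RESET_CTL_READY_TO_RESET)
		return HANDSHAKE_READY;
	if (ctl & RESET_CTL_REQUEST_RESET)
		return HANDSHAKE_PENDING;
	return HANDSHAKE_IDLE;
}

/* Request a reset and poll until ready_to_reset is seen or the budget
 * runs out. Returns 0 on success, -EIO if the engine never acknowledged. */
static int request_and_wait(struct mock_engine *e, int poll_budget)
{
	assert(poll_budget > 0);

	e->reset_ctl |= RESET_CTL_REQUEST_RESET;
	while (poll_budget--) {
		mock_hw_step(e);
		if (e->reset_ctl & RESET_CTL_READY_TO_RESET)
			return 0;
	}
	return -EIO;
}
```

A caller following the commit message would skip the reset as soon as
request_and_wait() returns non-zero for any engine. In the failure case
the model is left in HANDSHAKE_PENDING, which is the state Chris's
recovery question is about.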

Then again, all of this assumes engine resets rather than a full GPU
reset. The bspec does not say what effect a full GPU reset has on the
reset control registers. It has always seemed to me that the reset
control register is only relevant for per-engine resets, not for a full
GPU reset, but I might very well be wrong about that, especially since
you guys have seen problems when doing full GPU resets without this
reset handshake first.

Thanks,
Tomas
