[Intel-gfx] [PATCH 2/2] drm/i915/g4x: Fix unreliable gpu reset
Mika Kuoppala
mika.kuoppala at linux.intel.com
Thu May 18 12:51:19 UTC 2017
Chris Wilson <chris at chris-wilson.co.uk> writes:
> On Thu, May 18, 2017 at 03:34:22PM +0300, Mika Kuoppala wrote:
>> ELK seems to very picky about the preconditions to reset.
>> Evidence on Eaglelake (8086:2e12 (rev 03)) shows that it does
>> not like if reset occurs when there is active ring.
>>
>> Ville found out that there is workaround with name
>> 'WaMediaResetMainRingCleanup' which suggests that we need to
>> cleanup rings before resetting. It is unclear what cleanup
>> exactly means but evidence shows that stopping the ring
>> does have an effect on reset reliability.
>>
>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100998
>> Suggested-by: Ville Syrjälä <ville.syrjala at linux.intel.com>
>> Cc: Ville Syrjälä <ville.syrjala at linux.intel.com>
>> Cc: Chris Wilson <chris at chris-wilson.co.uk>
>> Cc: Tomi Sarvela <tomi.p.sarvela at intel.com>
>> Signed-off-by: Mika Kuoppala <mika.kuoppala at intel.com>
>> ---
>> drivers/gpu/drm/i915/intel_uncore.c | 26 ++++++++++++++++++++++++++
>> 1 file changed, 26 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
>> index 7eaaf2225e1a..1d473cd1f8a4 100644
>> --- a/drivers/gpu/drm/i915/intel_uncore.c
>> +++ b/drivers/gpu/drm/i915/intel_uncore.c
>> @@ -1427,6 +1427,26 @@ int i915_reg_read_ioctl(struct drm_device *dev,
>> return ret;
>> }
>>
>> +static void gen3_stop_rings(struct drm_i915_private *dev_priv)
>> +{
>> + struct intel_engine_cs *engine;
>> + enum intel_engine_id id;
>> +
>> + for_each_engine(engine, dev_priv, id) {
>> + const i915_reg_t mode_r = RING_MI_MODE(engine->mmio_base);
>> +
>> + I915_WRITE_FW(mode_r, _MASKED_BIT_ENABLE(STOP_RING));
>
> This will only stop between instructions on the CS (i.e. within the
> ring) aiui . If we did hang in a shader, this is not going to achieve
> very much.
The subject is too grandiose then. Should I add to the subject 'basic
hangs'?
This seems to help get our hang tests through, bringing more coverage.
I pondered adding ctl, head and tail writes to zero. It would not help
with shades, but should not hurt either.
>
>> + if (intel_wait_for_register_fw(dev_priv,
>> + mode_r,
>> + MODE_IDLE,
>> + MODE_IDLE,
>> + 1000)) {
>> + DRM_DEBUG("%s : timed out STOP_RING\n",
>> + engine->name);
>
> If we make this a DRM_ERROR I expect it to fire through the hang tests
> in BAT.
Why would stopping the ring be a problem with a clean chained batch?
Well I try to see how it goes.
-Mika
> -Chris
>
> --
> Chris Wilson, Intel Open Source Technology Centre
More information about the Intel-gfx
mailing list