[Intel-gfx] [PATCH] drm/i915/execlists: Clear STOP_RING bit before restoring the context
Mika Kuoppala
mika.kuoppala at linux.intel.com
Tue Aug 14 13:42:41 UTC 2018
Chris Wilson <chris at chris-wilson.co.uk> writes:
> Before a reset, we set the STOP_RING bit of RING_MI_MODE to freeze the
> engine. However, sometimes we observe that upon restart, the engine
> freezes again with the STOP_RING bit still asserted. By inspection, we
> know that the register itself is cleared by the GPU reset, so that bit
> must be preserved inside the context image and reloaded from there. A
> simple fix (as the RING_MI_MODE is at a fixed offset in a valid context)
> is to clobber the STOP_RING bit inside the image before replaying the
> request.
>
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> Cc: Michel Thierry <michel.thierry at intel.com>
> Cc: MichaĆ Winiarski <michal.winiarski at intel.com>
> ---
> drivers/gpu/drm/i915/intel_lrc.c | 17 +++++++++++++++--
> drivers/gpu/drm/i915/intel_lrc_reg.h | 2 ++
> 2 files changed, 17 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 3f90c74038ef..37fe842de639 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1918,6 +1918,20 @@ static void execlists_reset(struct intel_engine_cs *engine,
>
> spin_unlock_irqrestore(&engine->timeline.lock, flags);
>
> + if (!request)
> + return;
> +
> + regs = request->hw_context->lrc_reg_state;
> +
> + /*
> + * After a reset, the context may have preserved the STOP bit
> + * of RING_MI_MODE we used to freeze the active engine before
> + * the reset. If that bit is restored the ring stops instead
> + * of being executed.
> + */
> + regs[CTX_MI_MODE + 1] |= STOP_RING << 16;
> + regs[CTX_MI_MODE + 1] &= ~STOP_RING;
Ok, I did go and pondered if this is truely the simplest
way. Forcing the start outside a context would not
be simpler and would be slower.
Nice find and it will has potential to explain our some troubles
of re-enabling engines.
Did you find out this by looking at the error states or what?
Reviewed-by: Mika Kuoppala <mika.kuoppala at linux.intel.com>
> +
> /*
> * If the request was innocent, we leave the request in the ELSP
> * and will try to replay it on restarting. The context image may
> @@ -1929,7 +1943,7 @@ static void execlists_reset(struct intel_engine_cs *engine,
> * and have to at least restore the RING register in the context
> * image back to the expected values to skip over the guilty request.
> */
> - if (!request || request->fence.error != -EIO)
> + if (request->fence.error != -EIO)
> return;
>
> /*
> @@ -1940,7 +1954,6 @@ static void execlists_reset(struct intel_engine_cs *engine,
> * future request will be after userspace has had the opportunity
> * to recreate its own state.
> */
> - regs = request->hw_context->lrc_reg_state;
> if (engine->pinned_default_state) {
> memcpy(regs, /* skip restoring the vanilla PPHWSP */
> engine->pinned_default_state + LRC_STATE_PN * PAGE_SIZE,
> diff --git a/drivers/gpu/drm/i915/intel_lrc_reg.h b/drivers/gpu/drm/i915/intel_lrc_reg.h
> index 5ef932d810a7..3b155ecbfa92 100644
> --- a/drivers/gpu/drm/i915/intel_lrc_reg.h
> +++ b/drivers/gpu/drm/i915/intel_lrc_reg.h
> @@ -39,6 +39,8 @@
> #define CTX_R_PWR_CLK_STATE 0x42
> #define CTX_END 0x44
>
> +#define CTX_MI_MODE 0x54
> +
> #define CTX_REG(reg_state, pos, reg, val) do { \
> u32 *reg_state__ = (reg_state); \
> const u32 pos__ = (pos); \
> --
> 2.18.0
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
More information about the Intel-gfx
mailing list