[Intel-gfx] [PATCH] drm/i915/execlists: Verify context register state before execution

Chris Wilson chris at chris-wilson.co.uk
Thu Oct 31 14:40:50 UTC 2019


Quoting Mika Kuoppala (2019-10-31 14:32:05)
> Chris Wilson <chris at chris-wilson.co.uk> writes:
> 
> > Check that the context's ring register state still matches our
> > expectations prior to execution.
> >
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
> > ---
> >  drivers/gpu/drm/i915/gt/intel_lrc.c     | 73 ++++++++++++++++++++-----
> >  drivers/gpu/drm/i915/gt/intel_lrc_reg.h |  4 +-
> >  drivers/gpu/drm/i915/gt/selftest_lrc.c  |  4 +-
> >  3 files changed, 63 insertions(+), 18 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> > index 5f61cd128d9c..666e70931524 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> > @@ -1146,6 +1146,60 @@ execlists_schedule_out(struct i915_request *rq)
> >       i915_request_put(rq);
> >  }
> >  
> > +static int lrc_ring_mi_mode(const struct intel_engine_cs *engine)
> > +{
> > +     if (INTEL_GEN(engine->i915) >= 12)
> > +             return 0x60;
> > +     else if (INTEL_GEN(engine->i915) >= 9)
> > +             return 0x54;
> > +     else if (engine->class == RENDER_CLASS)
> > +             return 0x58;
> > +     else
> > +             return -1;
> > +}
> > +
> > +static void
> > +execlists_check_context(const struct intel_context *ce,
> > +                     const struct intel_engine_cs *engine)
> > +{
> > +     const struct intel_ring *ring = ce->ring;
> > +     u32 *regs = ce->lrc_reg_state;
> > +     int x;
> > +
> > +     if (regs[CTX_RING_START] != i915_ggtt_offset(ring->vma)) {
> > +             pr_err_once("%s: context submitted with incorrect RING_BUFFER_START [%08x], expected %08x\n",
> > +                         engine->name,
> > +                         regs[CTX_RING_START],
> > +                         i915_ggtt_offset(ring->vma));
> > +             regs[CTX_RING_START] = i915_ggtt_offset(ring->vma);
> > +     }
> > +
> > +     if ((regs[CTX_RING_CTL] & ~(RING_WAIT | RING_WAIT_SEMAPHORE)) !=
> > +         (RING_CTL_SIZE(ring->size) | RING_VALID)) {
> > +             pr_err_once("%s: context submitted with incorrect RING_BUFFER_CONTROL [%08x], expected %08x\n",
> > +                         engine->name,
> > +                         regs[CTX_RING_CTL],
> > +                         (u32)(RING_CTL_SIZE(ring->size) | RING_VALID));
> > +             regs[CTX_RING_CTL] = RING_CTL_SIZE(ring->size) | RING_VALID;
> 
> We are going to submit by clearing the waits. First I thought this will
> lead to disaster but the hardware works so that we clear on writing '1'.

Afaik, they are only indicator bits. So the HW ignores those bits when
we write to the register.

> > +     if (regs[CTX_BB_STATE] != RING_BB_PPGTT) {
> > +             pr_err_once("%s: context submitted with incorrect BB_STATE [%08x], expected %08x\n",
> > +                         engine->name,
> > +                         regs[CTX_BB_STATE],
> > +                         RING_BB_PPGTT);
> > +             regs[CTX_BB_STATE] = RING_BB_PPGTT;
> > +     }
> > +
> > +     x = lrc_ring_mi_mode(engine);
> > +     if (x != -1 && regs[x + 1] & STOP_RING) {
> > +             pr_err_once("%s: context submitted with STOP_RING [%08x] in RING_MI_MODE\n",
> > +                         engine->name, regs[x + 1]);
> > +             regs[x + 1] &= ~STOP_RING;
> > +             regs[x + 1] |= STOP_RING << 16;
> 
> Ok you want only to care about this one. Was pondering of restoring rest
> of the state from previous.

It was mostly in the spirit of "now that we are here, we might as well
fix it up". It's mainly the paranoia in trying to ascertain if we are
feeding garbage into the GPU. There's probably a lot more we can check,
but the ring registers are essential :)

> Will the hardware even care on masks at this point or is the saving part
> setting them accordingly?

Aiui, it uses the masks on the implicit LRI in the context restore to
control where or not to actually apply the bits. Not that I have checked
to see what the HW is doing (we will see if it ever flags an error), but
memory says from elsewere it uses 0xffffxxxx.
-Chris


More information about the Intel-gfx mailing list