[Intel-gfx] [PATCH] drm/i915/execlists: Reset ring registers on rebinding contexts

Chris Wilson chris at chris-wilson.co.uk
Wed Mar 28 16:36:09 UTC 2018


Quoting Tvrtko Ursulin (2018-03-28 17:26:37)
> 
> On 27/03/2018 22:01, Chris Wilson wrote:
> > Tvrtko uncovered a fun issue with recovering from a wedged device. In his
> > tests, he wedged the driver by injecting an unrecoverable hang whilst a
> > batch was spinning. As we reset the gpu in the middle of the spinner,
> > when resumed it would continue on from the next instruction in the ring
> > and write its breadcrumb. However, on wedging we updated our
> > bookkeeping to indicate that the GPU had completed executing and would
> > restart from after the breadcrumb; so the emission of the stale
> > breadcrumb from before the reset came as a bit of a surprise.
> > 
> > A simple fix is, when rebinding the context into the GPU, to update
> > the ring register state in the context image to match our bookkeeping.
> > We already have to update the RING_START and RING_TAIL, so updating
> > RING_HEAD as well is trivial. This works because whenever we unbind the
> > context, we keep the bookkeeping in check; and on wedging we unbind all
> > contexts.
> > 
> > Testcase: igt/gem_eio
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> > Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
> > ---
> >   drivers/gpu/drm/i915/intel_lrc.c | 1 +
> >   1 file changed, 1 insertion(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> > index ba7f7831f934..654634254b64 100644
> > --- a/drivers/gpu/drm/i915/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/intel_lrc.c
> > @@ -1272,6 +1272,7 @@ execlists_context_pin(struct intel_engine_cs *engine,
> >       ce->lrc_reg_state = vaddr + LRC_STATE_PN * PAGE_SIZE;
> >       ce->lrc_reg_state[CTX_RING_BUFFER_START+1] =
> >               i915_ggtt_offset(ce->ring->vma);
> > +     ce->lrc_reg_state[CTX_RING_HEAD+1] = ce->ring->head;
> >   
> >       ce->state->obj->pin_global++;
> >       i915_gem_context_get(ctx);
> > 
> 
> After quite some amount of walking through the code, looking at traces
> and chatting on IRC:
> 
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>

I feel like it is one of those that is going to be asked about in 6
months' time and I'll have to admit the shameful secret. Smoke and
mirrors, smoke and mirrors.
-Chris
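
For anyone who does come asking in 6 months, here is a rough toy model of the problem the commit message describes. It is only a sketch and not the driver code: the toy_* names, the register slot indices and the wrap arithmetic are all made up for illustration, and the real CTX_* offsets and ring handling in intel_lrc.c differ. It assumes a power-of-two ring size and that wedging advances the bookkeeping head to the tail.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical slot indices; not the real CTX_* offsets from intel_lrc. */
enum {
	TOY_CTX_RING_HEAD = 0,
	TOY_CTX_RING_TAIL = 1,
	TOY_CTX_RING_START = 2,
	TOY_CTX_NREGS = 3,
};

struct toy_ring {
	uint32_t head;  /* bookkeeping: where execution should resume */
	uint32_t tail;  /* bookkeeping: end of the emitted commands */
	uint32_t start; /* ring base address */
	uint32_t size;  /* assumed to be a power of two */
};

struct toy_context {
	struct toy_ring ring;
	uint32_t reg_state[TOY_CTX_NREGS]; /* image the hardware reloads */
};

/* Model of wedging: the bookkeeping skips past the breadcrumb, but the
 * saved context image is left exactly as the hardware last wrote it. */
static void toy_wedge(struct toy_context *ce)
{
	ce->ring.head = ce->ring.tail;
}

/* Model of rebinding the context. With reset_head == 0 this mimics the
 * old behaviour (only START and TAIL refreshed); with reset_head != 0
 * it also copies the bookkeeping head into the image, as the patch does. */
static void toy_pin(struct toy_context *ce, int reset_head)
{
	assert(ce->ring.head < ce->ring.size);

	ce->reg_state[TOY_CTX_RING_START] = ce->ring.start;
	ce->reg_state[TOY_CTX_RING_TAIL] = ce->ring.tail;
	if (reset_head)
		ce->reg_state[TOY_CTX_RING_HEAD] = ce->ring.head;
}

/* Bytes of old commands the hardware would replay from the image. */
static uint32_t toy_stale_bytes(const struct toy_context *ce)
{
	return (ce->reg_state[TOY_CTX_RING_TAIL] -
		ce->reg_state[TOY_CTX_RING_HEAD]) & (ce->ring.size - 1);
}
```

In this model, a context whose image was saved mid-spinner still has a non-zero toy_stale_bytes() after toy_wedge() followed by toy_pin(ce, 0), which corresponds to the stale breadcrumb; toy_pin(ce, 1) brings it to zero.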

