[Intel-gfx] [PATCH 01/40] drm/i915/hangcheck: Replace hangcheck.seqno with RING_HEAD

Chris Wilson chris at chris-wilson.co.uk
Wed May 8 14:10:23 UTC 2019


Quoting Mika Kuoppala (2019-05-08 15:00:11)
> Chris Wilson <chris at chris-wilson.co.uk> writes:
> 
> > Quoting Mika Kuoppala (2019-05-08 13:30:46)
> >> Chris Wilson <chris at chris-wilson.co.uk> writes:
> >> 
> >> > After realising we need to sample RING_START to detect context switches
> >> > from preemption events that do not allow for the seqno to advance, we
> >> > can also realise that the seqno itself is just a distance along the ring
> >> > and so can be replaced by sampling RING_HEAD.
> >> >
> >> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> >> > Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
> >> > ---
> >> >  static enum intel_engine_hangcheck_action
> >> > @@ -156,7 +156,7 @@ hangcheck_get_action(struct intel_engine_cs *engine,
> >> >       if (engine->hangcheck.last_ring != hc->ring)
> >> >               return ENGINE_ACTIVE_SEQNO;
> >> >  
> >> > -     if (engine->hangcheck.last_seqno != hc->seqno)
> >> > +     if (engine->hangcheck.last_head != hc->head)
> >> >               return ENGINE_ACTIVE_SEQNO;
> >> 
> >> Change the enum also?
> >
> > Pffifle. As far as hangcheck goes RING_START:RING_HEAD comprise its
> > seqno.
> >
> > Makes for a good talking point in a few years time :)
> 
> Fair enough.
> 
> >
> >> > diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> >> > index d1a54d2c3d5d..f1d62746e066 100644
> >> > --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> >> > +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> >> > @@ -2275,12 +2275,6 @@ static u32 *gen8_emit_fini_breadcrumb(struct i915_request *request, u32 *cs)
> >> >                                 request->timeline->hwsp_offset,
> >> >                                 0);
> >> >  
> >> > -     cs = gen8_emit_ggtt_write(cs,
> >> > -                               intel_engine_next_hangcheck_seqno(request->engine),
> >> > -                               I915_GEM_HWS_HANGCHECK_ADDR,
> >> > -                               MI_FLUSH_DW_STORE_INDEX);
> >> > -
> >> > -
> >> >       *cs++ = MI_USER_INTERRUPT;
> >> >       *cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
> >> >  
> >> > @@ -2297,14 +2291,11 @@ static u32 *gen8_emit_fini_breadcrumb_rcs(struct i915_request *request, u32 *cs)
> >> >                                     request->timeline->hwsp_offset,
> >> >                                     PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH |
> >> >                                     PIPE_CONTROL_DEPTH_CACHE_FLUSH |
> >> > -                                   PIPE_CONTROL_DC_FLUSH_ENABLE |
> >> > -                                   PIPE_CONTROL_FLUSH_ENABLE |
> >> > -                                   PIPE_CONTROL_CS_STALL);
> >> 
> >> ???
> >
> > Kabylake sends the interrupt too early otherwise. The hangcheck write
> > saved us by pure accident.
> 
> I read the diff wrong at first try also, was concerned that we lost cs stall.
> Regardless this could benefit from a comment explaining the need to
> force sync for the intr.

Indeed, that is sensible since it's an empirical result and worth
validating again later. Done and pushed, thanks.
-Chris


More information about the Intel-gfx mailing list