[Intel-gfx] [PATCH] drm/i915: Cancel pending execlists irq handler upon idling

Chris Wilson chris at chris-wilson.co.uk
Wed Jun 28 10:29:58 UTC 2017


Quoting Tvrtko Ursulin (2017-06-28 11:15:36)
> 
> On 28/06/2017 11:01, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2017-06-28 09:59:04)
> >>
> >> On 27/06/2017 16:25, Chris Wilson wrote:
> >>> Due to the slight asynchronicity in handling the execlists interrupts
> >>> (i.e. we defer the work to a handler that may consume more than one
> >>> interrupt event), when the engine is idle we may still have an irq
> >>> tasklet queued (especially when it has been deferred to a ksoftirqd).
> >>> At the beginning of the tasklet, we assert that we do hold a device
> >>> wakeref for the access we are about to perform. This assumes that when
> >>> we idle and release the GT wakeref, all execlists work has been
> >>> completed (since the elsp tracking says the hw is idle). However, there
> >>> may still be a tasklet queued, so as we mark the engine idle, also
> >>> cancel any pending tasklet.
> >>
> >> We check the irq_posted bit which should correspond with a pending
> >> tasklet (intel_engines_are_idle/intel_engine_is_idle), before
> >> transitioning to idle so I don't understand this.
> > 
> > Exactly, we've processed the interrupt in the current irq handler, but
> > due to the ordering (which is essential to ensure that we don't miss an
> > interrupt, i.e. the strong ordering is via the tasklet atomic ops so
> > that each interrupt is always followed by a tasklet, at least if we do
> > have elsp[]!) we can queue a second tasklet execution despite it already
> > being handled concurrently.
> > 
> > Run long enough and this rare event will then coincide with idling.
> 
> Ah rings a bell now. That would mean the irq_posted bit being consumed 
> by the current tasklet, and then the next one coming along as designed.

Phew, I didn't have to draw the ascii flow chart. Now, for the final
execution that triggers the bug, we shouldn't actually touch hw (the irq
posted bit should be clear, and the execlists queue should also be
empty), so we could argue that the bug on is in err -- but I think it is
a useful bug on to document the intent of the wakeref being held on our
behalf and cancelling the extra work seems sensible.
 
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>

Ta, consider it pushed. Not sure if CI has seen it yet, so no bugzilla
to close.
-Chris


More information about the Intel-gfx mailing list