[Intel-gfx] [PATCH 1/2] drm/i915: Recheck breadcrumb seqno after an interrupt

Thu Feb 16 11:21:12 UTC 2017

On Thu, Feb 16, 2017 at 10:45:56AM +0000, Tvrtko Ursulin wrote:
> 
> On 16/02/2017 10:36, Chris Wilson wrote:
> >On Thu, Feb 16, 2017 at 10:21:17AM +0000, Tvrtko Ursulin wrote:
> >>
> >>On 16/02/2017 09:29, Chris Wilson wrote:
> >>>If an interrupt arrives whilst we are performing the irq-seqno barrier,
> >>>recheck the seqno again before returning.
> >>>
> >>>Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> >>>Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> >>>---
> >>>drivers/gpu/drm/i915/i915_drv.h | 6 ++++--
> >>>1 file changed, 4 insertions(+), 2 deletions(-)
> >>>
> >>>diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> >>>index c1b400f1ede4..ecb8b414bdd2 100644
> >>>--- a/drivers/gpu/drm/i915/i915_drv.h
> >>>+++ b/drivers/gpu/drm/i915/i915_drv.h
> >>>@@ -4102,6 +4102,9 @@ __i915_request_irq_complete(const struct drm_i915_gem_request *req)
> >>>	if (__i915_gem_request_completed(req, seqno))
> >>>		return true;
> >>>
> >>>+	if (!engine->irq_seqno_barrier)
> >>>+		return false;
> >>>+
> >>>	/* Ensure our read of the seqno is coherent so that we
> >>>	 * do not "miss an interrupt" (i.e. if this is the last
> >>>	 * request and the seqno write from the GPU is not visible
> >>>@@ -4113,8 +4116,7 @@ __i915_request_irq_complete(const struct drm_i915_gem_request *req)
> >>>	 * but it is easier and safer to do it every time the waiter
> >>>	 * is woken.
> >>>	 */
> >>>-	if (engine->irq_seqno_barrier &&
> >>>-	    test_and_clear_bit(ENGINE_IRQ_BREADCRUMB, &engine->irq_posted)) {
> >>>+	while (test_and_clear_bit(ENGINE_IRQ_BREADCRUMB, &engine->irq_posted)) {
> >>>		unsigned long flags;
> >>>
> >>>		/* The ordering of irq_posted versus applying the barrier
> >>>
> >>
> >>Hmmm.. if this helps it feels that there is a race somewhere.
> >>Because we have processed one interrupt, but it wasn't for us.
> >
> >There may be many interrupts between now and the one we actually want.
> >The caller of this function is (or was at the time) has the first seqno
> >to be signaled, it is just not necessarily the next interrupt.
> >
> >>That
> >>means there will be another one coming. Why handle that at this
> >>level? Why check twice and not three, four, five times? :)
> >
> >Because they are all for us! This only saves a trip through schedule. If
> >we don't loop here, we just loop at the next level.
> 
> Yes, which looks more correct to me. Because the caller then do what
> they want. Signaller just sleeps and i915_wait_request potentially
> busy waits for a bit.

A persuasive counter argument would be:

Interrupts can be arriving frequently for an indefinite period before we
see our seqno that breaks the loop. We should not allow this to spin
indefinitely and at the very least need to be checking for pending
interrupts or timeslice exhaustion. Both of which are handled in the
callers.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre