[Intel-gfx] [PATCH 2/2] drm/i915: Postpone fake breadcrumb interrupt if a real interrupt is pending

Thu Feb 16 11:40:55 UTC 2017

On Thu, Feb 16, 2017 at 11:17:11AM +0000, Tvrtko Ursulin wrote:
> 
> On 16/02/2017 10:47, Chris Wilson wrote:
> >On Thu, Feb 16, 2017 at 10:38:08AM +0000, Tvrtko Ursulin wrote:
> >>
> >>On 16/02/2017 09:29, Chris Wilson wrote:
> >>>If the timer expires for enabling the fake interrupt, check to see
> >>>if there is a real interrupt queued before making the decision to start
> >>>polling. This helps in situations where we have a very slow irq-seqno
> >>>barrier that may accrue more breadcrumb interrupts before it is able to
> >>>catch up. It still leaves a hole for the timer to expire as we are
> >>>processing the last irq-seqno barrier, but it appears to help reduce the
> >>>frequency of "missed-interrupts" on Ironlake, at least.
> >>>
> >>>References: https://bugs.freedesktop.org/show_bug.cgi?id=99816
> >>>Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> >>>Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> >>>---
> >>>drivers/gpu/drm/i915/intel_breadcrumbs.c | 5 +++++
> >>>1 file changed, 5 insertions(+)
> >>>
> >>>diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> >>>index ef3adfd37d7d..21269421bd2a 100644
> >>>--- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
> >>>+++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> >>>@@ -41,6 +41,11 @@ static void intel_breadcrumbs_hangcheck(unsigned long data)
> >>>		return;
> >>>	}
> >>>
> >>>+	if (test_bit(ENGINE_IRQ_BREADCRUMB, &engine->irq_posted)) {
> >>>+		mod_timer(&b->hangcheck, jiffies + 1);
> >>>+		return;
> >>>+	}
> >>>+
> >>>	DRM_DEBUG("Hangcheck timer elapsed... %s idle\n", engine->name);
> >>>	set_bit(engine->id, &engine->i915->gpu_error.missed_irq_rings);
> >>>	mod_timer(&engine->breadcrumbs.fake_irq, jiffies + 1);
> >>>
> >>
> >>Another hmm :), barriers are so much shorter than the hangcheck
> >>interval (1500ms) so I don't quite understand what is the problem.
> >
> >We may queue the wait many, many seconds before it is even being
> >processed. The point of this timer is to detect when we haven't seen an
> >interrupt for some time and that happens to be the waiter missed (backup
> >for seqno-barrier failing).
> 
> But it is the same as a slow batch which completes just as the
> hangcheck timer fires.

Yes.

> Is the problem is seqno barrier on some platforms can slow down the
> signaller thread a lot?

Yes. Waiting for the seqno to become visible appears unbounded...

> Hm, if the request duration is right it
> would sleep on every invocation. So in effect extend the request
> duration as seen form userspace by the barrier duration. Why would
> that be a problem though? Haven't fully figured it out.

It's only a problem if we get interrupts in that case and no seqno
advance. But as we are getting the interrupts, we are at least checking
the seqno and doing a timer wakeup is not going to progress the waiter.

The premise behind the timer is that we will wakeup after an interrupt
and not see the seqno advance. (Or even worse, never see an interrupt!)
Only when the interrupts stop, and the waiter stops checking, do we need
to intervene.

I've rewritten this to explicitly check for stopped interrupts,
hopefully that makes this backup strategy clear.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre