[Intel-gfx] [PATCH v2 2/2] drm/i915: Delay disabling the user interrupt for breadcrumbs

Fri Feb 24 10:47:03 UTC 2017

On Fri, Feb 24, 2017 at 10:34:07AM +0000, Tvrtko Ursulin wrote:
> 
> On 23/02/2017 15:42, Chris Wilson wrote:
> >A significant cost in setting up a wait is the overhead of enabling the
> >interrupt. As we disable the interrupt whenever the queue of waiters is
> >empty, if we are frequently waiting on alternating batches, we end up
> >re-enabling the interrupt on a frequent basis. We do want to disable the
> >interrupt during normal operations as under high load it may add several
> >thousand interrupts/s - we have been known in the past to occupy whole
> >cores with our interrupt handler after accidentally leaving user
> >interrupts enabled. As a compromise, leave the interrupt enabled until
> >the next IRQ, or the system is idle. This gives a small window for a
> >waiter to keep the interrupt active and not be delayed by having to
> >re-enable the interrupt.
> >
> >v2: Restore hangcheck/missed-irq detection for continuations
> >
> >Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> >Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> >Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
> >---
> > drivers/gpu/drm/i915/i915_gem.c          |  4 +-
> > drivers/gpu/drm/i915/i915_irq.c          |  5 +-
> > drivers/gpu/drm/i915/intel_breadcrumbs.c | 95 +++++++++++++++++++-------------
> > drivers/gpu/drm/i915/intel_ringbuffer.h  |  4 +-
> > 4 files changed, 65 insertions(+), 43 deletions(-)
> >
> >diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> >index a8167003c10b..3ec8c948fe45 100644
> >--- a/drivers/gpu/drm/i915/i915_gem.c
> >+++ b/drivers/gpu/drm/i915/i915_gem.c
> >@@ -2988,8 +2988,10 @@ i915_gem_idle_work_handler(struct work_struct *work)
> > 	if (wait_for(intel_execlists_idle(dev_priv), 10))
> > 		DRM_ERROR("Timeout waiting for engines to idle\n");
> >
> >-	for_each_engine(engine, dev_priv, id)
> >+	for_each_engine(engine, dev_priv, id) {
> >+		intel_engine_disarm_breadcrumbs(engine);
> > 		i915_gem_batch_pool_fini(&engine->batch_pool);
> >+	}
> >
> > 	GEM_BUG_ON(!dev_priv->gt.awake);
> > 	dev_priv->gt.awake = false;
> >diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> >index 3c79753edf0e..ba0bb33e12ed 100644
> >--- a/drivers/gpu/drm/i915/i915_irq.c
> >+++ b/drivers/gpu/drm/i915/i915_irq.c
> >@@ -1036,9 +1036,6 @@ static void notify_ring(struct intel_engine_cs *engine)
> > 	struct drm_i915_gem_request *rq = NULL;
> > 	struct intel_wait *wait;
> >
> >-	if (!intel_engine_has_waiter(engine))
> >-		return;
> >-
> > 	atomic_inc(&engine->irq_count);
> > 	set_bit(ENGINE_IRQ_BREADCRUMB, &engine->irq_posted);
> >
> >@@ -1061,6 +1058,8 @@ static void notify_ring(struct intel_engine_cs *engine)
> > 			rq = wait->request;
> >
> > 		wake_up_process(wait->tsk);
> >+	} else {
> >+		__intel_engine_disarm_breadcrumbs(engine);
> > 	}
> > 	spin_unlock(&engine->breadcrumbs.lock);
> >
> >diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> >index 5f2665aa817c..bf14022f6adb 100644
> >--- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
> >+++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> >@@ -35,8 +35,10 @@ static void intel_breadcrumbs_hangcheck(unsigned long data)
> > {
> > 	struct intel_engine_cs *engine = (struct intel_engine_cs *)data;
> > 	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> >+	struct intel_wait *wait;
> >+	unsigned long flags;
> >
> >-	if (!b->irq_enabled)
> >+	if (!b->irq_armed)
> > 		return;
> >
> > 	if (b->hangcheck_interrupts != atomic_read(&engine->irq_count)) {
> >@@ -49,10 +51,15 @@ static void intel_breadcrumbs_hangcheck(unsigned long data)
> > 	 * to process the pending interrupt (e.g, low priority task on a loaded
> > 	 * system) and wait until it sleeps before declaring a missed interrupt.
> > 	 */
> >-	if (!intel_engine_wakeup(engine)) {
> >+	spin_lock_irqsave(&b->lock, flags);
> >+	wait = b->first_wait;
> >+	if (!wait || !wake_up_process(wait->tsk)) {
> > 		mod_timer(&b->hangcheck, wait_timeout());
> >-		return;
> >+		wait = NULL;
> > 	}
> >+	spin_unlock_irqrestore(&b->lock, flags);
> >+	if (!wait)
> >+		return;
> 
> I don't see a difference versus the old code:
> 
> if (!intel_engine_wakeup(engine)) {
> 	mod_timer(...);
> 	return;
> }

It used to be different and then morphed back to that! However, I do
need two different types of intel_engine_wakeup. One that reports a
successful wakeup of a sleeper and the other that reports having any
waiter.

> There is only one instance of the timer at a time so we don't need
> to extend the lock to mod_timer. Or you were thinking
> intel_engine_reset_breadcrumbs? But there we do del_timer_sync
> before proceeding to touch the timer.

Right, we only need the lock so far as getting wait->tsk (we could
switch to rcu at that point). All I'm looking for here is tidy code.

> > 	DRM_DEBUG("Hangcheck timer elapsed... %s idle\n", engine->name);
> > 	set_bit(engine->id, &engine->i915->gpu_error.missed_irq_rings);
> >@@ -110,6 +117,34 @@ static void irq_disable(struct intel_engine_cs *engine)
> > 	spin_unlock(&engine->i915->irq_lock);
> > }
> >
> >+void __intel_engine_disarm_breadcrumbs(struct intel_engine_cs *engine)
> >+{
> >+	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> >+
> >+	assert_spin_locked(&b->lock);
> >+
> >+	if (b->irq_enabled) {
> >+		irq_disable(engine);
> >+		b->irq_enabled = false;
> >+	}
> >+
> >+	b->irq_armed = false;
> >+}
> >+
> >+void intel_engine_disarm_breadcrumbs(struct intel_engine_cs *engine)
> >+{
> >+	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> >+	unsigned long flags;
> >+
> >+	if (!b->irq_armed)
> >+		return;
> 
> Sounds safer to put this check under the spinlock so reader can
> think less. :) Or in fact do we even need the check? It's only the
> idle work handler so could just have the below block I think.

Killed it entirely! Rearranged the code a bit to fixup restarting after
a reset, and it evaporated.

> The rest looks OK, but I wonder if I missed something since the CI
> wasn't happy with it?

Yeah. I didn't get restarting after a reset correct. That seems to have
been the first problem that tripped CI over...
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre