[Intel-gfx] [PATCH 1/2] drm/i915: signal first fence from irq handler if complete

Chris Wilson chris at chris-wilson.co.uk
Wed Feb 22 17:10:04 UTC 2017


On Wed, Feb 22, 2017 at 04:53:35PM +0000, Tvrtko Ursulin wrote:
> 
> On 17/02/2017 15:51, Chris Wilson wrote:
> >@@ -4034,9 +4038,8 @@ __i915_request_irq_complete(const struct drm_i915_gem_request *req)
> > 	 * is woken.
> > 	 */
> > 	if (engine->irq_seqno_barrier &&
> >-	    rcu_access_pointer(engine->breadcrumbs.irq_seqno_bh) == current &&
> > 	    test_and_clear_bit(ENGINE_IRQ_BREADCRUMB, &engine->irq_posted)) {
> >-		struct task_struct *tsk;
> >+		unsigned long flags;
> >
> > 		/* The ordering of irq_posted versus applying the barrier
> > 		 * is crucial. The clearing of the current irq_posted must
> >@@ -4058,17 +4061,17 @@ __i915_request_irq_complete(const struct drm_i915_gem_request *req)
> > 		 * the seqno before we believe it coherent since they see
> > 		 * irq_posted == false but we are still running).
> > 		 */
> >-		rcu_read_lock();
> >-		tsk = rcu_dereference(engine->breadcrumbs.irq_seqno_bh);
> >-		if (tsk && tsk != current)
> >+		spin_lock_irqsave(&engine->breadcrumbs.lock, flags);
> >+		if (engine->breadcrumbs.first_wait &&
> >+		    engine->breadcrumbs.first_wait->tsk != current)
> > 			/* Note that if the bottom-half is changed as we
> > 			 * are sending the wake-up, the new bottom-half will
> > 			 * be woken by whomever made the change. We only have
> > 			 * to worry about when we steal the irq-posted for
> > 			 * ourself.
> > 			 */
> >-			wake_up_process(tsk);
> >-		rcu_read_unlock();
> >+			wake_up_process(engine->breadcrumbs.first_wait->tsk);
> >+		spin_unlock_irqrestore(&engine->breadcrumbs.lock, flags);
> 
> Worth caching &engine->breadcrumbs maybe?

I'll ask gcc.
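
Something like the below is what I'd hope gcc produces anyway (untested
sketch, purely for illustration; names taken from the hunk above):

	struct intel_breadcrumbs *b = &engine->breadcrumbs;
	unsigned long flags;

	spin_lock_irqsave(&b->lock, flags);
	if (b->first_wait && b->first_wait->tsk != current)
		/* As per the comment above: if the bottom-half changes
		 * while we send the wake-up, whoever made the change
		 * wakes the new one.
		 */
		wake_up_process(b->first_wait->tsk);
	spin_unlock_irqrestore(&b->lock, flags);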

> >
> > 		if (__i915_gem_request_completed(req, seqno))
> > 			return true;
> >diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
> >index e22eacec022d..2e7bdb0cf069 100644
> >--- a/drivers/gpu/drm/i915/i915_gem_request.c
> >+++ b/drivers/gpu/drm/i915/i915_gem_request.c
> >@@ -1083,6 +1083,7 @@ long i915_wait_request(struct drm_i915_gem_request *req,
> > 	}
> >
> > 	wait.tsk = current;
> >+	wait.request = req;
> 
> I guess this was the preemption prep tree already. If you decide to
> keep the intel_wait_init helper, you could add req to it.

Challenge being not all users of intel_wait_init have a request :)
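
If it did grow a request parameter, the helper would end up something
like this (sketch only; callers without a request would have to pass
NULL):

	static inline void intel_wait_init(struct intel_wait *wait,
					   struct drm_i915_gem_request *rq,
					   u32 seqno)
	{
		wait->tsk = current;
		wait->request = rq; /* may be NULL for seqno-only waits */
		wait->seqno = seqno;
	}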

> > restart:
> > 	do {
> >diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
> >index 5f73d8c0a38a..0efee879df23 100644
> >--- a/drivers/gpu/drm/i915/i915_gem_request.h
> >+++ b/drivers/gpu/drm/i915/i915_gem_request.h
> >@@ -32,10 +32,12 @@
> >
> > struct drm_file;
> > struct drm_i915_gem_object;
> >+struct drm_i915_gem_request;
> >
> > struct intel_wait {
> > 	struct rb_node node;
> > 	struct task_struct *tsk;
> >+	struct drm_i915_gem_request *request;
> > 	u32 seqno;
> > };
> >
> >diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> >index 57fa1bf78a85..0c370c687c2a 100644
> >--- a/drivers/gpu/drm/i915/i915_irq.c
> >+++ b/drivers/gpu/drm/i915/i915_irq.c
> >@@ -1033,10 +1033,44 @@ static void ironlake_rps_change_irq_handler(struct drm_i915_private *dev_priv)
> >
> > static void notify_ring(struct intel_engine_cs *engine)
> > {
> >+	struct drm_i915_gem_request *rq = NULL;
> >+	struct intel_wait *wait;
> >+
> >+	if (!intel_engine_has_waiter(engine))
> >+		return;
> >+
> >+	trace_i915_gem_request_notify(engine);
> > 	atomic_inc(&engine->irq_count);
> > 	set_bit(ENGINE_IRQ_BREADCRUMB, &engine->irq_posted);
> >-	if (intel_engine_wakeup(engine))
> >-		trace_i915_gem_request_notify(engine);
> >+
> >+	rcu_read_lock();
> 
> Not sure this is required from an irq handler. But I don't think it
> harms either, so maybe it has some usefulness in documenting things.

That's what I said! It's not, but it's cheap enough that I think it is
worth keeping to document the game being played with rq.
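
For reference, the game is roughly this (paraphrasing the rest of the
hunk, which is elided above):

	rcu_read_lock();
	spin_lock(&engine->breadcrumbs.lock);
	wait = engine->breadcrumbs.first_wait;
	if (wait) {
		/* Peek at the first waiter: if its seqno has already
		 * passed, signal its fence directly from the irq
		 * handler rather than waiting for the signaler thread.
		 */
		if (i915_seqno_passed(intel_engine_get_seqno(engine),
				      wait->seqno))
			rq = wait->request;

		wake_up_process(wait->tsk);
	}
	spin_unlock(&engine->breadcrumbs.lock);

	if (rq)
		dma_fence_signal(&rq->fence);
	rcu_read_unlock();

rq is only peeked at under the breadcrumbs lock; the rcu_read_lock is
what keeps it safe to dereference after the lock is dropped.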

> Looks OK.
> 
> Last time I tested, it was a definite latency improvement but also
> a slight throughput regression on crazy microbenchmarks like
> gem_latency, IIRC. But I think it is overall a good thing for more
> realistic workloads.

Yup. It's a big one for gem_exec_whisper, which is latency sensitive
(and gem_exec_latency). I didn't see gem_latency (using
tests/gem_latency) suffer too much. To be precise, which gem_latency
are you thinking of? :) I presume ezbench -b gem:latency?

I did look at trying to reduce the cost of the spinlocks (by trying to
share them between requests, or the engine->timeline and breadcrumbs)
but came away bruised and battered.

> I leave the ordering wrt the preemption prep series to you. I can
> apply my r-b at that point; not sure if it makes sense now?

It's in the queue, and I'll resend it once the earlier patches land so
that CI has a run over it, and to catch any more ideas.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

