[Intel-gfx] [PATCH 1/2] drm/i915: signal first fence from irq handler if complete
Chris Wilson
chris at chris-wilson.co.uk
Wed Feb 22 17:10:04 UTC 2017
On Wed, Feb 22, 2017 at 04:53:35PM +0000, Tvrtko Ursulin wrote:
>
> On 17/02/2017 15:51, Chris Wilson wrote:
> >@@ -4034,9 +4038,8 @@ __i915_request_irq_complete(const struct drm_i915_gem_request *req)
> > * is woken.
> > */
> > if (engine->irq_seqno_barrier &&
> >- rcu_access_pointer(engine->breadcrumbs.irq_seqno_bh) == current &&
> > test_and_clear_bit(ENGINE_IRQ_BREADCRUMB, &engine->irq_posted)) {
> >- struct task_struct *tsk;
> >+ unsigned long flags;
> >
> > /* The ordering of irq_posted versus applying the barrier
> > * is crucial. The clearing of the current irq_posted must
> >@@ -4058,17 +4061,17 @@ __i915_request_irq_complete(const struct drm_i915_gem_request *req)
> > * the seqno before we believe it coherent since they see
> > * irq_posted == false but we are still running).
> > */
> >- rcu_read_lock();
> >- tsk = rcu_dereference(engine->breadcrumbs.irq_seqno_bh);
> >- if (tsk && tsk != current)
> >+ spin_lock_irqsave(&engine->breadcrumbs.lock, flags);
> >+ if (engine->breadcrumbs.first_wait &&
> >+ engine->breadcrumbs.first_wait->tsk != current)
> > /* Note that if the bottom-half is changed as we
> > * are sending the wake-up, the new bottom-half will
> > * be woken by whomever made the change. We only have
> > * to worry about when we steal the irq-posted for
> > * ourself.
> > */
> >- wake_up_process(tsk);
> >- rcu_read_unlock();
> >+ wake_up_process(engine->breadcrumbs.first_wait->tsk);
> >+ spin_unlock_irqrestore(&engine->breadcrumbs.lock, flags);
>
> Worth caching &engine->breadcrumbs maybe?
I'll ask gcc.
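For reference, the cached-local variant would look like this (untested
sketch, same behaviour as the hunk above; gcc may well generate
identical code either way):

	struct intel_breadcrumbs *b = &engine->breadcrumbs;
	unsigned long flags;

	/* Serialise against the waiter tree; first_wait is the
	 * bottom-half and must not be woken if it is us.
	 */
	spin_lock_irqsave(&b->lock, flags);
	if (b->first_wait && b->first_wait->tsk != current)
		wake_up_process(b->first_wait->tsk);
	spin_unlock_irqrestore(&b->lock, flags);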
> >
> > if (__i915_gem_request_completed(req, seqno))
> > return true;
> >diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
> >index e22eacec022d..2e7bdb0cf069 100644
> >--- a/drivers/gpu/drm/i915/i915_gem_request.c
> >+++ b/drivers/gpu/drm/i915/i915_gem_request.c
> >@@ -1083,6 +1083,7 @@ long i915_wait_request(struct drm_i915_gem_request *req,
> > }
> >
> > wait.tsk = current;
> >+ wait.request = req;
>
> I guess this was the preemption prep tree already. If you decide to
> keep the intel_wait_init helper, you could add req to it.
Challenge being not all users of intel_wait_init have a request :)
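Purely illustrative (signature hypothetical, not part of this patch):
if we did keep the helper, the request-less waiters would just pass
NULL, e.g.

	static inline void intel_wait_init(struct intel_wait *wait,
					   struct drm_i915_gem_request *rq)
	{
		wait->tsk = current;
		wait->request = rq; /* NULL for request-less waiters */
	}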
> > restart:
> > do {
> >diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
> >index 5f73d8c0a38a..0efee879df23 100644
> >--- a/drivers/gpu/drm/i915/i915_gem_request.h
> >+++ b/drivers/gpu/drm/i915/i915_gem_request.h
> >@@ -32,10 +32,12 @@
> >
> > struct drm_file;
> > struct drm_i915_gem_object;
> >+struct drm_i915_gem_request;
> >
> > struct intel_wait {
> > struct rb_node node;
> > struct task_struct *tsk;
> >+ struct drm_i915_gem_request *request;
> > u32 seqno;
> > };
> >
> >diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> >index 57fa1bf78a85..0c370c687c2a 100644
> >--- a/drivers/gpu/drm/i915/i915_irq.c
> >+++ b/drivers/gpu/drm/i915/i915_irq.c
> >@@ -1033,10 +1033,44 @@ static void ironlake_rps_change_irq_handler(struct drm_i915_private *dev_priv)
> >
> > static void notify_ring(struct intel_engine_cs *engine)
> > {
> >+ struct drm_i915_gem_request *rq = NULL;
> >+ struct intel_wait *wait;
> >+
> >+ if (!intel_engine_has_waiter(engine))
> >+ return;
> >+
> >+ trace_i915_gem_request_notify(engine);
> > atomic_inc(&engine->irq_count);
> > set_bit(ENGINE_IRQ_BREADCRUMB, &engine->irq_posted);
> >- if (intel_engine_wakeup(engine))
> >- trace_i915_gem_request_notify(engine);
> >+
> >+ rcu_read_lock();
>
> Not sure this is required from an irq handler. But I don't think it
> harms either, so maybe it has some usefulness in documenting things.
That's what I said! It's not, but it's cheap enough that I think it is
worth keeping to document the game being played with rq.
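To spell the game out (an untested paraphrase of where notify_ring()
is heading, continuing from the rq/wait declarations in the hunk
above, not the literal patch body; i915_seqno_passed(),
intel_engine_get_seqno() and dma_fence_signal() are existing helpers):

	rcu_read_lock();	/* documents that rq may be freed and
				 * recycled once we drop the read lock,
				 * even though hardirq context is
				 * already a read-side section
				 */
	spin_lock(&engine->breadcrumbs.lock);
	wait = engine->breadcrumbs.first_wait;
	if (wait) {
		if (i915_seqno_passed(intel_engine_get_seqno(engine),
				      wait->seqno))
			rq = wait->request;
		wake_up_process(wait->tsk);
	}
	spin_unlock(&engine->breadcrumbs.lock);

	if (rq)
		dma_fence_signal(&rq->fence);
	rcu_read_unlock();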
> Looks OK.
>
> Last time I tested, it was a definite latency improvement but also a
> slight throughput regression on crazy microbenchmarks like
> gem_latency, IIRC. But I think overall it is a good thing for more
> realistic workloads.
Yup. It's a big one for gem_exec_whisper, which is latency sensitive
(and gem_exec_latency). I didn't see gem_latency (using
tests/gem_latency) suffer too much. To be precise, which gem_latency
are you thinking of? :) I presume ezbench -b gem:latency?
I did look at reducing the cost of the spinlocks (by trying to share
them between requests, or between the engine->timeline and
breadcrumbs) but came away bruised and battered.
> I leave the ordering w.r.t. the preemption prep series to you. I can
> apply my r-b at that point; not sure if it makes sense now?
It's in the queue, and I'll resend it once the earlier patches land so
that CI has a run over it, and to catch any more ideas.
-Chris
--
Chris Wilson, Intel Open Source Technology Centre