[Intel-gfx] [PATCH v7] drm/i915: Slaughter the thundering i915_wait_request herd
Chris Wilson
chris at chris-wilson.co.uk
Tue Dec 8 06:33:54 PST 2015
On Tue, Dec 08, 2015 at 02:03:39PM +0000, Tvrtko Ursulin wrote:
> On 08/12/15 10:44, Chris Wilson wrote:
> >On Mon, Dec 07, 2015 at 03:08:28PM +0000, Tvrtko Ursulin wrote:
> >>Equally, why wouldn't we wake up all waiters for which the requests
> >>have been completed?
> >
> >Because we no longer track the requests to be completed, having moved to
> >a chain of waiting processes instead of a chain of requests. I could
> >insert a waitqueue into the intel_breadcrumb and that would indeed
> >necessitate locking in the irq handler (and irq locks everywhere :(.
>
> You have a tree of seqnos, each with a wait->task, and the current
> seqno on the engine can be queried. So I don't see where the problem
> is?
The "problem" is that every process will do its own post-irq seqno check
after being woken up, then grab the common spinlock to remove itself
from the tree.
We could avoid that by short-circuiting on RB_EMPTY_NODE(), I suppose.
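
To spell out the serialisation, each woken waiter runs something like
the exit path below (a reduced sketch: the two structs are cut down to
the fields the patch uses, not the real declarations). The
RB_EMPTY_NODE() check is what lets a waiter that was already erased on
its behalf skip the shared lock:

#include <linux/rbtree.h>
#include <linux/sched.h>
#include <linux/spinlock.h>
#include <linux/types.h>

struct intel_breadcrumb {
	struct rb_node node;
	struct task_struct *task;
	u32 seqno;
};

struct intel_breadcrumbs {
	spinlock_t lock;
	struct rb_root requests;
};

/* Today every woken waiter unlinks itself under the shared lock, so
 * N completed waiters serialise on b->lock one after another. */
static void waiter_exit(struct intel_breadcrumbs *b,
			struct intel_breadcrumb *wait)
{
	/* Short-circuit: whoever woke us may have already erased our
	 * node (and RB_CLEAR_NODE'd it); then we skip the lock. */
	if (RB_EMPTY_NODE(&wait->node))
		return;

	spin_lock(&b->lock);
	if (!RB_EMPTY_NODE(&wait->node)) {
		rb_erase(&wait->node, &b->requests);
		RB_CLEAR_NODE(&wait->node);
	}
	spin_unlock(&b->lock);
}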
> >>It would be a cheap check here, and it would save a cascading,
> >>growing latency as one task wakes up the next one.
> >
> >Well, it can't be here since we may remove_waiter after a signal
> >(incomplete wait). So this part has to walk the chain of processes. Ugh,
> >and we'd have to move the waitqueue from one waiter to the next...
>
> Ok, for interrupted waiters it makes no sense, but on normal waiter
> removal it would just mean comparing engine->get_seqno() against the
> first waiter's seqno and waking up all waiters until the first
> uncompleted one is found?
It's a tradeoff whether it can be written more neatly than:
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index be76086..474636f 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1279,6 +1279,9 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 			break;
 		}
 
+		if (RB_EMPTY_NODE(&wait.node))
+			break;
+
 		set_task_state(wait.task, state);
 
 		/* Before we do the heavier coherent read of the seqno,
@@ -1312,7 +1315,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 		break;
 	}
 out:
-	intel_engine_remove_breadcrumb(req->ring, &wait);
+	intel_engine_remove_breadcrumb(req->ring, &wait, ret);
 	__set_task_state(wait.task, TASK_RUNNING);
 
 	trace_i915_gem_request_wait_end(req);
diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
index ae3ee3c..421c214 100644
--- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
@@ -169,10 +169,14 @@ void intel_engine_enable_fake_irq(struct intel_engine_cs *engine)
 }
 
 void intel_engine_remove_breadcrumb(struct intel_engine_cs *engine,
-				    struct intel_breadcrumb *wait)
+				    struct intel_breadcrumb *wait,
+				    int ret)
 {
 	struct intel_breadcrumbs *b = &engine->breadcrumbs;
 
+	if (RB_EMPTY_NODE(&wait->node))
+		return;
+
 	spin_lock(&b->lock);
 
 	if (b->first_waiter == wait->task) {
@@ -187,6 +191,18 @@ void intel_engine_remove_breadcrumb(struct intel_engine_cs *engine,
 		 * completion check.
 		 */
 		next = rb_next(&wait->node);
+		if (ret == 0) {
+			u32 seqno = intel_ring_get_seqno(engine);
+			while (next &&
+			       i915_seqno_passed(seqno,
+						 to_crumb(next)->seqno)) {
+				struct rb_node *n = rb_next(next);
+				wake_up_process(to_crumb(next)->task);
+				rb_erase(next, &b->requests);
+				RB_CLEAR_NODE(next);
+				next = n;
+			}
+		}
 
 		task = next ? to_crumb(next)->task : NULL;
 		b->first_waiter = task;
My biggest complaint is that we are now mixing the request-complete
check with direct evaluation of i915_seqno_passed(). (I expect that we
can tidy the code up somewhat.)
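
For illustration, one way to tidy that up would be to funnel both
paths through a single completion predicate (a sketch only; the
breadcrumb_complete name is invented, the helpers are the ones the
patch already uses):

static bool breadcrumb_complete(struct intel_engine_cs *engine,
				const struct intel_breadcrumb *wait)
{
	/* One definition of "this waiter's request has completed". */
	return i915_seqno_passed(intel_ring_get_seqno(engine),
				 wait->seqno);
}

The wake-ahead loop above would then read
while (next && breadcrumb_complete(engine, to_crumb(next))) { ... }
and __i915_wait_request() could use the same predicate for its
post-wakeup check.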
-Chris
--
Chris Wilson, Intel Open Source Technology Centre