[Intel-gfx] [PATCH v7] drm/i915: Slaughter the thundering i915_wait_request herd
Dave Gordon
david.s.gordon at intel.com
Tue Dec 8 06:53:27 PST 2015
On 08/12/15 14:33, Chris Wilson wrote:
> On Tue, Dec 08, 2015 at 02:03:39PM +0000, Tvrtko Ursulin wrote:
>> On 08/12/15 10:44, Chris Wilson wrote:
>>> On Mon, Dec 07, 2015 at 03:08:28PM +0000, Tvrtko Ursulin wrote:
>>>> Equally, why wouldn't we wake up all waiters for which the requests
>>>> have been completed?
>>>
>>> Because we no longer track the requests to be completed, having moved to
>>> a chain of waiting processes instead of a chain of requests. I could
>>> insert a waitqueue into the intel_breadcrumb and that would indeed
>>> necessitate locking in the irq handler (and irq locks everywhere :(.
>>
>> You have a tree of seqnos each with a wait->task and current seqno
>> on the engine can be queried. So I don't see where is the problem?
>
> The "problem" is that every process will do its own post-irq seqno check
> after being woken up, then grab the common spinlock to remove itself
> from the tree.
>
> We could avoid that by using RB_NODE_EMPTY shortcircuiting I suppose.
>
>>>> Would be a cheap check here and it would save a cascading growing
>>>> latency as one task wakes up the next one.
>>>
>>> Well, it can't be here since we may remove_waiter after a signal
>>> (incomplete wait). So this part has to walk the chain of processes. Ugh,
>>> and have to move the waitqueue from one waiter to the next...
>>
>> Ok on interrupted waiters it makes no sense, but on normal waiter
>> removal it would just mean comparing engine->get_seqno() vs the
>> first waiter seqno and waking up all until the uncompleted one is
>> found?
>
> It's a tradeoff whether it can be written more neatly than:
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index be76086..474636f 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1279,6 +1279,9 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
> break;
> }
>
> + if (RB_EMPTY_NODE(&wait.node))
> + break;
> +
> set_task_state(wait.task, state);
>
> /* Before we do the heavier coherent read of the seqno,
> @@ -1312,7 +1315,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
> break;
> }
> out:
> - intel_engine_remove_breadcrumb(req->ring, &wait);
> + intel_engine_remove_breadcrumb(req->ring, &wait, ret);
> __set_task_state(wait.task, TASK_RUNNING);
> trace_i915_gem_request_wait_end(req);
>
> diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> index ae3ee3c..421c214 100644
> --- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
> +++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> @@ -169,10 +169,14 @@ void intel_engine_enable_fake_irq(struct intel_engine_cs *engine)
> }
>
> void intel_engine_remove_breadcrumb(struct intel_engine_cs *engine,
> - struct intel_breadcrumb *wait)
> + struct intel_breadcrumb *wait,
> + int ret)
> {
> struct intel_breadcrumbs *b = &engine->breadcrumbs;
>
> + if (RB_EMPTY_NODE(&wait->node))
> + return;
> +
> spin_lock(&b->lock);
>
> if (b->first_waiter == wait->task) {
> @@ -187,6 +191,18 @@ void intel_engine_remove_breadcrumb(struct intel_engine_cs *engine,
> * completion check.
> */
> next = rb_next(&wait->node);
> + if (ret == 0) {
> + u32 seqno = intel_ring_get_seqno(engine);
> + while (next &&
> + i915_seqno_passed(seqno,
> + to_crumb(next)->seqno)) {
> + struct rb_node *n = rb_next(next);
> + wake_up_process(next->task);
> + rb_erase(next, &b->requests);
> + RB_CLEAR_NODE(next);
> + next = n;
> + }
> + }
> task = next ? to_crumb(next)->task : NULL;
>
> b->first_waiter = task;
>
> My biggest complaint is that we are mixing request-complete and direct
> evaluation of i915_seqno_passed now. (I expect that we can tidy the code
> up somewhat.)
> -Chris
So how is this going to work in the new "struct fence" regime? (See John
Harrison's RFC of 23rd November).
The fence structure contains the per-request completion indicator, so
once a fence has been signalled the owner only needs to check that, and
doesn't need to reevaluate any seqno comparisons after waking up.
Does that help?
.Dave.
More information about the Intel-gfx
mailing list