[Intel-gfx] [PATCH v7] drm/i915: Slaughter the thundering i915_wait_request herd

Tue Dec 8 06:53:27 PST 2015

On 08/12/15 14:33, Chris Wilson wrote:
> On Tue, Dec 08, 2015 at 02:03:39PM +0000, Tvrtko Ursulin wrote:
>> On 08/12/15 10:44, Chris Wilson wrote:
>>> On Mon, Dec 07, 2015 at 03:08:28PM +0000, Tvrtko Ursulin wrote:
>>>> Equally, why wouldn't we wake up all waiters for which the requests
>>>> have been completed?
>>>
>>> Because we no longer track the requests to be completed, having moved to
>>> a chain of waiting processes instead of a chain of requests. I could
>>> insert a waitqueue into the intel_breadcrumb and that would indeed
>>> necessitate locking in the irq handler (and irq locks everywhere :(.
>>
>> You have a tree of seqnos each with a wait->task and current seqno
>> on the engine can be queried. So I don't see where is the problem?
>
> The "problem" is that every process will do its own post-irq seqno check
> after being woken up, then grab the common spinlock to remove itself
> from the tree.
>
> We could avoid that by using RB_NODE_EMPTY shortcircuiting I suppose.
>
>>>> Would be a cheap check here and it would save a cascading growing
>>>> latency as one task wakes up the next one.
>>>
>>> Well, it can't be here since we may remove_waiter after a signal
>>> (incomplete wait). So this part has to walk the chain of processes. Ugh,
>>> and have to move the waitqueue from one waiter to the next...
>>
>> Ok on interrupted waiters it makes no sense, but on normal waiter
>> removal it would just mean comparing engine->get_seqno() vs the
>> first waiter seqno and waking up all until the uncompleted one is
>> found?
>
> It's a tradeoff whether it can be written more neatly than:
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index be76086..474636f 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1279,6 +1279,9 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>                          break;
>                  }
>
> +               if (RB_EMPTY_NODE(&wait.node))
> +                       break;
> +
>                  set_task_state(wait.task, state);
>
>                  /* Before we do the heavier coherent read of the seqno,
> @@ -1312,7 +1315,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>                          break;
>          }
>   out:
> -       intel_engine_remove_breadcrumb(req->ring, &wait);
> +       intel_engine_remove_breadcrumb(req->ring, &wait, ret);
>          __set_task_state(wait.task, TASK_RUNNING);
>          trace_i915_gem_request_wait_end(req);
>
> diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> index ae3ee3c..421c214 100644
> --- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
> +++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> @@ -169,10 +169,14 @@ void intel_engine_enable_fake_irq(struct intel_engine_cs *engine)
>   }
>
>   void intel_engine_remove_breadcrumb(struct intel_engine_cs *engine,
> -                                   struct intel_breadcrumb *wait)
> +                                   struct intel_breadcrumb *wait,
> +                                   int ret)
>   {
>          struct intel_breadcrumbs *b = &engine->breadcrumbs;
>
> +       if (RB_EMPTY_NODE(&wait->node))
> +               return;
> +
>          spin_lock(&b->lock);
>
>          if (b->first_waiter == wait->task) {
> @@ -187,6 +191,18 @@ void intel_engine_remove_breadcrumb(struct intel_engine_cs *engine,
>                   * completion check.
>                   */
>                  next = rb_next(&wait->node);
> +               if (ret == 0) {
> +                       u32 seqno = intel_ring_get_seqno(engine);
> +                       while (next &&
> +                              i915_seqno_passed(seqno,
> +                                                to_crumb(next)->seqno)) {
> +                               struct rb_node *n = rb_next(next);
> +                               wake_up_process(next->task);
> +                               rb_erase(next, &b->requests);
> +                               RB_CLEAR_NODE(next);
> +                               next = n;
> +                       }
> +               }
>                  task = next ? to_crumb(next)->task : NULL;
>
>                  b->first_waiter = task;
>
> My biggest complaint is that we are mixing request-complete and direct
> evaluation of i915_seqno_passed now. (I expect that we can tidy the code
> up somewhat.)
> -Chris

So how is this going to work in the new "struct fence" regime? (See John 
Harrison's RFC of 23rd November).

The fence structure contains the per-request completion indicator, so 
once a fence has been signalled the owner only needs to check that, and 
doesn't need to reevaluate any seqno comparisons after waking up.

Does that help?

.Dave.