[Intel-gfx] [PATCH v7] drm/i915: Slaughter the thundering i915_wait_request herd

Tue Dec 8 06:33:54 PST 2015

On Tue, Dec 08, 2015 at 02:03:39PM +0000, Tvrtko Ursulin wrote:
> On 08/12/15 10:44, Chris Wilson wrote:
> >On Mon, Dec 07, 2015 at 03:08:28PM +0000, Tvrtko Ursulin wrote:
> >>Equally, why wouldn't we wake up all waiters for which the requests
> >>have been completed?
> >
> >Because we no longer track the requests to be completed, having moved to
> >a chain of waiting processes instead of a chain of requests. I could
> >insert a waitqueue into the intel_breadcrumb and that would indeed
> >necessitate locking in the irq handler (and irq locks everywhere :(.
> 
> You have a tree of seqnos each with a wait->task and current seqno
> on the engine can be queried. So I don't see where is the problem?

The "problem" is that every process will do its own post-irq seqno check
after being woken up, then grab the common spinlock to remove itself
from the tree.

We could avoid that by using RB_NODE_EMPTY shortcircuiting I suppose.
 
> >>Would be a cheap check here and it would save a cascading growing
> >>latency as one task wakes up the next one.
> >
> >Well, it can't be here since we may remove_waiter after a signal
> >(incomplete wait). So this part has to walk the chain of processes. Ugh,
> >and have to move the waitqueue from one waiter to the next...
> 
> Ok on interrupted waiters it makes no sense, but on normal waiter
> removal it would just mean comparing engine->get_seqno() vs the
> first waiter seqno and waking up all until the uncompleted one is
> found?

It's a tradeoff whether it can be written more neatly than:

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index be76086..474636f 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1279,6 +1279,9 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
                        break;
                }
 
+               if (RB_EMPTY_NODE(&wait.node))
+                       break;
+
                set_task_state(wait.task, state);
 
                /* Before we do the heavier coherent read of the seqno,
@@ -1312,7 +1315,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
                        break;
        }
 out:
-       intel_engine_remove_breadcrumb(req->ring, &wait);
+       intel_engine_remove_breadcrumb(req->ring, &wait, ret);
        __set_task_state(wait.task, TASK_RUNNING);
        trace_i915_gem_request_wait_end(req);
 
diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
index ae3ee3c..421c214 100644
--- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
@@ -169,10 +169,14 @@ void intel_engine_enable_fake_irq(struct intel_engine_cs *engine)
 }
 
 void intel_engine_remove_breadcrumb(struct intel_engine_cs *engine,
-                                   struct intel_breadcrumb *wait)
+                                   struct intel_breadcrumb *wait,
+                                   int ret)
 {
        struct intel_breadcrumbs *b = &engine->breadcrumbs;
 
+       if (RB_EMPTY_NODE(&wait->node))
+               return;
+
        spin_lock(&b->lock);
 
        if (b->first_waiter == wait->task) {
@@ -187,6 +191,18 @@ void intel_engine_remove_breadcrumb(struct intel_engine_cs *engine,
                 * completion check.
                 */
                next = rb_next(&wait->node);
+               if (ret == 0) {
+                       u32 seqno = intel_ring_get_seqno(engine);
+                       while (next &&
+                              i915_seqno_passed(seqno,
+                                                to_crumb(next)->seqno)) {
+                               struct rb_node *n = rb_next(next);
+                               wake_up_process(next->task);
+                               rb_erase(next, &b->requests);
+                               RB_CLEAR_NODE(next);
+                               next = n;
+                       }
+               }
                task = next ? to_crumb(next)->task : NULL;
 
                b->first_waiter = task;

My biggest complaint is that we are mixing request-complete and direct
evaluation of i915_seqno_passed now. (I expect that we can tidy the code
up somewhat.)
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre