[Intel-gfx] [PATCH 08/15] drm/i915: Slaughter the thundering i915_wait_request herd

Mon Nov 30 04:38:13 PST 2015

On Mon, Nov 30, 2015 at 12:09:30PM +0000, Tvrtko Ursulin wrote:
> 
> Hi,
> 
> On 30/11/15 10:53, Chris Wilson wrote:
> >On Sun, Nov 29, 2015 at 08:48:06AM +0000, Chris Wilson wrote:
> >>+	/* Optimistic spin for the next jiffie before touching IRQs */
> >>+	if (intel_breadcrumbs_add_waiter(req)) {
> >>+		ret = __i915_spin_request(req, state);
> >>+		if (ret == 0)
> >>+			goto out;
> >
> >There are a couple of interesting side-effects here.  As we know start
> >up the irq in parallel and keep it running for longer, irq/i915 now
> >consumes a lot of CPU time (like 50-100%!) for synchronous batches, but
> >doesn't seem to interfere with latency (as spin_request is still nicely
> >running and catching the request completion). That should also still
> >work nicely on single cpu machines as the irq enabling should not
> >preempt us.  The second interesting side-effect is that the synchronous
> >loads that regressed with a 2us spin-request timeout are now happy again
> >at 2us. Also with the active-request and the can-spin check from
> >add_waiter, running a low fps game with a compositor is not burning the
> >CPU with any spinning.
> 
> Interesting? :) Sounds bad the way you presented it.
> 
> Why and where is the thread burning so much CPU? Would per engine
> req tree locks help?

Just simply being woken up after every batch and checking the seqno is
that expensive. Almost as expensive as the IRQ handler itself! (I expect
top is adding the IRQ handler time to the irq/i915 in this case, perf
says that more time is spent in the IRQ than the bottom-half.) Almost
all waiters will be on the same engine, so I don't expect finer grained
spinlocks to be hugely important.

> Is this new CPU time or the one which would previously be accounted
> against each waiter, polling in wait request?

This is CPU time that used to be absorbed in i915_wait_request(),
currently hidden by i915_spin_request(). It is the reason why having
every waiter doing the check after every interrupt is such a big issue
for some workloads.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre