[Intel-gfx] [PATCH 08/15] drm/i915: Slaughter the thundering i915_wait_request herd

Mon Nov 30 06:30:22 PST 2015

On Mon, Nov 30, 2015 at 01:33:50PM +0000, Tvrtko Ursulin wrote:
> 
> On 30/11/15 12:38, Chris Wilson wrote:
> >On Mon, Nov 30, 2015 at 12:09:30PM +0000, Tvrtko Ursulin wrote:
> >>
> >>Hi,
> >>
> >>On 30/11/15 10:53, Chris Wilson wrote:
> >>>On Sun, Nov 29, 2015 at 08:48:06AM +0000, Chris Wilson wrote:
> >>>>+	/* Optimistic spin for the next jiffie before touching IRQs */
> >>>>+	if (intel_breadcrumbs_add_waiter(req)) {
> >>>>+		ret = __i915_spin_request(req, state);
> >>>>+		if (ret == 0)
> >>>>+			goto out;
> >>>
> >>>There are a couple of interesting side-effects here.  As we know start
> >>>up the irq in parallel and keep it running for longer, irq/i915 now
> >>>consumes a lot of CPU time (like 50-100%!) for synchronous batches, but
> >>>doesn't seem to interfere with latency (as spin_request is still nicely
> >>>running and catching the request completion). That should also still
> >>>work nicely on single cpu machines as the irq enabling should not
> >>>preempt us.  The second interesting side-effect is that the synchronous
> >>>loads that regressed with a 2us spin-request timeout are now happy again
> >>>at 2us. Also with the active-request and the can-spin check from
> >>>add_waiter, running a low fps game with a compositor is not burning the
> >>>CPU with any spinning.
> >>
> >>Interesting? :) Sounds bad the way you presented it.
> >>
> >>Why and where is the thread burning so much CPU? Would per engine
> >>req tree locks help?
> >
> >Just simply being woken up after every batch and checking the seqno is
> >that expensive. Almost as expensive as the IRQ handler itself! (I expect
> >top is adding the IRQ handler time to the irq/i915 in this case, perf
> >says that more time is spent in the IRQ than the bottom-half.) Almost
> >all waiters will be on the same engine, so I don't expect finer grained
> >spinlocks to be hugely important.
> >
> >>Is this new CPU time or the one which would previously be accounted
> >>against each waiter, polling in wait request?
> >
> >This is CPU time that used to be absorbed in i915_wait_request(),
> >currently hidden by i915_spin_request(). It is the reason why having
> >every waiter doing the check after every interrupt is such a big issue
> >for some workloads.
> 
> So overall total CPU time is similar, just split differently?

With the pre-enable + spin, we increase CPU time. On those synchronous
tests, the runtime is the same but we are now at 100% CPU in the test
and an additional 50-100% in the irq/i915. Afaict, this is only
affecting those igt, my desktop usage doesn't spin as much.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre