[Intel-gfx] [PATCH 08/15] drm/i915: Slaughter the thundering i915_wait_request herd

Mon Nov 30 05:33:50 PST 2015

On 30/11/15 12:38, Chris Wilson wrote:
> On Mon, Nov 30, 2015 at 12:09:30PM +0000, Tvrtko Ursulin wrote:
>>
>> Hi,
>>
>> On 30/11/15 10:53, Chris Wilson wrote:
>>> On Sun, Nov 29, 2015 at 08:48:06AM +0000, Chris Wilson wrote:
>>>> +	/* Optimistic spin for the next jiffie before touching IRQs */
>>>> +	if (intel_breadcrumbs_add_waiter(req)) {
>>>> +		ret = __i915_spin_request(req, state);
>>>> +		if (ret == 0)
>>>> +			goto out;
>>>
>>> There are a couple of interesting side-effects here.  As we now start
>>> up the irq in parallel and keep it running for longer, irq/i915 now
>>> consumes a lot of CPU time (like 50-100%!) for synchronous batches, but
>>> doesn't seem to interfere with latency (as spin_request is still nicely
>>> running and catching the request completion). That should also still
>>> work nicely on single cpu machines as the irq enabling should not
>>> preempt us.  The second interesting side-effect is that the synchronous
>>> loads that regressed with a 2us spin-request timeout are now happy again
>>> at 2us. Also with the active-request and the can-spin check from
>>> add_waiter, running a low fps game with a compositor is not burning the
>>> CPU with any spinning.
>>
>> Interesting? :) Sounds bad the way you presented it.
>>
>> Why and where is the thread burning so much CPU? Would per engine
>> req tree locks help?
>
> Just simply being woken up after every batch and checking the seqno is
> that expensive. Almost as expensive as the IRQ handler itself! (I expect
> top is adding the IRQ handler time to the irq/i915 in this case, perf
> says that more time is spent in the IRQ than the bottom-half.) Almost
> all waiters will be on the same engine, so I don't expect finer grained
> spinlocks to be hugely important.
>
>> Is this new CPU time or the one which would previously be accounted
>> against each waiter, polling in wait request?
>
> This is CPU time that used to be absorbed in i915_wait_request(),
> currently hidden by i915_spin_request(). It is the reason why having
> every waiter doing the check after every interrupt is such a big issue
> for some workloads.

So overall total CPU time is similar, just split differently?

Regards,

Tvrtko