[Intel-gfx] [PATCH 17/21] drm/i915: Convert trace-irq to the breadcrumb waiter

Wed Jun 8 10:16:13 UTC 2016

On 08/06/16 10:48, Chris Wilson wrote:
> On Tue, Jun 07, 2016 at 01:04:22PM +0100, Tvrtko Ursulin wrote:
>>> +static int intel_breadcrumbs_signaler(void *arg)
>>> +{
>>> +	struct intel_engine_cs *engine = arg;
>>> +	struct intel_breadcrumbs *b = &engine->breadcrumbs;
>>> +	struct signal *signal;
>>> +
>>> +	/* Install ourselves with high priority to reduce signalling latency */
>>> +	signaler_set_rtpriority();
>>> +
>>> +	do {
>>> +		set_current_state(TASK_INTERRUPTIBLE);
>>> +
>>> +		/* We are either woken up by the interrupt bottom-half,
>>> +		 * or by a client adding a new signaller. In both cases,
>>> +		 * the GPU seqno may have advanced beyond our oldest signal.
>>> +		 * If it has, propagate the signal, remove the waiter and
>>> +		 * check again with the next oldest signal. Otherwise we
>>> +		 * need to wait for a new interrupt from the GPU or for
>>> +		 * a new client.
>>> +		 */
>>> +		signal = READ_ONCE(b->first_signal);
>>> +		if (signal_complete(signal)) {
>>> +			/* Wake up all other completed waiters and select the
>>> +			 * next bottom-half for the next user interrupt.
>>> +			 */
>>> +			intel_engine_remove_wait(engine, &signal->wait);
>>> +
>>> +			i915_gem_request_unreference(signal->request);
>>> +
>>> +			/* Find the next oldest signal. Note that as we have
>>> +			 * not been holding the lock, another client may
>>> +			 * have installed an even older signal than the one
>>> +			 * we just completed - so double check we are still
>>> +			 * the oldest before picking the next one.
>>> +			 */
>>> +			spin_lock(&b->lock);
>>> +			if (signal == b->first_signal)
>>> +				b->first_signal = rb_next(&signal->node);
>>> +			rb_erase(&signal->node, &b->signals);
>>> +			spin_unlock(&b->lock);
>>> +
>>> +			kfree(signal);
>>> +		} else {
>>> +			if (kthread_should_stop())
>>> +				break;
>>> +
>>> +			schedule();
>>> +		}
>>> +	} while (1);
>>> +
>>> +	return 0;
>>> +}
>>
>> So the thread is only because it is convenient to plug it in the
>> breadcrumbs infrastructure. Otherwise the processing above could be
>> done from a lighter weight context as well since nothing seems to
>> need the process context.
>
> No, seqno processing requires process/sleepable context. The delays we
> incur can be >100us and not suitable for irq/softirq context.

Nothing in this patch needs it - please say in the commit why it is 
choosing the process context then.

And why so long delays? It looks pretty lightweight to me.

>> One alternative could perhaps be to add a waiter->wake_up vfunc and
>> signalers could then potentially use a tasklet?
>
> Hmm, I did find that in order to reduce execlists latency, I had to
> drive the tasklet processing from the signaler.

What do you mean? The existing execlists tasklet? Now would that work?

>>> +int intel_engine_enable_signaling(struct drm_i915_gem_request *request)
>>> +{
>>> +	struct intel_engine_cs *engine = request->engine;
>>> +	struct intel_breadcrumbs *b = &engine->breadcrumbs;
>>> +	struct rb_node *parent, **p;
>>> +	struct signal *signal;
>>> +	bool first, wakeup;
>>> +
>>> +	if (unlikely(IS_ERR(b->signaler)))
>>> +		return PTR_ERR(b->signaler);
>>
>> I don't see that there is a fallback is kthread creation failed. It
>> should just fail in intel_engine_init_breadcrumbs if that happens.
>
> Because it is not fatal to using the GPU, just one optional function.

But we never expect it to fail and it is not even dependent on anything 
user controllable. Just a random error which would cause user experience 
to degrade. If thread creation failed it means system is in such a poor 
shape I would just fail the driver init.

>>> +	signal = kmalloc(sizeof(*signal), GFP_ATOMIC);
>>
>> Ugh GFP_ATOMIC - why?
>
> Because of dma-buf/fence.c.

Ah.. yes bad.. fortunately fixed in the following patch.

>> And even should you instead just embed in into requests?
>
> I was resisting embedding even more into requests, so first patch was
> for a simpler integration, with a subsequent patch to embed the node
> into the request.
>
>>> @@ -329,12 +480,24 @@ void intel_engine_init_breadcrumbs(struct intel_engine_cs *engine)
>>>   	setup_timer(&b->fake_irq,
>>>   		    intel_breadcrumbs_fake_irq,
>>>   		    (unsigned long)engine);
>>> +
>>> +	/* Spawn a thread to provide a common bottom-half for all signals.
>>> +	 * As this is an asynchronous interface we cannot steal the current
>>> +	 * task for handling the bottom-half to the user interrupt, therefore
>>> +	 * we create a thread to do the coherent seqno dance after the
>>> +	 * interrupt and then signal the waitqueue (via the dma-buf/fence).
>>> +	 */
>>> +	b->signaler = kthread_run(intel_breadcrumbs_signaler,
>>> +				  engine, "irq/i915:%d", engine->id);
>>
>> As commented above, init should fail here because it cannot run
>> without the thread.
>
> We can function without the signaler.

Commented above.

Regards,

Tvrtko