[Intel-gfx] [PATCH 08/15] drm/i915: Slaughter the thundering i915_wait_request herd
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Mon Nov 30 05:32:16 PST 2015
On 30/11/15 12:30, Chris Wilson wrote:
> On Mon, Nov 30, 2015 at 12:05:50PM +0000, Tvrtko Ursulin wrote:
>>> + struct intel_breadcrumbs {
>>> + spinlock_t lock; /* protects the per-engine request list */
>>
>> s/list/tree/
>
> list. We use the tree as a list!
Maybe, but it confuses the reader, who then goes looking for some list.
Related: when I was playing with this I was piggy-backing on the
existing per-engine request list.
If the requests are seqno-ordered there, and I think they are, then the
waker thread can just traverse it in order until hitting the first
not-yet-completed one, waking up every req->wait_queue as it goes.
In that case it doesn't matter that the waiters arrive in
nondeterministic order, so we don't need a (new) tree.
But the downside is that the thread needs struct_mutex. And that
retirement from that list is so lazy...
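The traversal described above could be modelled roughly like this (a
user-space sketch with hypothetical types, not the real i915
structures; `hw_seqno` stands in for the hardware's last completed
seqno):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a request on the seqno-ordered list. */
struct fake_request {
	unsigned seqno;
	int woken;                  /* models waking req->wait_queue */
	struct fake_request *next;
};

/* Walk in seqno order, wake completed requests, stop at the first
 * one the hardware has not finished yet. */
static int wake_completed(struct fake_request *head, unsigned hw_seqno)
{
	int nr_woken = 0;

	for (; head; head = head->next) {
		if (head->seqno > hw_seqno)
			break;       /* first unfinished request: stop */
		head->woken = 1;     /* wake_up_all(&req->wait_queue) */
		nr_woken++;
	}
	return nr_woken;
}
```

Because the list is already ordered, the walk never needs to look past
the first unfinished request, which is what makes a separate sorted
tree unnecessary in this scheme.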
>>> + struct task_struct *task; /* bh for all user interrupts */
>>> + struct intel_breadcrumbs_engine {
>>> + struct rb_root requests; /* sorted by retirement */
>>> + struct drm_i915_gem_request *first;
>>
>> /* First request in the tree to be completed next by the hw. */
>>
>> ?
>
> /* first */ ?
Funny? :) First what? Why make the reader reverse-engineer it?
>>> + } engine[I915_NUM_RINGS];
>>> + } breadcrumbs;
>>> +
>>> struct i915_virtual_gpu vgpu;
>>>
>>> struct intel_guc guc;
>>> @@ -2228,6 +2250,11 @@ struct drm_i915_gem_request {
>>> /** global list entry for this request */
>>> struct list_head list;
>>>
>>> + /** List of clients waiting for completion of this request */
>>> + wait_queue_head_t wait;
>>> + struct rb_node irq_node;
>>> + unsigned irq_count;
>>
>> To me irq prefix does not fit here that well.
>>
>> req_tree_node?
>>
>> waiter_count, waiters?
>
> waiters, breadcrumb_node, waiters_count ?
Deal.
>>> for (;;) {
>>> - struct timer_list timer;
>>> -
>>> - prepare_to_wait(&ring->irq_queue, &wait, state);
>>> + prepare_to_wait(&req->wait, &wait, state);
>>>
>>> - /* We need to check whether any gpu reset happened in between
>>> - * the caller grabbing the seqno and now ... */
>>> - if (req->reset_counter != i915_reset_counter(&dev_priv->gpu_error)) {
>>> - /* As we do not requeue the request over a GPU reset,
>>> - * if one does occur we know that the request is
>>> - * effectively complete.
>>> - */
>>> - ret = 0;
>>> - break;
>>> - }
>>> -
>>> - if (i915_gem_request_completed(req, false)) {
>>> + if (i915_gem_request_completed(req, true) ||
>>
>> Why it is OK to change the lazy coherency mode here?
>
> We bang on the coherency after the IRQ. Here, we are woken by the IRQ
> waiter so we know that the IRQ/seqno coherency is fine and we can just
> check the CPU cache.
Makes sense; put it into a comment? There is space now since the
function is much shorter. :)
>>> + req->reset_counter != i915_reset_counter(&req->i915->gpu_error)) {
>>
>> It looks like it would be valuable to preserve the comment about no
>> re-queuing etc on GPU reset.
>
> I can drop it from the previous patch :)
No it is useful!
>>> + if (timeout) {
>>> + timeout_remain = io_schedule_timeout(timeout_remain);
>>> + if (timeout_remain == 0) {
>>> + ret = -ETIME;
>>> + break;
>>> + }
>>> + } else
>>> + io_schedule();
>>
>> Could set timeout_remain to that MAX value when timeout == NULL and
>> have a single call site to io_schedule_timeout and less indentation.
>> It doesn't matter hugely, maybe it would be a bit easier to read.
>
> Done. io_schedule() calls io_schedule_timeout(), whereas
> schedule_timeout() calls schedule()!
Who disagreed with that? :) Ok!
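The single-call-site version suggested above might look like the
following (a user-space model; io_schedule_timeout() is stubbed so the
control flow can be checked outside the kernel, and ETIME is defined
locally):

```c
#include <assert.h>

#define MAX_SCHEDULE_TIMEOUT 0x7fffffffL
#define ETIME 62

/* Stub: pretend each schedule consumes one jiffy of the budget,
 * while an "infinite" timeout is passed through unchanged, mirroring
 * the kernel's MAX_SCHEDULE_TIMEOUT behaviour. */
static long io_schedule_timeout(long timeout)
{
	return timeout == MAX_SCHEDULE_TIMEOUT ? timeout : timeout - 1;
}

/* With timeout_remain primed to MAX_SCHEDULE_TIMEOUT when the caller
 * passed no timeout, both branches collapse into one call site: */
static int wait_iteration(long *timeout_remain)
{
	*timeout_remain = io_schedule_timeout(*timeout_remain);
	if (*timeout_remain == 0)
		return -ETIME;
	return 0;
}
```

This keeps a single io_schedule_timeout() call and drops one level of
indentation, at the cost of priming the counter before the loop.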
>>> @@ -900,7 +901,7 @@ static void i915_record_ring_state(struct drm_device *dev,
>>> ering->instdone = I915_READ(GEN2_INSTDONE);
>>> }
>>>
>>> - ering->waiting = waitqueue_active(&ring->irq_queue);
>>> + ering->waiting = READ_ONCE(dev_priv->breadcrumbs.engine[ring->id].first);
>>
>> What does the READ_ONCE do here?
>
> This element is protected by the breadcrumbs spinlock, the READ_ONCE is
> documentation that we don't care about the lock here.
Hm I find it confusing but OK.
>>> static void notify_ring(struct intel_engine_cs *ring)
>>> {
>>> - if (!intel_ring_initialized(ring))
>>> + if (ring->i915 == NULL)
>>> return;
>>>
>>> trace_i915_gem_request_notify(ring);
>>> -
>>> - wake_up_all(&ring->irq_queue);
>>> + wake_up_process(ring->i915->breadcrumbs.task);
>>
>> For a later extension:
>>
>> How would you view, some time in the future, adding a bool return to
>> ring->put_irq() which would say to the thread that it failed to
>> disable interrupts?
>>
>> In this case thread would exit and would need to be restarted when
>> the next waiter queues up. Possibly extending the
>> idle->put_irq->exit durations as well then.
>
> Honestly, not keen on the idea that the lowlevel put_irq can fail (makes
> cleanup much more fraught). I don't see what improvement you intend
> here, maybe clarifying that would help explain what you mean.
Don't know; maybe reflecting the fact that it wasn't the last put_irq
call, so the caller can handle it if they want. We can leave this
discussion for the future.
>>> @@ -2986,16 +2981,16 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
>>> if (ring_idle(ring, seqno)) {
>>> ring->hangcheck.action = HANGCHECK_IDLE;
>>>
>>> - if (waitqueue_active(&ring->irq_queue)) {
>>> + if (READ_ONCE(dev_priv->breadcrumbs.engine[ring->id].first)) {
>>
>> READ_ONCE is again strange, but other than that I don't know the
>> hangcheck code to really evaulate it.
>>
>> Perhaps this also means this line needs a comment describing what
>> condition/state is actually approximating with the check.
>
> All we are doing is asking if anyone is waiting on the GPU, because the
> GPU has finished all of its work. If so, the IRQ handling is suspect
> (modulo the pre-existing race condition clearly earmarked by READ_ONCE).
Cool, /* Check if someone is waiting on the GPU */ then would make me happy.
>>> + while (!kthread_should_stop()) {
>>> + unsigned reset_counter = i915_reset_counter(&i915->gpu_error);
>>> + unsigned long timeout_jiffies;
>>> + bool idling = false;
>>> +
>>> + /* On every tick, walk the seqno-ordered list of requests
>>> + * and for each retired request wakeup its clients. If we
>>> + * find an unfinished request, go back to sleep.
>>> + */
>>
>> s/tick/loop/ ?
> s/tick/iteration/ I'm sticking with tick
Tick makes me think of a scheduler tick, so to me it is the worst of
the three. Iteration sounds really good. Whether that will be a
tick/jiffie/or something is defined a bit lower in the code.
>> And if we don't find an unfinished request we still go back to sleep. :)
>
> Ok.
>
>>> + set_current_state(TASK_INTERRUPTIBLE);
>>> +
>>> + /* Note carefully that we do not hold a reference for the
>>> + * requests on this list. Therefore we only inspect them
>>> + * whilst holding the spinlock to ensure that they are not
>>> + * freed in the meantime, and the client must remove the
>>> + * request from the list if it is interrupted (before it
>>> + * itself releases its reference).
>>> + */
>>> + spin_lock(&b->lock);
>>
>> This lock is more per engine that global in its nature unless you
>> think it was more efficient to do fewer lock atomics vs potential
>> overlap of waiter activity per engines?
>
> Indeed. I decided not to care about making it per-engine.
>
>> Would need a new lock for req->irq_count, per request. So
>> req->lock/req->irq_count and be->lock for the tree operations.
>
> Why? A request is tied to an individual engine, therefore we can use
> that breadcrumb_engine.lock for the request.
Just so 2nd+ waiters would not touch the per-engine lock.
>> Thread would only need to deal with the later. Feels like it would
>> work, just not sure if it is worth it.
>
> That was my feeling as well. Not sure if we will have huge contention
> that would be solved with per-engine spinlocks, whilst we still have
> struct_mutex.
>
>>> + for (i = 0; i < I915_NUM_RINGS; i++) {
>>> + struct intel_engine_cs *engine = &i915->ring[i];
>>> + struct intel_breadcrumbs_engine *be = &b->engine[i];
>>> + struct drm_i915_gem_request *request = be->first;
>>> +
>>> + if (request == NULL) {
>>> + if ((irq_get & (1 << i))) {
>>> + if (irq_enabled & (1 << i)) {
>>> + engine->irq_put(engine);
>>> + irq_enabled &= ~ (1 << i);
>>> + }
>>> + intel_runtime_pm_put(i915);
>>> + irq_get &= ~(1 << i);
>>> + }
>>> + continue;
>>> + }
>>> +
>>> + if ((irq_get & (1 << i)) == 0) {
>>> + intel_runtime_pm_get(i915);
>>> + irq_enabled |= __irq_enable(engine) << i;
>>> + irq_get |= 1 << i;
>>> + }
>>
>> irq_get and irq_enabled are creating a little bit of a headache for
>> me. And that would probably mean a comment is be in order to explain
>> what they are for and how they work.
>>
>> Like this I don't see the need for irq_get to persist across loops?
>
> Because getting the irq is quite slow, we add a jiffie shadow.
But it is not a shadow in the sense of the same bitmask from the
previous iteration: the bits can differ depending on special cases in
__irq_enable.
So it needs some comments, I think.
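A commented model of the two bitmasks might make the intent clearer.
This is a user-space sketch with stubbed pm/irq helpers (the counters
and `has_waiter` array are illustrative, not the real i915 API), so
the get/put balancing can be checked:

```c
#include <assert.h>

#define NUM_RINGS 5

static int rpm_refs, irq_refs;

static void intel_runtime_pm_get(void) { rpm_refs++; }
static void intel_runtime_pm_put(void) { rpm_refs--; }
static int  __irq_enable(int engine)   { (void)engine; irq_refs++; return 1; }
static void irq_put(int engine)        { (void)engine; irq_refs--; }

/*
 * One thread iteration over the engines.
 *
 * irq_get:     engines we hold a runtime-pm ref for and have *attempted*
 *              to enable interrupts on.  Kept across iterations as a
 *              shadow because irq get/put is slow.
 * irq_enabled: engines where __irq_enable() actually took an irq ref
 *              (it can report 0, e.g. when faking missed interrupts),
 *              so only these need a matching irq_put().
 */
static void check_engines(const int *has_waiter,
			  unsigned *irq_get, unsigned *irq_enabled)
{
	for (int i = 0; i < NUM_RINGS; i++) {
		if (!has_waiter[i]) {
			/* Engine went idle: drop whatever we took. */
			if (*irq_get & (1u << i)) {
				if (*irq_enabled & (1u << i)) {
					irq_put(i);
					*irq_enabled &= ~(1u << i);
				}
				intel_runtime_pm_put();
				*irq_get &= ~(1u << i);
			}
			continue;
		}

		/* First waiter seen on this engine: grab pm + irq. */
		if ((*irq_get & (1u << i)) == 0) {
			intel_runtime_pm_get();
			*irq_enabled |= __irq_enable(i) << i;
			*irq_get |= 1u << i;
		}
	}
}
```

Separating "attempted" from "succeeded" is what lets the runtime-pm
puts stay balanced even when interrupt enabling itself is skipped.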
>> irq_get looks like "try to get these", irq_enabled is "these ones I
>> got". Apart for the weird usage of it to balance the runtime pm gets
>> and puts.
>>
>> A bit confusing as I said.
>
> I don't like the names, but haven't thought of anything better.
>
>>> +bool intel_breadcrumbs_add_waiter(struct drm_i915_gem_request *request)
>>> +{
>>> + struct intel_breadcrumbs *b = &request->i915->breadcrumbs;
>>> + bool first = false;
>>> +
>>> + spin_lock(&b->lock);
>>> + if (request->irq_count++ == 0) {
>>> + struct intel_breadcrumbs_engine *be =
>>> + &b->engine[request->ring->id];
>>> + struct rb_node **p, *parent;
>>> +
>>> + if (be->first == NULL)
>>> + wake_up_process(b->task);
>>
>> With a global b->lock
>
> or even per engine
>
>> it makes no difference but in the logical
>> sense this would usually go last, after the request has been
>> inserted into the tree and waiter initialized.
>
> Except that the code is simpler with kicking the task first.
>
> Since we both thought it might make sense with a per-engine lock, let's
> go with it. Maybe one day it will pay off.
Ok.
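The first-waiter fast path being agreed above can be modelled like so
(hypothetical `fake_*` types; `thread_kicks` stands in for
wake_up_process(b->task), and `waiters_count` for the field formerly
called irq_count):

```c
#include <assert.h>

struct fake_engine {
	int have_first;     /* models be->first != NULL */
	int thread_kicks;   /* models wake_up_process(b->task) calls */
};

struct fake_req {
	struct fake_engine *engine;
	unsigned waiters_count;
};

/* Only the first waiter for a request takes the tree-insert path, and
 * the breadcrumbs task is only kicked when the engine previously had
 * no waiters at all.  Returns 1 when the request was (notionally)
 * inserted into the tree. */
static int add_waiter(struct fake_req *req)
{
	if (req->waiters_count++ == 0) {
		if (!req->engine->have_first) {
			req->engine->thread_kicks++;
			req->engine->have_first = 1;
		}
		return 1;   /* first waiter: insert into the tree */
	}
	return 0;           /* later waiters just sleep on req->wait */
}
```

Second and subsequent waiters on the same request never touch the tree
or kick the thread, which is the contention the per-request counter is
there to avoid.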
Regards,
Tvrtko