[Intel-gfx] [PATCH 28/34] drm/i915: Replace global breadcrumbs with per-context interrupt tracking

Wed Jan 23 16:28:42 UTC 2019

On 23/01/2019 10:01, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2019-01-23 09:21:45)
>>
>> On 21/01/2019 22:21, Chris Wilson wrote:
>>> -static void error_record_engine_waiters(struct intel_engine_cs *engine,
>>> -                                     struct drm_i915_error_engine *ee)
>>> -{
>>> -     struct intel_breadcrumbs *b = &engine->breadcrumbs;
>>> -     struct drm_i915_error_waiter *waiter;
>>> -     struct rb_node *rb;
>>> -     int count;
>>> -
>>> -     ee->num_waiters = 0;
>>> -     ee->waiters = NULL;
>>> -
>>> -     if (RB_EMPTY_ROOT(&b->waiters))
>>> -             return;
>>> -
>>> -     if (!spin_trylock_irq(&b->rb_lock)) {
>>> -             ee->waiters = ERR_PTR(-EDEADLK);
>>> -             return;
>>> -     }
>>> -
>>> -     count = 0;
>>> -     for (rb = rb_first(&b->waiters); rb != NULL; rb = rb_next(rb))
>>> -             count++;
>>> -     spin_unlock_irq(&b->rb_lock);
>>> -
>>> -     waiter = NULL;
>>> -     if (count)
>>> -             waiter = kmalloc_array(count,
>>> -                                    sizeof(struct drm_i915_error_waiter),
>>> -                                    GFP_ATOMIC);
>>> -     if (!waiter)
>>> -             return;
>>> -
>>> -     if (!spin_trylock_irq(&b->rb_lock)) {
>>> -             kfree(waiter);
>>> -             ee->waiters = ERR_PTR(-EDEADLK);
>>> -             return;
>>> -     }
>>> -
>>> -     ee->waiters = waiter;
>>> -     for (rb = rb_first(&b->waiters); rb; rb = rb_next(rb)) {
>>> -             struct intel_wait *w = rb_entry(rb, typeof(*w), node);
>>> -
>>> -             strcpy(waiter->comm, w->tsk->comm);
>>> -             waiter->pid = w->tsk->pid;
>>> -             waiter->seqno = w->seqno;
>>> -             waiter++;
>>> -
>>> -             if (++ee->num_waiters == count)
>>> -                     break;
>>> -     }
>>> -     spin_unlock_irq(&b->rb_lock);
>>> -}
>>
>> Capturing context waiters is not interesting for error state?
> 
> Not really, we don't have a direct link to the process. We could dig it
> out by identifying our special wait_cb inside the fence->signal_list,
> but I couldn't be bothered. Who's waiting at the time of the error has
> never been that interesting for error debugging, just provides an
> overview of the system state.
> 
> Who issued the hanging command is much more of interest for the hunting
> posse than their victim.
> 
> However, storing fence->flag (i.e. the DMA_FENCE_FLAG_SIGNAL_ENABLE_BIT
> + DMA_FENCE_FLAG_SIGNALED_BIT) seems like it would come in handy.
> 
>>> -static bool __i915_spin_request(const struct i915_request *rq,
>>> -                             u32 seqno, int state, unsigned long timeout_us)
>>> +static bool __i915_spin_request(const struct i915_request * const rq,
>>> +                             int state, unsigned long timeout_us)
>>>    {
>>> -     struct intel_engine_cs *engine = rq->engine;
>>> -     unsigned int irq, cpu;
>>> -
>>> -     GEM_BUG_ON(!seqno);
>>> +     unsigned int cpu;
>>>    
>>>        /*
>>>         * Only wait for the request if we know it is likely to complete.
>>> @@ -1050,7 +1046,7 @@ static bool __i915_spin_request(const struct i915_request *rq,
>>>         * it is a fair assumption that it will not complete within our
>>>         * relatively short timeout.
>>>         */
>>> -     if (!intel_engine_has_started(engine, seqno))
>>> +     if (!i915_request_started(rq))
>>
>> Might be more wasteful the more preemption is going on. Probably not the
>> most important thing to try a fix straight away, but something to put
>> down on some to do list.
> 
> Actually... That would be cheap to fix here as we do a test_bit(ACTIVE).
> Hmm, I wonder if that makes sense for all callers.
> 
> Maybe i915_request_is_running(rq) as a followup.
>   
>> Above comment is also outdated now (engine order).
> 
> I left a comment! Silly me.
> 
>>> +enum {
>>> +     I915_FENCE_FLAG_ACTIVE = DMA_FENCE_FLAG_USER_BITS,
>>> +     I915_FENCE_FLAG_SIGNAL,
>>
>> Describe in comments what these mean please.
> 
> Mean, you expect them to have meaning outside of their use? :)

No, just that the use can be glanced from here instead of derived from 
following the code. :p

>>> +bool intel_engine_breadcrumbs_irq(struct intel_engine_cs *engine)
>>> +{
>>> +     struct intel_breadcrumbs *b = &engine->breadcrumbs;
>>
>> How can you afford to have this per engine? I guess I might figure out
>> later in the patch/series.
> 
> Hmm, it's always been per engine... What cost are you considering?

I was getting ahead of myself, well the patch series, by thinking how 
you can afford to store list of waiters per engine, while with the 
virtual engine we won't know. Unless there will be a list on the virtual 
engine and some sort of a super-list about on which engines to run the 
breadcrumbs irq handler for every physical engine interrupt.

>>> +     struct intel_context *ce, *cn;
>>> +     struct i915_request *rq, *rn;
>>> +     LIST_HEAD(signal);
>>> +
>>> +     spin_lock(&b->irq_lock);
>>> +
>>> +     b->irq_fired = true;
>>> +     if (b->irq_armed && list_empty(&b->signalers))
>>> +             __intel_breadcrumbs_disarm_irq(b);
>>> +
>>> +     list_for_each_entry_safe(ce, cn, &b->signalers, signal_link) {
>>> +             GEM_BUG_ON(list_empty(&ce->signals));
>>> +
>>> +             list_for_each_entry_safe(rq, rn, &ce->signals, signal_link) {
>>> +                     if (!__request_completed(rq))
>>> +                             break;
>>> +
>>> +                     GEM_BUG_ON(!test_bit(I915_FENCE_FLAG_SIGNAL,
>>> +                                          &rq->fence.flags));
>>> +                     clear_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags);
>>> +
>>> +                     if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
>>> +                                  &rq->fence.flags))
>>> +                             continue;
>>
>> Request has been signalled already, but is still on this list? Who will
>> then remove it from this list)?
> 
> Race with retire-request, as we operate here only under b->irq_lock not
> rq->lock, and retire-request uses rq->lock then b->irq_lock.

Right, yes.

> 
>>> +                     /*
>>> +                      * Queue for execution after dropping the signaling
>>> +                      * spinlock as the callback chain may end adding
>>> +                      * more signalers to the same context or engine.
>>> +                      */
>>> +                     i915_request_get(rq);
>>> +                     list_add_tail(&rq->signal_link, &signal);
>>
>> Shouldn't this be list_move_tail since rq is already on the ce->signals
>> list?
> 
> (1) We delete in bulk, see (2)
> 
>>> +             }
>>> +
>>> +             if (!list_is_first(&rq->signal_link, &ce->signals)) {
>>
>> Can't rq be NULL here - if only completed requests are on the list and
>> so the iterator reached the end?
> 
> Iterator at end == &ce->signals.

Yes, my bad.

> 
>>> +                     __list_del_many(&ce->signals, &rq->signal_link);
>>
>>
>> This block could use a comment - I at least failed to quickly understand
>> it. How can we be unlinking entries, if they have already been unlinked?
> 
> (2) Because we did list_add not list_move, see (1).
> 
>>> +                     if (&ce->signals == &rq->signal_link)
>>> +                             list_del_init(&ce->signal_link);
>>
>> This is another list_empty hack like from another day? Please put a
>> comment if you don't want it to be self documenting.
> 
>>> -static int intel_breadcrumbs_signaler(void *arg)
>>> +bool intel_engine_enable_signaling(struct i915_request *rq)
>>
>> intel_request_enable_signaling?
> 
> I'm warming to it.
> 
>>> +     spin_lock(&b->irq_lock);
>>> +     if (test_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags) &&
>>
>> __test_bit?
> 
> Heh. test_bit == __test_bit :)
> 
> In general though, we have to be cautious as we don't own the whole flags
> field.
> 
>>> +             list_for_each_prev(pos, &ce->signals) {
>>> +                     struct i915_request *it =
>>> +                             list_entry(pos, typeof(*it), signal_link);
>>>    
>>> -                     if (unlikely(kthread_should_stop()))
>>> +                     if (i915_seqno_passed(rq->fence.seqno,
>>
>> Put a comment against this loop please saying where in the list it is
>> looking to insert...
> 
> Oh you haven't written this variant of insertion sort for vblank
> handling 20 times. I swear I end up repeating my mistakes over and over
> again at every level in the stack.
> 
>>> +             list_add(&rq->signal_link, pos);
>>> +             if (pos == &ce->signals)
>>> +                     list_move_tail(&ce->signal_link, &b->signalers);
>>
>> ... and here how it manages the other list as well, on transition from
>> empty to active.
> 
> Seems like the code was easy enough to follow ;)
> 
>>> +     spin_lock(&b->irq_lock);
>>> +     if (test_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags)) {
>>
>> __test_and_clear_bit ?
> 
> Yeah this is where we need to be careful with rq->fence.flags. It is not
> specified that __test_and_clear_bit only operates on the single bit and
> so to be cautious we do the locked instruction to clobbering other
> atomic instructions to the rest of the field.

Ok.

> Although you can make a very strong case that all fence->flags are
> serialised for signaling today.
> 
>>> -             spinlock_t rb_lock; /* protects the rb and wraps irq_lock */
>>> -             struct rb_root waiters; /* sorted by retirement, priority */
>>> -             struct list_head signals; /* sorted by retirement */
>>> -             struct task_struct *signaler; /* used for fence signalling */
>>> +             struct irq_work irq_work;
>>
>> Why did you need irq work and not just invoke the handler directly?
>> Maybe put a comment here giving a hint.
> 
> /* lock inversion horrors */
> 
> Due to the way we may directly submit requests on handling the
> dma_fence_signal, we can end up processing a
> i915_request_enable_signaling on the same engine as is currently
> emitting the signal.
> 
>>> +static int __igt_breadcrumbs_smoketest(void *arg)
>>> +{
>>> +     struct smoketest *t = arg;
>>> +     struct mutex *BKL = &t->engine->i915->drm.struct_mutex;
>>
>> Breaking new ground, well okay, although caching dev or i915 would be
>> good enough.
>>
>>> +     struct i915_request **requests;
>>> +     I915_RND_STATE(prng);
>>> +     const unsigned int total = 4 * t->ncontexts + 1;
>>> +     const unsigned int max_batch = min(t->ncontexts, t->max_batch) - 1;
>>> +     unsigned int num_waits = 0, num_fences = 0;
>>> +     unsigned int *order;
>>> +     int err = 0;
>>
>> Still in the Chrismas spirit? ;) No worries, it's selftests.
> 
> I was feeling generous in replacing the elaborate breadcrumb testing we
> had with something at all!

That was a dig at a Christmas tree declarations you are normally so fond 
of. :))

> That testing was the best part of intel_breadcrumbs.
>   
>> I ran out of steam and will look at selftests during some second pass.
>> In expectation, please put some high level comments for each test to
>> roughly say what it plans to test and with what approach. I makes
>> reverse engineering the algorithm much easier.
> 
> There's only one test (just run with mock_request and i915_request), a
> very, very, very simple smoketest.
> 
> I did not come up with ways of testing the new signal_list to the same
> rigour as we did before. :(

I'll go and read the next version..

Regards,

Tvrtko