[Intel-gfx] [PATCH 28/34] drm/i915: Replace global breadcrumbs with per-context interrupt tracking

Chris Wilson chris at chris-wilson.co.uk
Wed Jan 23 10:01:43 UTC 2019


Quoting Tvrtko Ursulin (2019-01-23 09:21:45)
> 
> On 21/01/2019 22:21, Chris Wilson wrote:
> > -static void error_record_engine_waiters(struct intel_engine_cs *engine,
> > -                                     struct drm_i915_error_engine *ee)
> > -{
> > -     struct intel_breadcrumbs *b = &engine->breadcrumbs;
> > -     struct drm_i915_error_waiter *waiter;
> > -     struct rb_node *rb;
> > -     int count;
> > -
> > -     ee->num_waiters = 0;
> > -     ee->waiters = NULL;
> > -
> > -     if (RB_EMPTY_ROOT(&b->waiters))
> > -             return;
> > -
> > -     if (!spin_trylock_irq(&b->rb_lock)) {
> > -             ee->waiters = ERR_PTR(-EDEADLK);
> > -             return;
> > -     }
> > -
> > -     count = 0;
> > -     for (rb = rb_first(&b->waiters); rb != NULL; rb = rb_next(rb))
> > -             count++;
> > -     spin_unlock_irq(&b->rb_lock);
> > -
> > -     waiter = NULL;
> > -     if (count)
> > -             waiter = kmalloc_array(count,
> > -                                    sizeof(struct drm_i915_error_waiter),
> > -                                    GFP_ATOMIC);
> > -     if (!waiter)
> > -             return;
> > -
> > -     if (!spin_trylock_irq(&b->rb_lock)) {
> > -             kfree(waiter);
> > -             ee->waiters = ERR_PTR(-EDEADLK);
> > -             return;
> > -     }
> > -
> > -     ee->waiters = waiter;
> > -     for (rb = rb_first(&b->waiters); rb; rb = rb_next(rb)) {
> > -             struct intel_wait *w = rb_entry(rb, typeof(*w), node);
> > -
> > -             strcpy(waiter->comm, w->tsk->comm);
> > -             waiter->pid = w->tsk->pid;
> > -             waiter->seqno = w->seqno;
> > -             waiter++;
> > -
> > -             if (++ee->num_waiters == count)
> > -                     break;
> > -     }
> > -     spin_unlock_irq(&b->rb_lock);
> > -}
> 
> Capturing context waiters is not interesting for error state?

Not really, we don't have a direct link to the process. We could dig it
out by identifying our special wait_cb inside the fence->signal_list,
but I couldn't be bothered. Who's waiting at the time of the error has
never been that interesting for error debugging, just provides an
overview of the system state.

Who issued the hanging command is much more of interest for the hunting
posse than their victim.

However, storing fence->flags (i.e. the DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT
+ DMA_FENCE_FLAG_SIGNALED_BIT) seems like it would come in handy.
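Something like this, say (a rough sketch; the erq->flags field is made
up, it is not in the current error state):

	/* in record_request(), capture the dma-fence state as well */
	erq->flags = READ_ONCE(request->fence.flags);

	/* and when dumping the error state */
	err_printf(m, "  flags: 0x%08lx\n", erq->flags);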

> > -static bool __i915_spin_request(const struct i915_request *rq,
> > -                             u32 seqno, int state, unsigned long timeout_us)
> > +static bool __i915_spin_request(const struct i915_request * const rq,
> > +                             int state, unsigned long timeout_us)
> >   {
> > -     struct intel_engine_cs *engine = rq->engine;
> > -     unsigned int irq, cpu;
> > -
> > -     GEM_BUG_ON(!seqno);
> > +     unsigned int cpu;
> >   
> >       /*
> >        * Only wait for the request if we know it is likely to complete.
> > @@ -1050,7 +1046,7 @@ static bool __i915_spin_request(const struct i915_request *rq,
> >        * it is a fair assumption that it will not complete within our
> >        * relatively short timeout.
> >        */
> > -     if (!intel_engine_has_started(engine, seqno))
> > +     if (!i915_request_started(rq))
> 
> Might be more wasteful the more preemption is going on. Probably not 
> the most important thing to try to fix straight away, but something to 
> put down on some to-do list.

Actually... That would be cheap to fix here, as it is just a
test_bit(I915_FENCE_FLAG_ACTIVE). Hmm, I wonder if that makes sense for
all callers.

Maybe i915_request_is_running(rq) as a followup.
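Something along these lines, perhaps (untested sketch):

	static inline bool
	i915_request_is_running(const struct i915_request *rq)
	{
		/*
		 * Only worth spinning if the request has been submitted
		 * to HW (started) and has not yet been switched out,
		 * i.e. the ACTIVE bit is still set.
		 */
		if (!i915_request_started(rq))
			return false;

		return test_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
	}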
 
> Above comment is also outdated now (engine order).

I left a comment! Silly me.

> > +enum {
> > +     I915_FENCE_FLAG_ACTIVE = DMA_FENCE_FLAG_USER_BITS,
> > +     I915_FENCE_FLAG_SIGNAL,
> 
> Describe in comments what these mean please.

Mean, you expect them to have meaning outside of their use? :)
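If you insist, something along these lines (wording mine, roughly
matching how the bits are used):

	enum {
		/*
		 * I915_FENCE_FLAG_ACTIVE - this request has been
		 * submitted to HW and occupies a slot in the engine's
		 * execution lists (cleared again on unsubmit).
		 */
		I915_FENCE_FLAG_ACTIVE = DMA_FENCE_FLAG_USER_BITS,

		/*
		 * I915_FENCE_FLAG_SIGNAL - this request is on its
		 * context's ce->signals list, waiting for the
		 * breadcrumb interrupt to come along and signal its
		 * fence.
		 */
		I915_FENCE_FLAG_SIGNAL,
	};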

> > +bool intel_engine_breadcrumbs_irq(struct intel_engine_cs *engine)
> > +{
> > +     struct intel_breadcrumbs *b = &engine->breadcrumbs;
> 
> How can you afford to have this per engine? I guess I might figure it 
> out later in the patch/series.

Hmm, it's always been per engine... What cost are you considering?

> > +     struct intel_context *ce, *cn;
> > +     struct i915_request *rq, *rn;
> > +     LIST_HEAD(signal);
> > +
> > +     spin_lock(&b->irq_lock);
> > +
> > +     b->irq_fired = true;
> > +     if (b->irq_armed && list_empty(&b->signalers))
> > +             __intel_breadcrumbs_disarm_irq(b);
> > +
> > +     list_for_each_entry_safe(ce, cn, &b->signalers, signal_link) {
> > +             GEM_BUG_ON(list_empty(&ce->signals));
> > +
> > +             list_for_each_entry_safe(rq, rn, &ce->signals, signal_link) {
> > +                     if (!__request_completed(rq))
> > +                             break;
> > +
> > +                     GEM_BUG_ON(!test_bit(I915_FENCE_FLAG_SIGNAL,
> > +                                          &rq->fence.flags));
> > +                     clear_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags);
> > +
> > +                     if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
> > +                                  &rq->fence.flags))
> > +                             continue;
> 
> Request has been signalled already, but is still on this list? Who 
> will then remove it from this list?

A race with retire-request: here we operate only under b->irq_lock, not
rq->lock, whereas retire-request takes rq->lock and then b->irq_lock,
and it is retire that unlinks the request from this list.
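Roughly this interleaving (from memory):

	retire (rq->lock held)         breadcrumbs irq
	------------------------       ------------------------
	dma_fence_signal_locked(rq)
	  sets DMA_FENCE_FLAG_SIGNALED_BIT
	                               spin_lock(&b->irq_lock)
	                               sees SIGNALED_BIT set, skips rq
	                               spin_unlock(&b->irq_lock)
	spin_lock(&b->irq_lock)
	list_del(&rq->signal_link)
	spin_unlock(&b->irq_lock)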

> > +                     /*
> > +                      * Queue for execution after dropping the signaling
> > +                      * spinlock as the callback chain may end adding
> > +                      * more signalers to the same context or engine.
> > +                      */
> > +                     i915_request_get(rq);
> > +                     list_add_tail(&rq->signal_link, &signal);
> 
> Shouldn't this be list_move_tail since rq is already on the ce->signals 
> list?

(1) We delete in bulk, see (2)

> > +             }
> > +
> > +             if (!list_is_first(&rq->signal_link, &ce->signals)) {
> 
> Can't rq be NULL here - if only completed requests are on the list and 
> so the iterator reached the end?

Iterator at end == &ce->signals. If every request on the list was
completed, the loop runs off the end and rq is left holding the head
sentinel cast back to the entry type, so &rq->signal_link ==
&ce->signals and the list_is_first() check below is still well defined.

> > +                     __list_del_many(&ce->signals, &rq->signal_link);
> 
> 
> This block could use a comment - I at least failed to quickly understand 
> it. How can we be unlinking entries, if they have already been unlinked?

(2) Because we did list_add, not list_move, see (1). list_add_tail()
repurposes each completed rq->signal_link for the local signal list,
leaving the front of ce->signals temporarily pointing at stale entries;
rather than unlink each entry as we visit it, we repair the list in one
step with __list_del_many(), which just reconnects the list head to the
first incomplete request.
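For reference, the helper is trivial (this is the i915_utils.h version,
iirc):

	static inline void __list_del_many(struct list_head *head,
					   struct list_head *first)
	{
		first->prev = head;
		WRITE_ONCE(head->next, first);
	}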

> > +                     if (&ce->signals == &rq->signal_link)
> > +                             list_del_init(&ce->signal_link);
> 
> This is another list_empty hack like the one from the other day? 
> Please put a comment if you don't want it to be self-documenting.


> > -static int intel_breadcrumbs_signaler(void *arg)
> > +bool intel_engine_enable_signaling(struct i915_request *rq)
> 
> intel_request_enable_signaling?

I'm warming to it.

> > +     spin_lock(&b->irq_lock);
> > +     if (test_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags) &&
> 
> __test_bit?

Heh. test_bit == __test_bit :)

In general though, we have to be cautious as we don't own the whole flags
field.

> > +             list_for_each_prev(pos, &ce->signals) {
> > +                     struct i915_request *it =
> > +                             list_entry(pos, typeof(*it), signal_link);
> >   
> > -                     if (unlikely(kthread_should_stop()))
> > +                     if (i915_seqno_passed(rq->fence.seqno, 
> 
> Put a comment against this loop please saying where in the list it is 
> looking to insert...

Oh you haven't written this variant of insertion sort for vblank
handling 20 times. I swear I end up repeating my mistakes over and over
again at every level in the stack.

> > +             list_add(&rq->signal_link, pos);
> > +             if (pos == &ce->signals)
> > +                     list_move_tail(&ce->signal_link, &b->signalers);
> 
> ... and here how it manages the other list as well, on transition from 
> empty to active.

Seems like the code was easy enough to follow ;)
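For the record though, annotated the loop is roughly this (comments
mine, paraphrasing the intent):

	/*
	 * ce->signals is kept in fence seqno order, oldest first. New
	 * requests usually arrive in order, so search backwards from
	 * the tail for the first older request and insert after it.
	 */
	list_for_each_prev(pos, &ce->signals) {
		struct i915_request *it =
			list_entry(pos, typeof(*it), signal_link);

		if (i915_seqno_passed(rq->fence.seqno, it->fence.seqno))
			break;
	}
	list_add(&rq->signal_link, pos);

	/*
	 * If we walked past every entry (or the list was empty), this
	 * is now the oldest signaler on this context, so make sure the
	 * context itself is queued on the engine's b->signalers list.
	 */
	if (pos == &ce->signals)
		list_move_tail(&ce->signal_link, &b->signalers);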

> > +     spin_lock(&b->irq_lock);
> > +     if (test_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags)) {
> 
> __test_and_clear_bit ?

Yeah, this is where we need to be careful with rq->fence.flags. It is
not specified that __test_and_clear_bit only operates on the single
bit, so to be cautious we use the locked instruction to avoid
clobbering other atomic updates to the rest of the field.

Although you can make a very strong case that all fence->flags are
serialised for signaling today.
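To illustrate the original worry (a pseudo-interleaving, not real
code): a non-atomic read-modify-write of the word can lose a concurrent
atomic update to a different bit.

	/* CPU A: __test_and_clear_bit(SIGNAL, &flags) */
	old = flags;                   /* A reads the whole word */
	/* CPU B: set_bit(ACTIVE, &flags) -- atomic, lands in memory */
	flags = old & ~BIT(SIGNAL);    /* A writes back, losing B's bit */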

> > -             spinlock_t rb_lock; /* protects the rb and wraps irq_lock */
> > -             struct rb_root waiters; /* sorted by retirement, priority */
> > -             struct list_head signals; /* sorted by retirement */
> > -             struct task_struct *signaler; /* used for fence signalling */
> > +             struct irq_work irq_work;
> 
> Why did you need irq work and not just invoke the handler directly? 
> Maybe put a comment here giving a hint.

/* lock inversion horrors */

Due to the way we may directly submit requests while handling the
dma_fence_signal, we can end up processing an
i915_request_enable_signaling on the same engine as is currently
emitting the signal.
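Roughly (call chain from memory, so take the exact names with a pinch
of salt):

	process_csb()                     /* holds engine->timeline.lock */
	  intel_engine_breadcrumbs_irq()  /* if invoked directly */
	    dma_fence_signal(&rq->fence)
	      -> fence callback submits a dependent request
	           __i915_request_submit()
	             spin_lock(&engine->timeline.lock)  /* deadlock */

Bouncing through irq_work breaks that recursion.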

> > +static int __igt_breadcrumbs_smoketest(void *arg)
> > +{
> > +     struct smoketest *t = arg;
> > +     struct mutex *BKL = &t->engine->i915->drm.struct_mutex;
> 
> Breaking new ground, well okay, although caching dev or i915 would be 
> good enough.
> 
> > +     struct i915_request **requests;
> > +     I915_RND_STATE(prng);
> > +     const unsigned int total = 4 * t->ncontexts + 1;
> > +     const unsigned int max_batch = min(t->ncontexts, t->max_batch) - 1;
> > +     unsigned int num_waits = 0, num_fences = 0;
> > +     unsigned int *order;
> > +     int err = 0;
> 
> Still in the Christmas spirit? ;) No worries, it's selftests.

I was feeling generous in replacing the elaborate breadcrumb testing we
had with anything at all!

That testing was the best part of intel_breadcrumbs.
 
> I ran out of steam and will look at selftests during some second pass. 
> In anticipation, please put some high-level comments for each test to 
> roughly say what it plans to test and with what approach. It makes 
> reverse engineering the algorithm much easier.

There's only one test (just run with mock_request and i915_request), a
very, very, very simple smoketest.

I did not come up with ways of testing the new signal_list to the same
rigour as we did before. :(
-Chris

