[Intel-gfx] [PATCH 26/34] drm/i915: Identify active requests

Chris Wilson <chris@chris-wilson.co.uk>
Tue Jan 22 15:45:18 UTC 2019


Quoting Tvrtko Ursulin (2019-01-22 15:34:07)
> 
> On 21/01/2019 22:21, Chris Wilson wrote:
> > To allow requests to forgo a common execution timeline, one question we
> > need to be able to answer is "is this request running?". To track
> > whether a request has started on HW, we can emit a breadcrumb at the
> > beginning of the request and check its timeline's HWSP to see if the
> > breadcrumb has advanced past the start of this request. (This is in
> > contrast to the global timeline where we need only ask if we are on the
> > global timeline and if the timeline has advanced past the end of the
> > previous request.)
> > 
> > There is still confusion from a preempted request, which has already
> > started but relinquished the HW to a high priority request. For the
> > common case, this discrepancy should be negligible. However, for
> > identification of hung requests, knowing which one was running at the
> > time of the hang will be much more important.
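
As an illustrative sketch (not the exact code from the patch), and
assuming the hwsp_seqno() helper this series introduces to read the
timeline seqno back from the HWSP, the "has this request started?"
test reduces to roughly:

	static inline bool i915_request_started(const struct i915_request *rq)
	{
		/*
		 * The init breadcrumb writes fence.seqno - 1 to the HWSP
		 * before the payload begins, so once the timeline seqno
		 * has passed that value the request has been started,
		 * though it may have been preempted since.
		 */
		return i915_seqno_passed(hwsp_seqno(rq), rq->fence.seqno - 1);
	}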
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >   drivers/gpu/drm/i915/i915_gem_execbuffer.c   |  6 +++
> >   drivers/gpu/drm/i915/i915_request.c          |  9 ++--
> >   drivers/gpu/drm/i915/i915_request.h          |  1 +
> >   drivers/gpu/drm/i915/i915_timeline.c         |  1 +
> >   drivers/gpu/drm/i915/i915_timeline.h         |  2 +
> >   drivers/gpu/drm/i915/intel_engine_cs.c       |  4 +-
> >   drivers/gpu/drm/i915/intel_lrc.c             | 47 ++++++++++++++++----
> >   drivers/gpu/drm/i915/intel_ringbuffer.c      | 43 ++++++++++--------
> >   drivers/gpu/drm/i915/intel_ringbuffer.h      |  6 ++-
> >   drivers/gpu/drm/i915/selftests/mock_engine.c |  2 +-
> >   10 files changed, 86 insertions(+), 35 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > index f250109e1f66..defe7d60bb88 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > @@ -1976,6 +1976,12 @@ static int eb_submit(struct i915_execbuffer *eb)
> >                       return err;
> >       }
> >   
> > +     if (eb->engine->emit_init_breadcrumb) {
> > +             err = eb->engine->emit_init_breadcrumb(eb->request);
> > +             if (err)
> > +                     return err;
> > +     }
> > +
> >       err = eb->engine->emit_bb_start(eb->request,
> >                                       eb->batch->node.start +
> >                                       eb->batch_start_offset,
> > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> > index bb2885f1dc1e..0a8a2a1bf55d 100644
> > --- a/drivers/gpu/drm/i915/i915_request.c
> > +++ b/drivers/gpu/drm/i915/i915_request.c
> > @@ -333,6 +333,7 @@ void i915_request_retire_upto(struct i915_request *rq)
> >   
> >   static u32 timeline_get_seqno(struct i915_timeline *tl)
> >   {
> > +     tl->seqno += tl->has_initial_breadcrumb;
> >       return ++tl->seqno;
> 
> return tl->seqno += 1 + tl->has_initial_breadcrumb?
> 
> Not sure if it would make any difference in the code.

Identical code generation, but it looks better than a conditional
increment followed by a pre-increment.
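
That is, the suggestion collapses the helper to:

	static u32 timeline_get_seqno(struct i915_timeline *tl)
	{
		/* Also reserve a seqno for the initial breadcrumb. */
		return tl->seqno += 1 + tl->has_initial_breadcrumb;
	}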

> > @@ -382,8 +383,8 @@ void __i915_request_submit(struct i915_request *request)
> >               intel_engine_enable_signaling(request, false);
> >       spin_unlock(&request->lock);
> >   
> > -     engine->emit_breadcrumb(request,
> > -                             request->ring->vaddr + request->postfix);
> > +     engine->emit_fini_breadcrumb(request,
> > +                                  request->ring->vaddr + request->postfix);
> >   
> >       /* Transfer from per-context onto the global per-engine timeline */
> >       move_to_timeline(request, &engine->timeline);
> > @@ -657,7 +658,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
> >        * around inside i915_request_add() there is sufficient space at
> >        * the beginning of the ring as well.
> >        */
> > -     rq->reserved_space = 2 * engine->emit_breadcrumb_sz * sizeof(u32);
> > +     rq->reserved_space = 2 * engine->emit_fini_breadcrumb_sz * sizeof(u32);
> 
> Logic being fini breadcrumb is at least as big as the init one? I can't 
> think of any easy asserts to verify that.

We emit engine->emit_init_breadcrumb() normally; it is only
engine->emit_fini_breadcrumb() that is emitted from the reserved
portion.

The factor of 2 is to allow for the space wasted on wraparound.
 
> Also, a little bit of ring space wastage but I guess we don't care.

We don't actually waste space: we only use emit_fini_breadcrumb_sz
dwords, but we flush enough of the ring for 2*sz to be sure that even
if we have to wrap, there's enough room at the start of the ring for
our emit.
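
As a toy model of that bound (a sketch only, not the driver code; the
function name and the padding accounting are illustrative), wherever an
emit of sz dwords starts, the tail padding plus the emit itself always
stays below 2 * sz dwords:

	#include <assert.h>

	/* With 2 * sz dwords kept in reserve, an emit of sz contiguous
	 * dwords fits at whatever offset in the ring it starts. */
	static unsigned int dwords_consumed(unsigned int head,
					    unsigned int size,
					    unsigned int sz)
	{
		unsigned int used = 0;

		if (head + sz > size)		/* must pad the tail and wrap, */
			used += size - head;	/* at most sz - 1 MI_NOOPs */

		used += sz;			/* the breadcrumb itself */
		assert(used < 2 * sz);		/* covered by reserved_space */
		return used;
	}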

So we are only overzealous on flushing when the ring is full, in which
case we throttle a millisecond earlier than is strictly required (given
that the ring already contains a few seconds' worth of batches).

The real problem here is that throttling one client strangles them all.
-Chris

