[Intel-gfx] [PATCH 13/66] drm/i915/gem: Don't drop the timeline lock during execbuf
Chris Wilson
chris at chris-wilson.co.uk
Tue Jul 28 14:51:50 UTC 2020
Quoting Thomas Hellström (Intel) (2020-07-23 17:09:15)
>
> On 2020-07-15 13:50, Chris Wilson wrote:
> > Our timeline lock is our defence against a concurrent execbuf
> > interrupting our request construction. We need to hold it throughout or,
> > for example, a second thread may interject a relocation request in
> > between our own relocation request and execution in the ring.
> >
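For illustration, the construction pattern this protects looks roughly
like the sketch below, given an intel_context *ce. It is not the execbuf
code itself: emit_relocations() and emit_batch() are hypothetical
stand-ins for the work done while the lock is held, and request queueing
is omitted for brevity.

	struct intel_timeline *tl;
	struct i915_request *rq;
	int err = 0;

	/* Serialise all request construction on this context's timeline. */
	tl = intel_context_timeline_lock(ce);
	if (IS_ERR(tl))
		return PTR_ERR(tl);

	/* The timeline mutex is already held, so use the lock-held variant. */
	rq = __i915_request_create(ce, GFP_KERNEL);
	if (IS_ERR(rq)) {
		err = PTR_ERR(rq);
		goto unlock;
	}

	err = emit_relocations(rq);	/* hypothetical */
	if (err == 0)
		err = emit_batch(rq);	/* hypothetical */

	/*
	 * Until we unlock, no other thread can add to tl->requests, so
	 * nothing can interject between our relocations and our batch.
	 */
	__i915_request_commit(rq);
unlock:
	intel_context_timeline_unlock(tl);
	return err;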
> > A second, major benefit is that it allows us to preserve a large chunk
> > of the ringbuffer for our exclusive use, which should virtually
> > eliminate the threat of hitting a wait_for_space during request
> > construction -- although we should have already dropped other
> > contentious locks at that point.
> >
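Ring space is claimed with intel_ring_begin(), which can stall in
wait_for_space() when the ring is full; that is the stall this change
aims to keep out of the middle of request construction. A minimal
sketch, given a request rq (the dword count is arbitrary):

	u32 *cs;

	/* May block until earlier requests retire and free ring space. */
	cs = intel_ring_begin(rq, 2);
	if (IS_ERR(cs))
		return PTR_ERR(cs);

	*cs++ = MI_NOOP;
	*cs++ = MI_NOOP;
	intel_ring_advance(rq, cs);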
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > ---
> > .../gpu/drm/i915/gem/i915_gem_execbuffer.c | 413 +++++++++++-------
> > .../i915/gem/selftests/i915_gem_execbuffer.c | 24 +-
> > 2 files changed, 281 insertions(+), 156 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > index 719ba9fe3e85..af3499aafd22 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > @@ -259,6 +259,8 @@ struct i915_execbuffer {
> >  		bool has_fence : 1;
> >  		bool needs_unfenced : 1;
> >  
> > +		struct intel_context *ce;
> > +
> >  		struct i915_vma *target;
> >  		struct i915_request *rq;
> >  		struct i915_vma *rq_vma;
> > @@ -639,6 +641,35 @@ static int eb_reserve_vma(const struct i915_execbuffer *eb,
> >  	return 0;
> >  }
> >  
> > +static void retire_requests(struct intel_timeline *tl)
> > +{
> > +	struct i915_request *rq, *rn;
> > +
> > +	list_for_each_entry_safe(rq, rn, &tl->requests, link)
> > +		if (!i915_request_retire(rq))
> > +			break;
> > +}
> > +
> > +static int wait_for_timeline(struct intel_timeline *tl)
> > +{
> > +	do {
> > +		struct dma_fence *fence;
> > +		int err;
> > +
> > +		fence = i915_active_fence_get(&tl->last_request);
> > +		if (!fence)
> > +			return 0;
> > +
> > +		err = dma_fence_wait(fence, true);
> > +		dma_fence_put(fence);
> > +		if (err)
> > +			return err;
> > +
> > +		/* Retiring may trigger a barrier, requiring an extra pass */
> > +		retire_requests(tl);
> > +	} while (1);
> > +}
> > +
> >  static int eb_reserve(struct i915_execbuffer *eb)
> >  {
> >  	const unsigned int count = eb->buffer_count;
> > @@ -646,7 +677,6 @@ static int eb_reserve(struct i915_execbuffer *eb)
> >  	struct list_head last;
> >  	struct eb_vma *ev;
> >  	unsigned int i, pass;
> > -	int err = 0;
> >  
> >  	/*
> >  	 * Attempt to pin all of the buffers into the GTT.
> > @@ -662,18 +692,37 @@ static int eb_reserve(struct i915_execbuffer *eb)
> >  	 * room for the earlier objects *unless* we need to defragment.
> >  	 */
> >  
> > -	if (mutex_lock_interruptible(&eb->i915->drm.struct_mutex))
> > -		return -EINTR;
> > -
> >  	pass = 0;
> >  	do {
> > +		int err = 0;
> > +
> > +		/*
> > +		 * We need to hold one lock as we bind all the vma so that
> > +		 * we have a consistent view of the entire vm and can plan
> > +		 * evictions to fill the whole GTT. If we allow a second
> > +		 * thread to run as we do this, it will either unbind
> > +		 * everything we want pinned, or steal space that we need for
> > +		 * ourselves. The closer we are to a full GTT, the more likely
> > +		 * such contention will cause us to fail to bind the workload
> > +		 * for this batch. Since we know at this point we need to
> > +		 * find space for new buffers, we know that extra pressure
> > +		 * from contention is likely.
> > +		 *
> > +		 * In lieu of being able to hold vm->mutex for the entire
> > +		 * sequence (it's complicated!), we opt for struct_mutex.
> > +		 */
> > +		if (mutex_lock_interruptible(&eb->i915->drm.struct_mutex))
> > +			return -EINTR;
> > +
>
> With TTM, an idea that has been around for a long time is to let the
> reservations resolve this. I don't think that's in place yet, though,
> because eviction / unbinding still requires a trylock of the
> reservation, and because evictions are not batched but performed
> one by one, with each evicted object's reservation dropped immediately
> after eviction. Having reservations resolve this could perhaps be
> something we could aim for in the long run as well? Unrelated batches
> would then never contend.
>
> In the meantime would it make sense to introduce a new device-wide mutex
> to avoid completely unrelated contention with the struct_mutex?
No.
The vma are not related to reservations.
-Chris
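For context, the reservation-based serialization Thomas describes above
is the dma_resv/ww_mutex pattern used by TTM, in which every object in a
batch is locked under one acquire context so that unrelated batches only
contend on the objects they actually share. A rough sketch of that
pattern, not i915 code; objects[], nobj and the resv field are
illustrative:

	struct ww_acquire_ctx ctx;
	struct dma_resv *contended = NULL;
	int n, err = 0;

	ww_acquire_init(&ctx, &reservation_ww_class);
retry:
	for (n = 0; n < nobj; n++) {
		struct dma_resv *resv = objects[n]->resv;

		if (resv == contended) {
			/* Already locked by the slowpath below. */
			contended = NULL;
			continue;
		}

		err = dma_resv_lock_interruptible(resv, &ctx);
		if (err) {
			/* Drop everything acquired so far. */
			while (n--)
				dma_resv_unlock(objects[n]->resv);
			if (contended)
				dma_resv_unlock(contended);
			contended = NULL;

			if (err == -EDEADLK) {
				/* Sleep on the contended lock, then restart. */
				err = dma_resv_lock_slow_interruptible(resv, &ctx);
				if (!err) {
					contended = resv;
					goto retry;
				}
			}
			goto err_fini;
		}
	}
	ww_acquire_done(&ctx);

	/* ... plan evictions / bind vma / submit under the locks ... */

	for (n = 0; n < nobj; n++)
		dma_resv_unlock(objects[n]->resv);
err_fini:
	ww_acquire_fini(&ctx);
	return err;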