[Intel-gfx] [PATCH 13/66] drm/i915/gem: Don't drop the timeline lock during execbuf
Thomas Hellström (Intel)
thomas_os at shipmail.org
Tue Jul 28 14:46:27 UTC 2020
On 7/23/20 6:09 PM, Thomas Hellström (Intel) wrote:
>
> On 2020-07-15 13:50, Chris Wilson wrote:
>> Our timeline lock is our defence against a concurrent execbuf
>> interrupting our request construction. We need to hold it throughout or,
>> for example, a second thread may interject a relocation request in
>> between our own relocation request and execution in the ring.
>>
>> A second major benefit is that it allows us to preserve a large chunk
>> of the ringbuffer for our exclusive use; which should virtually
>> eliminate the threat of hitting a wait_for_space during request
>> construction -- although we should have already dropped other
>> contentious locks at that point.
>>
>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>> ---
>> .../gpu/drm/i915/gem/i915_gem_execbuffer.c | 413 +++++++++++-------
>> .../i915/gem/selftests/i915_gem_execbuffer.c | 24 +-
>> 2 files changed, 281 insertions(+), 156 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>> index 719ba9fe3e85..af3499aafd22 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>> @@ -259,6 +259,8 @@ struct i915_execbuffer {
>> bool has_fence : 1;
>> bool needs_unfenced : 1;
>> + struct intel_context *ce;
>> +
>> struct i915_vma *target;
>> struct i915_request *rq;
>> struct i915_vma *rq_vma;
>> @@ -639,6 +641,35 @@ static int eb_reserve_vma(const struct i915_execbuffer *eb,
>> return 0;
>> }
>> +static void retire_requests(struct intel_timeline *tl)
>> +{
>> + struct i915_request *rq, *rn;
>> +
>> + list_for_each_entry_safe(rq, rn, &tl->requests, link)
>> + if (!i915_request_retire(rq))
>> + break;
>> +}
>> +
>> +static int wait_for_timeline(struct intel_timeline *tl)
>> +{
>> + do {
>> + struct dma_fence *fence;
>> + int err;
>> +
>> + fence = i915_active_fence_get(&tl->last_request);
>> + if (!fence)
>> + return 0;
>> +
>> + err = dma_fence_wait(fence, true);
>> + dma_fence_put(fence);
>> + if (err)
>> + return err;
>> +
>> + /* Retiring may trigger a barrier, requiring an extra pass */
>> + retire_requests(tl);
>> + } while (1);
>> +}
>> +
>> static int eb_reserve(struct i915_execbuffer *eb)
>> {
>> const unsigned int count = eb->buffer_count;
>> @@ -646,7 +677,6 @@ static int eb_reserve(struct i915_execbuffer *eb)
>> struct list_head last;
>> struct eb_vma *ev;
>> unsigned int i, pass;
>> - int err = 0;
>> /*
>> * Attempt to pin all of the buffers into the GTT.
>> @@ -662,18 +692,37 @@ static int eb_reserve(struct i915_execbuffer *eb)
>> * room for the earlier objects *unless* we need to defragment.
>> */
>> - if (mutex_lock_interruptible(&eb->i915->drm.struct_mutex))
>> - return -EINTR;
>> -
>> pass = 0;
>> do {
>> + int err = 0;
>> +
>> + /*
>> + * We need to hold one lock as we bind all the vma so that
>> + * we have a consistent view of the entire vm and can plan
>> + * evictions to fill the whole GTT. If we allow a second
>> + * thread to run as we do this, it will either unbind
>> + * everything we want pinned, or steal space that we need for
>> + * ourselves. The closer we are to a full GTT, the more likely
>> + * such contention will cause us to fail to bind the workload
>> + * for this batch. Since we know at this point we need to
>> + * find space for new buffers, we know that extra pressure
>> + * from contention is likely.
>> + *
>> + * In lieu of being able to hold vm->mutex for the entire
>> + * sequence (it's complicated!), we opt for struct_mutex.
>> + */
>> + if (mutex_lock_interruptible(&eb->i915->drm.struct_mutex))
>> + return -EINTR;
>> +
>
> With TTM, an idea that has been around for a long time is to let the
> reservations resolve this. I don't think that's in place yet, though,
> because eviction / unbinding still requires a trylock on the
> reservation, and because evictions are not batched but performed one
> by one, with each evicted object's reservation dropped immediately
> after eviction. Could having the reservations resolve this be
> something we aim for in the long run as well? Unrelated batches would
> then never contend.
>
> In the meantime, would it make sense to introduce a new device-wide
> mutex to avoid completely unrelated contention with the struct_mutex?
>
Actually I see this is changed later in the series...
/Thomas