[Intel-gfx] [PATCH] drm/i915: Reduce i915_request.lock contention for i915_request_wait
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Wed Jul 15 12:26:23 UTC 2020
On 15/07/2020 13:06, Tvrtko Ursulin wrote:
>
> On 15/07/2020 11:50, Chris Wilson wrote:
>> Currently, we use i915_request_completed() directly in
>> i915_request_wait() and follow up with a manual invocation of
>> dma_fence_signal(). This appears to cause a large number of contentions
>> on i915_request.lock as when the process is woken up after the fence is
>> signaled by an interrupt, we will then try and call dma_fence_signal()
>> ourselves while the signaler is still holding the lock.
>> dma_fence_is_signaled() has the benefit of checking the
>> DMA_FENCE_FLAG_SIGNALED_BIT prior to calling dma_fence_signal() and so
>> avoids most of that contention.
>>
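
[Inline note, not part of the patch: dma_fence_is_signaled() is roughly the
below, quoted from memory rather than from include/linux/dma-fence.h, so the
exact shape may differ. The relevant bit is the early test_bit(), which lets
an already-signaled fence be reported without taking the fence lock at all.]

static inline bool
dma_fence_is_signaled(struct dma_fence *fence)
{
	/* Fast path: the fence is already signaled, no locking needed. */
	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
		return true;

	/* Slow path: ask the driver (i915_fence_signaled() for us) and,
	 * if it reports completion, do the signaling here, which does
	 * take the fence lock (rq->lock in our case). */
	if (fence->ops->signaled && fence->ops->signaled(fence)) {
		dma_fence_signal(fence);
		return true;
	}

	return false;
}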
>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>> Cc: Matthew Auld <matthew.auld at intel.com>
>> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>> ---
>> drivers/gpu/drm/i915/i915_request.c | 12 ++++--------
>> 1 file changed, 4 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
>> index 0b2fe55e6194..bb4eb1a8780e 100644
>> --- a/drivers/gpu/drm/i915/i915_request.c
>> +++ b/drivers/gpu/drm/i915/i915_request.c
>> @@ -1640,7 +1640,7 @@ static bool busywait_stop(unsigned long timeout, unsigned int cpu)
>>  	return this_cpu != cpu;
>>  }
>>
>> -static bool __i915_spin_request(const struct i915_request * const rq, int state)
>> +static bool __i915_spin_request(struct i915_request * const rq, int state)
>>  {
>>  	unsigned long timeout_ns;
>>  	unsigned int cpu;
>> @@ -1673,7 +1673,7 @@ static bool __i915_spin_request(const struct i915_request * const rq, int state)
>>  	timeout_ns = READ_ONCE(rq->engine->props.max_busywait_duration_ns);
>>  	timeout_ns += local_clock_ns(&cpu);
>>  	do {
>> -		if (i915_request_completed(rq))
>> +		if (dma_fence_is_signaled(&rq->fence))
>>  			return true;
>>
>>  		if (signal_pending_state(state, current))
>> @@ -1766,10 +1766,8 @@ long i915_request_wait(struct i915_request *rq,
>>  	 * duration, which we currently lack.
>>  	 */
>>  	if (IS_ACTIVE(CONFIG_DRM_I915_MAX_REQUEST_BUSYWAIT) &&
>> -	    __i915_spin_request(rq, state)) {
>> -		dma_fence_signal(&rq->fence);
>> +	    __i915_spin_request(rq, state))
>>  		goto out;
>> -	}
>>
>>  	/*
>>  	 * This client is about to stall waiting for the GPU. In many cases
>> @@ -1796,10 +1794,8 @@ long i915_request_wait(struct i915_request *rq,
>>  	for (;;) {
>>  		set_current_state(state);
>>
>> -		if (i915_request_completed(rq)) {
>> -			dma_fence_signal(&rq->fence);
>> +		if (dma_fence_is_signaled(&rq->fence))
>>  			break;
>> -		}
>>
>>  		intel_engine_flush_submission(rq->engine);
>>
>
> In other words, this puts some latency back into the waiters, which is
> probably okay, since synchronous waits are not our primary model.
>
> I have a slight concern about the remaining value of busy spinning if the
> i915_request_completed() check is removed from there as well. Of course it
> doesn't make sense to have different completion criteria between the two.
> We could accept waiting a bit longer if the real check in the busyspin said
> the request has actually completed, and just not signal it ourselves but
> wait for the breadcrumbs to do it.
What a load of nonsense.. :)
Okay, I think the only real question is i915_request_completed() vs
dma_fence_is_signaled() in __i915_spin_request(). Do we want to burn CPU
cycles waiting on the GPU and the breadcrumb irq work, or just on the GPU?
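
To spell out what each check looks at (simplified from memory, so not
verbatim from the tree, and the helper names here are just for illustration):

/* i915_request_completed(): signaled bit, otherwise has the GPU written
 * the breadcrumb seqno to the HWSP? A plain memory read, no locks and no
 * dependency on the breadcrumb irq worker. */
static bool spin_on_gpu_only(const struct i915_request *rq)
{
	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &rq->fence.flags))
		return true;

	return i915_seqno_passed(hwsp_seqno(rq), rq->fence.seqno);
}

/* dma_fence_is_signaled(): same fast path on the signaled bit, but the
 * slow path goes through fence->ops->signaled() and, on completion, calls
 * dma_fence_signal() from the spinning thread, taking rq->lock. */
static bool spin_on_fence(struct i915_request *rq)
{
	return dma_fence_is_signaled(&rq->fence);
}

Both share the fast path on the signaled bit; the difference is what the
slow path does while we spin.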
Regards,
Tvrtko