[Intel-gfx] [PATCH 07/27] drm/i915: Squash repeated awaits on the same fence

Mon Apr 24 13:19:54 UTC 2017

On Mon, Apr 24, 2017 at 02:03:25PM +0100, Tvrtko Ursulin wrote:
> 
> On 19/04/2017 10:41, Chris Wilson wrote:
> >Track the latest fence waited upon on each context, and only add a new
> >asynchronous wait if the new fence is more recent than the recorded
> >fence for that context. This requires us to filter out unordered
> >timelines, which are noted by DMA_FENCE_NO_CONTEXT. However, in the
> >absence of a universal identifier, we have to use our own
> >i915->mm.unordered_timeline token.
> 
> (._.), a bit later... @_@!
> 
> What does this fix and is the complexity worth it?

It's a recovery of the optimisation that we used to have from the
initial multiple engine semaphore synchronisation - that of avoiding
repeating the same synchronisation barriers.

In the current setup, the cost of repeated fence synchronisation is
obscured: it just causes a tight loop between

 /<---------------------------------------------\
 |                                               ^
i915_sw_fence_complete -> i915_sw_fence_commit ->|

and extra depth in the dependency trees, which is generally not
observed in normal usage.

When you know what you are looking for, the reduction of all those
atomic ops from underneath hardirq is definitely worth it, even for
fairly simple operations, and there tends to be repetition from all the
buffers being tracked between requests (and clients).

Using a seqno map avoids the cost of tracking fences (i.e. keeping old
fences forever) and allows it to be kept on the timeline, rather than
the request itself (a ht under the request can squash simple repeats,
but using the timeline is more complete).
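
As a rough illustration of the idea (not the code from the patch), the
squashing can be sketched as a map from fence context to the latest seqno
already awaited, kept per timeline. Every name below (sync_map, sync_lookup,
await_fence, SYNC_SLOTS) is hypothetical, and a fixed-size linear-probing
table stands in for the compressed radix tree. It also assumes 32-bit seqnos
compared with the usual wraparound-safe signed difference, and leaves out the
filtering of unordered timelines.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SYNC_SLOTS 64 /* hypothetical fixed capacity, illustration only */

struct sync_entry {
	uint64_t context; /* fence context (timeline) id */
	uint32_t seqno;   /* latest seqno already awaited on that context */
	bool used;
};

/* One of these would live on each timeline; zero-initialise before use. */
struct sync_map {
	struct sync_entry slot[SYNC_SLOTS];
};

/* Wraparound-safe "a is at or after b" for 32-bit seqnos. */
static bool seqno_passed(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) >= 0;
}

/* Linear probing; entries are never removed, so a free slot ends the probe. */
static struct sync_entry *sync_lookup(struct sync_map *map, uint64_t context,
				      bool create)
{
	unsigned int start = (unsigned int)(context % SYNC_SLOTS);
	unsigned int i;

	for (i = 0; i < SYNC_SLOTS; i++) {
		struct sync_entry *e = &map->slot[(start + i) % SYNC_SLOTS];

		if (e->used && e->context == context)
			return e;
		if (!e->used) {
			if (!create)
				return NULL;
			e->used = true;
			e->context = context;
			return e;
		}
	}
	return NULL; /* table full: the caller just loses the optimisation */
}

/*
 * Returns true if a new asynchronous wait is required, false if an earlier
 * await on the same context already covers this seqno.
 */
static bool await_fence(struct sync_map *map, uint64_t context, uint32_t seqno)
{
	struct sync_entry *e = sync_lookup(map, context, false);

	if (e && seqno_passed(e->seqno, seqno))
		return false; /* squashed: already waiting on this or later */

	/* A real implementation would queue the asynchronous wait here. */

	if (!e)
		e = sync_lookup(map, context, true);
	if (e)
		e->seqno = seqno; /* remember only the seqno, not the fence */
	return true;
}
```

Only a (context, seqno) pair is stored, so no reference to the old fence
needs to be held, and if the table fills up the sketch simply falls back to
always adding the wait.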

Two small routines implement a compressed radix tree -- it's
comparatively simple next to having to accommodate RCU walkers!
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre