[Intel-gfx] [PATCH 18/70] drm/i915: Implement inter-engine read-read optimisations

Tue Apr 14 06:51:37 PDT 2015

On 04/07/2015 04:20 PM, Chris Wilson wrote:
> Currently, we only track the last request globally across all engines.
> This prevents us from issuing concurrent read requests on e.g. the RCS
> and BCS engines (or more likely the render and media engines). Without
> semaphores, we incur costly stalls as we synchronise between rings -
> greatly impacting the current performance of Broadwell versus Haswell in
> certain workloads (like video decode). With the introduction of
> reference counted requests, it is much easier to track the last request
> per ring, as well as the last global write request so that we can
> optimise inter-engine read read requests (as well as better optimise
> certain CPU waits).
>
> v2: Fix inverted readonly condition for nonblocking waits.
> v3: Handle non-continguous engine array after waits
> v4: Rebase, tidy, rewrite ring list debugging
> v5: Use obj->active as a bitfield, it looks cool
> v6: Micro-optimise, mostly involving moving code around
> v7: Fix retire-requests-upto for execlists (and multiple rq->ringbuf)
> v8: Rebase

I am still slightly concerned with the sequential ring req waiting in 
combination with optimistic spinning, but other than that looks good to me:

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>

Regards,

Tvrtko