[Intel-gfx] [PATCH 18/70] drm/i915: Implement inter-engine read-read optimisations

Tue Apr 14 07:00:24 PDT 2015

On Tue, Apr 14, 2015 at 02:51:37PM +0100, Tvrtko Ursulin wrote:
> 
> On 04/07/2015 04:20 PM, Chris Wilson wrote:
> >Currently, we only track the last request globally across all engines.
> >This prevents us from issuing concurrent read requests on e.g. the RCS
> >and BCS engines (or more likely the render and media engines). Without
> >semaphores, we incur costly stalls as we synchronise between rings -
> >greatly impacting the current performance of Broadwell versus Haswell in
> >certain workloads (like video decode). With the introduction of
> >reference counted requests, it is much easier to track the last request
> >per ring, as well as the last global write request so that we can
> >optimise inter-engine read-read requests (as well as better optimise
> >certain CPU waits).
> >
> >v2: Fix inverted readonly condition for nonblocking waits.
> >v3: Handle non-contiguous engine array after waits
> >v4: Rebase, tidy, rewrite ring list debugging
> >v5: Use obj->active as a bitfield, it looks cool
> >v6: Micro-optimise, mostly involving moving code around
> >v7: Fix retire-requests-upto for execlists (and multiple rq->ringbuf)
> >v8: Rebase
> 
> I am still slightly concerned with the sequential ring req waiting
> in combination with optimistic spinning, but other than that looks
> good to me:

I hear you. I don't yet have a scenario where I care, but with a little
more refactoring (see next version), extending i915_wait_request to work
on an array of requests will be a reasonably easy task.
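
For readers following the thread, here is a toy model of the tracking
described in the commit message: one "last read" request per engine, an
active bitfield, and a single last write. It is only a sketch and not the
i915 code. Every name in it (toy_object, toy_sync_mask, NUM_ENGINES, the
simulated completed_seqno array) is made up for illustration. The mask
returned by toy_sync_mask() is the set of engines a new access has to
wait for; walking that mask one bit at a time is the sequential waiting
mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_ENGINES 4	/* hypothetical engine count */

/* Toy stand-in for a request: an engine index plus a sequence number. */
struct toy_request {
	int engine;
	uint32_t seqno;
};

/* Toy stand-in for a buffer object with per-engine read tracking. */
struct toy_object {
	unsigned active;	/* bit n set: engine n has a tracked read */
	struct toy_request last_read[NUM_ENGINES];
	struct toy_request last_write;
	bool has_write;
};

/* Simulated hardware progress: last completed seqno on each engine. */
static uint32_t completed_seqno[NUM_ENGINES];

/* Wrap-safe "has this request completed?" comparison. */
static bool toy_request_done(const struct toy_request *rq)
{
	return (int32_t)(completed_seqno[rq->engine] - rq->seqno) >= 0;
}

/* Record that 'engine' reads the object in request 'seqno'. */
static void toy_mark_read(struct toy_object *obj, int engine, uint32_t seqno)
{
	assert(engine >= 0 && engine < NUM_ENGINES);
	obj->last_read[engine] = (struct toy_request){ engine, seqno };
	obj->active |= 1u << engine;
}

/* A write also counts as a read on the writing engine. */
static void toy_mark_write(struct toy_object *obj, int engine, uint32_t seqno)
{
	toy_mark_read(obj, engine, seqno);
	obj->last_write = obj->last_read[engine];
	obj->has_write = true;
}

/*
 * Engines that engine 'to' must wait for before touching the object.
 * A new read only waits for an outstanding write; a new write waits
 * for every outstanding read. The engine never waits on itself.
 */
static unsigned toy_sync_mask(const struct toy_object *obj, int to, bool write)
{
	unsigned mask = 0;

	if (write) {
		for (int i = 0; i < NUM_ENGINES; i++)
			if ((obj->active & (1u << i)) &&
			    !toy_request_done(&obj->last_read[i]))
				mask |= 1u << i;
	} else if (obj->has_write && !toy_request_done(&obj->last_write)) {
		mask |= 1u << obj->last_write.engine;
	}

	return mask & ~(1u << to);
}
```

In this model two engines can both read the object without waiting on
each other, which is the read-read case. A single global "last request"
would have forced the second reader to wait for the first.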

> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>

Thanks, but I have a new version on its way with minor changes.

I spotted an issue with Ironlake and do_idling(), and there is some
slight refactoring as well.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre