[Intel-gfx] [PATCH 10/11] drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+

Thu Jan 31 16:20:37 UTC 2019

Quoting Chris Wilson (2019-01-31 13:39:50)
> Quoting Tvrtko Ursulin (2019-01-31 13:19:31)
> > 
> > On 30/01/2019 02:19, Chris Wilson wrote:
> > > Having introduced per-context seqno, we now have a means to identify
> > > progress across the system without fear of rollback as befell the
> > > global_seqno. That is, we can program a MI_SEMAPHORE_WAIT operation in
> > > advance of submission, safe in the knowledge that our target seqno and
> > > address are stable.
> > > 
> > > However, since we are telling the GPU to busy-spin on the target address
> > > until it matches the signaling seqno, we only want to do so when we are
> > > sure that busy-spin will be completed quickly. To achieve this we only
> > > submit the request to HW once the signaler is itself executing (modulo
> > > preemption causing us to wait longer), and we only do so for default and
> > > above priority requests (so that idle priority tasks never themselves
> > > hog the GPU waiting for others).
> > 
> > It could be milliseconds though. I think apart from media-bench saying 
> > this is faster, we would need to look at performance per Watt as well.
> 
> All throughput measurements are substantially faster, as you would
> expect, and inter-engine latency decreased. I would hope it would
> powergate/rc6 the EU while the CS was spinning, but I don't know :)

Fwiw, the power draw is about the cost of simply spinning the CS, without
any additional cost from utilizing the engine.
-Chris
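
For readers following the thread, the gating policy described in the quoted
commit message can be sketched roughly as below. This is a simplified
user-space model in C, not the i915 implementation: struct request, the
PRIO_* values, and the three helper functions are hypothetical names invented
for illustration, and the wrap-safe "has reached" comparison is an assumption
about how a seqno wait would be evaluated.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical priority levels; only their ordering matters here. */
enum { PRIO_IDLE = -1, PRIO_DEFAULT = 0, PRIO_HIGH = 1 };

/* Hypothetical, heavily simplified stand-in for a GPU request. */
struct request {
	int priority;
	bool executing;  /* has the request started running on the HW? */
	uint32_t seqno;  /* per-context seqno written when it completes */
};

/* Wrap-safe test that seqno a has reached (or passed) seqno b. */
static bool seqno_passed(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) >= 0;
}

/*
 * Only default-and-above priority waiters get a busy-spinning semaphore,
 * so idle-priority work never hogs an engine while waiting on others.
 */
static bool use_semaphore(const struct request *waiter)
{
	return waiter->priority >= PRIO_DEFAULT;
}

/*
 * A semaphore waiter is only handed to the HW once its signaler is
 * itself executing, so the spin is expected to be short.
 */
static bool can_submit_waiter(const struct request *signaler)
{
	return signaler->executing;
}

/* What the spinning engine polls for: the value at the target address. */
static bool semaphore_satisfied(uint32_t current_value,
				const struct request *signaler)
{
	return seqno_passed(current_value, signaler->seqno);
}
```

The split between use_semaphore() and can_submit_waiter() mirrors the two
conditions in the commit message: the wait can be programmed in advance
because the target is stable, while actual submission is deferred until the
signaler is running.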