[Intel-gfx] [PATCH v4] drm/i915: Execlists small cleanups and micro-optimisations

Chris Wilson chris at chris-wilson.co.uk
Tue Mar 1 10:32:46 UTC 2016


On Tue, Mar 01, 2016 at 10:21:45AM +0000, Tvrtko Ursulin wrote:
> 
> 
> On 29/02/16 11:59, Tvrtko Ursulin wrote:
> >
> >
> >On 29/02/16 11:48, Chris Wilson wrote:
> >>On Mon, Feb 29, 2016 at 11:40:37AM +0000, Tvrtko Ursulin wrote:
> >>>
> >>>
> >>>On 29/02/16 11:13, Chris Wilson wrote:
> >>>>On Mon, Feb 29, 2016 at 11:01:49AM +0000, Tvrtko Ursulin wrote:
> >>>>>
> >>>>>On 29/02/16 10:53, Chris Wilson wrote:
> >>>>>>On Mon, Feb 29, 2016 at 10:45:34AM +0000, Tvrtko Ursulin wrote:
> >>>>>>>This ok?
> >>>>>>>
> >>>>>>>"""
> >>>>>>>One unexplained result is with "gem_latency -n 0" (dispatching
> >>>>>>>empty batches) which shows 5% more throughput, 8% less CPU time,
> >>>>>>>25% better producer and consumer latencies, but 15% higher
> >>>>>>>dispatch latency which looks like a possible measuring artifact.
> >>>>>>>"""
> >>>>>>
> >>>>>>I doubt it is a measuring artefact since throughput = 1/(dispatch +
> >>>>>>latency + test overhead), and the dispatch latency here is larger
> >>>>>>than
> >>>>>>the wakeup latency and so has greater impact on throughput in this
> >>>>>>scenario.
> >>>>>
> >>>>>I don't follow you, if dispatch latency has larger effect on
> >>>>>throughput how to explain the increase and still better throughput?
> >>>>>
> >>>>>I see in gem_latency this block:
> >>>>>
> >>>>>    measure_latency(p, &p->latency);
> >>>>>    igt_stats_push(&p->dispatch, *p->last_timestamp - start);
> >>>>>
> >>>>>measure_latency waits for the batch to complete and then dispatch
> >>>>>latency uses p->last_timestamp which is something written by the GPU
> >>>>>and not a CPU view of the latency ?
> >>>>
> >>>>Exactly, measurements are entirely made from the running engine clock
> >>>>(which is a ~80ns clock, and should be verified during init). The
> >>>>register
> >>>>is read before dispatch, inside the batch and then at wakeup, but the
> >>>>information is presented as dispatch = batch - start and
> >>>>wakeup = end - batch, so to get the duration (end - start) we need
> >>>>to add the two together. Throughput will also include some overhead
> >>>>from
> >>>>the test iteration (that will mainly be scheduler interference).
> >>>>
> >>>>My comment about dispatch having greater effect, is in terms of
> >>>>its higher absolute value (so the relative % means a larger change wrt
> >>>>throughput).
> >>>
> >>>Change to this then?
> >>>
> >>>"""
> >>>     One unexplained result is with "gem_latency -n 0" (dispatching
> >>>     empty batches) which shows 5% more throughput, 8% less CPU time,
> >>>     25% better producer and consumer latencies, but 15% higher
> >>>     dispatch latency which looks like an amplified effect of test
> >>>     overhead.
> >>>"""
> >>
> >>No. Dispatch latency is important, and this attempts to pass the change
> >>off as a test effect when, to the best of my knowledge, it is a valid
> >>external observation of the system.
> >
> >I just don't understand how it can be valid when we have executed more
> >empty batches per unit of time than before.
> >
> >Because even the combined dispatch + wakeup latency is worse, yet
> >throughput is still better.
> >
> >That sounds impossible to me, so it must be an effect of using two
> >different time sources: the CPU side to measure throughput and the GPU
> >side to measure dispatch latency. I don't know; could you suggest a
> >paragraph to add to the commit message so we can close on this?
> 
> Happy with simply leaving out any attempt at explaining the oddity, like:
> 
> """
> One odd result is with "gem_latency -n 0" (dispatching empty
> batches) which shows 5% more throughput, 8% less CPU time, 25%
> better producer and consumer latencies, but 15% higher dispatch
> latency which is yet unexplained.
> """

Yes!
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

