[Intel-gfx] [PATCH v4] drm/i915: Execlists small cleanups and micro-optimisations

Mon Feb 29 11:48:02 UTC 2016

On Mon, Feb 29, 2016 at 11:40:37AM +0000, Tvrtko Ursulin wrote:
> 
> 
> On 29/02/16 11:13, Chris Wilson wrote:
> >On Mon, Feb 29, 2016 at 11:01:49AM +0000, Tvrtko Ursulin wrote:
> >>
> >>On 29/02/16 10:53, Chris Wilson wrote:
> >>>On Mon, Feb 29, 2016 at 10:45:34AM +0000, Tvrtko Ursulin wrote:
> >>>>This ok?
> >>>>
> >>>>"""
> >>>>One unexplained result is with "gem_latency -n 0" (dispatching
> >>>>empty batches) which shows 5% more throughput, 8% less CPU time,
> >>>>25% better producer and consumer latencies, but 15% higher
> >>>>dispatch latency which looks like a possible measuring artifact.
> >>>>"""
> >>>
> >>>I doubt it is a measuring artefact since throughput = 1/(dispatch +
> >>>latency + test overhead), and the dispatch latency here is larger than
> >>>the wakeup latency and so has greater impact on throughput in this
> >>>scenario.
> >>
> >>I don't follow you. If dispatch latency has the larger effect on
> >>throughput, how do we explain it increasing while throughput improves?
> >>
> >>I see in gem_latency this block:
> >>
> >>	measure_latency(p, &p->latency);
> >>	igt_stats_push(&p->dispatch, *p->last_timestamp - start);
> >>
> >>measure_latency waits for the batch to complete and then dispatch
> >>latency uses p->last_timestamp which is something written by the GPU
> >>and not a CPU view of the latency?
> >
> >Exactly, measurements are entirely made from the running engine clock
> >(which is a ~80ns clock, and should be verified during init). The register
> >is read before dispatch, inside the batch and then at wakeup, but the
> >information is presented as dispatch = batch - start and
> >wakeup = end - batch, so to get the duration (end - start) we need
> >to add the two together. Throughput will also include some overhead from
> >the test iteration (that will mainly be scheduler interference).
> >
> >My comment about dispatch having a greater effect is in terms of
> >its higher absolute value (so the relative % means a larger change wrt
> >throughput).
> 
> Change to this then?
> 
> """
>     One unexplained result is with "gem_latency -n 0" (dispatching
>     empty batches) which shows 5% more throughput, 8% less CPU time,
>     25% better producer and consumer latencies, but 15% higher
>     dispatch latency which looks like an amplified effect of test
>     overhead.
> """

No. Dispatch latency is important, and this attempts to pass the change
off as a test effect when, to the best of my knowledge, it is a valid
external observation of the system.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
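
For reference, the arithmetic described in the thread (dispatch = batch -
start, wakeup = end - batch, duration = dispatch + wakeup, and throughput =
1/(dispatch + latency + test overhead)) can be sketched as below. This is
only an illustration, not code from gem_latency: struct sample and all
function names are hypothetical, and the 32-bit timestamp width is an
assumption made so that unsigned subtraction tolerates counter wraparound.

```c
#include <stdint.h>

/*
 * Hypothetical container for the three engine-clock reads described in
 * the thread. The names and the 32-bit width are assumptions made for
 * this sketch; they are not taken from gem_latency.
 */
struct sample {
	uint32_t start;	/* register read before dispatch */
	uint32_t batch;	/* value written from inside the batch */
	uint32_t end;	/* register read at wakeup */
};

/* dispatch = batch - start; unsigned arithmetic tolerates one wraparound */
uint32_t dispatch_ticks(const struct sample *s)
{
	return s->batch - s->start;
}

/* wakeup = end - batch */
uint32_t wakeup_ticks(const struct sample *s)
{
	return s->end - s->batch;
}

/* duration (end - start) is the sum of the two reported components */
uint32_t duration_ticks(const struct sample *s)
{
	return dispatch_ticks(s) + wakeup_ticks(s);
}

/*
 * throughput = 1 / (dispatch + wakeup + test overhead), with all three
 * terms in the same time unit. The larger term in absolute value
 * dominates the sum, so the same relative change in it moves throughput
 * more than it would in a smaller term.
 */
double throughput(double dispatch, double wakeup, double overhead)
{
	return 1.0 / (dispatch + wakeup + overhead);
}
```

With made-up tick values start=100, batch=160, end=180, this gives
dispatch=60, wakeup=20 and duration=80; the numbers are purely
illustrative and are not measurements from the patch.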