[Intel-gfx] [PATCH v4] drm/i915: Execlists small cleanups and micro-optimisations

Chris Wilson chris at chris-wilson.co.uk
Tue Mar 1 10:32:02 UTC 2016


On Mon, Feb 29, 2016 at 11:59:26AM +0000, Tvrtko Ursulin wrote:
> 
> 
> On 29/02/16 11:48, Chris Wilson wrote:
> >On Mon, Feb 29, 2016 at 11:40:37AM +0000, Tvrtko Ursulin wrote:
> >>
> >>
> >>On 29/02/16 11:13, Chris Wilson wrote:
> >>>On Mon, Feb 29, 2016 at 11:01:49AM +0000, Tvrtko Ursulin wrote:
> >>>>
> >>>>On 29/02/16 10:53, Chris Wilson wrote:
> >>>>>On Mon, Feb 29, 2016 at 10:45:34AM +0000, Tvrtko Ursulin wrote:
> >>>>>>This ok?
> >>>>>>
> >>>>>>"""
> >>>>>>One unexplained result is with "gem_latency -n 0" (dispatching
> >>>>>>empty batches) which shows 5% more throughput, 8% less CPU time,
> >>>>>>25% better producer and consumer latencies, but 15% higher
> >>>>>>dispatch latency which looks like a possible measuring artifact.
> >>>>>>"""
> >>>>>
> >>>>>I doubt it is a measuring artefact since throughput = 1/(dispatch +
> >>>>>wakeup latency + test overhead), and the dispatch latency here is
> >>>>>larger than the wakeup latency and so has greater impact on
> >>>>>throughput in this scenario.
> >>>>
> >>>>I don't follow you: if dispatch latency has the larger effect on
> >>>>throughput, how do we explain its increase alongside still better
> >>>>throughput?
> >>>>
> >>>>I see in gem_latency this block:
> >>>>
> >>>>	measure_latency(p, &p->latency);
> >>>>	igt_stats_push(&p->dispatch, *p->last_timestamp - start);
> >>>>
> >>>>measure_latency waits for the batch to complete, and then the
> >>>>dispatch latency uses p->last_timestamp, which is written by the
> >>>>GPU rather than being a CPU view of the latency?
> >>>
> >>>Exactly, measurements are entirely made from the running engine clock
> >>>(which is a ~80ns clock, and should be verified during init). The
> >>>register is read before dispatch, inside the batch and then at wakeup,
> >>>but the information is presented as dispatch = batch - start and
> >>>wakeup = end - batch, so to get the duration (end - start) we need to
> >>>add the two together. Throughput will also include some overhead from
> >>>the test iteration (that will mainly be scheduler interference).
> >>>
> >>>My comment about dispatch having the greater effect is in terms of
> >>>its higher absolute value (so the same relative % means a larger
> >>>change wrt throughput).
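
To put that bookkeeping in code, a minimal sketch (the struct and
helper names here are illustrative stand-ins, not the actual
gem_latency implementation):

	#include <stdint.h>

	/*
	 * All three samples come from the same free-running engine
	 * timestamp register, so dispatch and wakeup share one time
	 * base and sum to the total duration.
	 */
	struct ts_sample {
		uint32_t start;	/* register read before dispatch */
		uint32_t batch;	/* register value written by the batch */
		uint32_t end;	/* register read at wakeup */
	};

	static uint32_t dispatch_ticks(const struct ts_sample *s)
	{
		return s->batch - s->start;
	}

	static uint32_t wakeup_ticks(const struct ts_sample *s)
	{
		return s->end - s->batch;
	}

	static uint32_t duration_ticks(const struct ts_sample *s)
	{
		/* dispatch + wakeup == end - start by construction */
		return dispatch_ticks(s) + wakeup_ticks(s);
	}
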
> >>
> >>Change to this then?
> >>
> >>"""
> >>     One unexplained result is with "gem_latency -n 0" (dispatching
> >>     empty batches) which shows 5% more throughput, 8% less CPU time,
> >>     25% better producer and consumer latencies, but 15% higher
> >>     dispatch latency which looks like an amplified effect of test
> >>     overhead.
> >>"""
> >
> >No. Dispatch latency is important, and this attempts to pass the change
> >off as a test effect when, to the best of my knowledge, it is a valid
> >external observation of the system.
> 
> I just don't understand how it can be valid when we have executed
> more empty batches than before in a unit of time.
> 
> Because even the combined dispatch + wakeup latency is worse, and yet
> throughput is still better.
> 
> That sounds impossible to me, so it must be an effect of using two
> different time sources: the CPU side to measure throughput and the
> GPU side to measure dispatch latency.

Sure, but it is not impossible. The actual throughput (measured as the
number of cycles per test run) is a product of the CPU scheduler on top
of everything else. We could measure the test overhead by counting the
GPU cycles between batches, and the most likely explanation is exactly
which task the ISR is interrupting; setting maxcpus=1 would cancel the
variation, should that be the case.
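
To make that concrete, a toy calculation (figures invented for
illustration, not measured data) showing how the relation
throughput ~= 1/(dispatch + wakeup + overhead) permits higher
dispatch latency and higher throughput at the same time:

	#include <stdio.h>

	int main(void)
	{
		/* hypothetical per-cycle costs in us: dispatch, wakeup, overhead */
		double before = 1.0 / (2.0 + 1.0 + 7.0);
		/* +15% dispatch, -25% wakeup, ~8% less overhead */
		double after = 1.0 / (2.3 + 0.75 + 6.45);

		printf("throughput change: %+.1f%%\n",
		       (after / before - 1.0) * 100.0);
		return 0;
	}

Because the overhead term dominates the denominator, a modest
reduction there outweighs the 15% increase in dispatch latency,
yielding roughly 5% more throughput.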

> I don't know; could you suggest a
> paragraph to add to the commit message so we can close on this?

On to the next...
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

