[Intel-gfx] [PATCH v4] drm/i915: Execlists small cleanups and micro-optimisations
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Tue Mar 1 10:21:45 UTC 2016
On 29/02/16 11:59, Tvrtko Ursulin wrote:
>
>
> On 29/02/16 11:48, Chris Wilson wrote:
>> On Mon, Feb 29, 2016 at 11:40:37AM +0000, Tvrtko Ursulin wrote:
>>>
>>>
>>> On 29/02/16 11:13, Chris Wilson wrote:
>>>> On Mon, Feb 29, 2016 at 11:01:49AM +0000, Tvrtko Ursulin wrote:
>>>>>
>>>>> On 29/02/16 10:53, Chris Wilson wrote:
>>>>>> On Mon, Feb 29, 2016 at 10:45:34AM +0000, Tvrtko Ursulin wrote:
>>>>>>> This ok?
>>>>>>>
>>>>>>> """
>>>>>>> One unexplained result is with "gem_latency -n 0" (dispatching
>>>>>>> empty batches) which shows 5% more throughput, 8% less CPU time,
>>>>>>> 25% better producer and consumer latencies, but 15% higher
>>>>>>> dispatch latency which looks like a possible measuring artifact.
>>>>>>> """
>>>>>>
>>>>>> I doubt it is a measuring artefact since throughput =
>>>>>> 1/(dispatch latency + wakeup latency + test overhead), and the
>>>>>> dispatch latency here is larger than the wakeup latency and so
>>>>>> has a greater impact on throughput in this scenario.
>>>>>
>>>>> I don't follow you: if dispatch latency has the larger effect on
>>>>> throughput, how do you explain it increasing while throughput
>>>>> still improves?
>>>>>
>>>>> I see in gem_latency this block:
>>>>>
>>>>> /* Waits for the batch to complete and records wakeup latency. */
>>>>> measure_latency(p, &p->latency);
>>>>> /* Dispatch latency from a timestamp written by the GPU. */
>>>>> igt_stats_push(&p->dispatch, *p->last_timestamp - start);
>>>>>
>>>>> measure_latency waits for the batch to complete, and then the
>>>>> dispatch latency uses p->last_timestamp, which is something
>>>>> written by the GPU and not a CPU view of the latency?
>>>>
>>>> Exactly, measurements are entirely made from the running engine
>>>> clock (which is a ~80ns clock, and should be verified during init).
>>>> The register is read before dispatch, inside the batch and then at
>>>> wakeup, but the information is presented as dispatch = batch - start
>>>> and wakeup = end - batch, so to get the duration (end - start) we
>>>> need to add the two together. Throughput will also include some
>>>> overhead from the test iteration (that will mainly be scheduler
>>>> interference).
>>>>
>>>> My comment about dispatch having the greater effect is in terms of
>>>> its higher absolute value (so the same relative % means a larger
>>>> change wrt throughput).
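
For reference, a minimal C sketch of the arithmetic described above.
The stand-alone program and the tick values are illustrative; only the
~80ns tick period and the dispatch/wakeup formulas come from the
discussion:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	/* Illustrative raw reads of the ~80ns engine clock register:
	 * sampled before dispatch, inside the batch and at wakeup. */
	uint32_t ts_start = 1000, ts_batch = 1030, ts_end = 1050;

	uint32_t dispatch = ts_batch - ts_start; /* dispatch = batch - start */
	uint32_t wakeup = ts_end - ts_batch;     /* wakeup = end - batch */

	/* The total duration (end - start) is recovered by adding the
	 * two components together. */
	printf("duration = %u ticks (~%u ns)\n",
	       dispatch + wakeup, (dispatch + wakeup) * 80);

	return 0;
}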
>>>
>>> Change to this then?
>>>
>>> """
>>> One unexplained result is with "gem_latency -n 0" (dispatching
>>> empty batches) which shows 5% more throughput, 8% less CPU time,
>>> 25% better producer and consumer latencies, but 15% higher
>>> dispatch latency which looks like an amplified effect of test
>>> overhead.
>>> """
>>
>> No. Dispatch latency is important and this attempts to pass the
>> change off as a test effect when, to the best of my knowledge, it is
>> a valid external observation of the system.
>
> I just don't understand how it can be valid when we have executed more
> empty batches than before in a unit of time.
>
> Even the combined dispatch + wakeup latency is worse, yet throughput
> is still better.
>
> That sounds impossible to me, so it must be the effect of using two
> different time sources: the CPU side to measure throughput and the GPU
> side to measure dispatch latency. I don't know, could you suggest a
> paragraph to add to the commit message so we can close on this?
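
To illustrate the two time sources in question, a hedged C sketch
follows; run_batch_and_read_gpu_ts() is a hypothetical stand-in for the
real test plumbing, and the tick values are dummies. The point is only
that the throughput figure and the dispatch statistic come from
independent clock domains, so they can legitimately move in opposite
directions:

#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical stand-in for submit-and-sample: returns the engine-clock
 * tick sampled inside the batch and, via *ts_start, the tick sampled
 * just before dispatch. Dummy values only. */
static uint32_t run_batch_and_read_gpu_ts(uint32_t *ts_start)
{
	*ts_start = 1000;
	return 1030;
}

static uint64_t cpu_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + ts.tv_nsec;
}

int main(void)
{
	uint64_t t0 = cpu_ns(), dispatch_ticks = 0;
	unsigned int i, n = 1000;

	for (i = 0; i < n; i++) {
		uint32_t ts_start;
		uint32_t ts_batch = run_batch_and_read_gpu_ts(&ts_start);

		/* GPU clock domain: feeds the dispatch latency stat. */
		dispatch_ticks += ts_batch - ts_start;
	}

	/* CPU clock domain: feeds the throughput figure and absorbs all
	 * of the test overhead (mainly scheduler interference). */
	printf("CPU wall time: %llu ns for %u batches\n",
	       (unsigned long long)(cpu_ns() - t0), n);
	printf("mean dispatch: %llu ticks (GPU clock)\n",
	       (unsigned long long)(dispatch_ticks / n));

	return 0;
}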
Are you happy with simply leaving out any attempt at explaining the
oddity, like:
"""
One odd result is with "gem_latency -n 0" (dispatching empty batches)
which shows 5% more throughput, 8% less CPU time, 25% better producer
and consumer latencies, but 15% higher dispatch latency which is as yet
unexplained.
"""
?
Regards,
Tvrtko