[Intel-gfx] [PATCH v4] drm/i915: Execlists small cleanups and micro-optimisations

Tvrtko Ursulin tvrtko.ursulin at linux.intel.com
Tue Mar 1 10:41:26 UTC 2016


On 01/03/16 10:32, Chris Wilson wrote:
> On Tue, Mar 01, 2016 at 10:21:45AM +0000, Tvrtko Ursulin wrote:
>>
>>
>> On 29/02/16 11:59, Tvrtko Ursulin wrote:
>>>
>>>
>>> On 29/02/16 11:48, Chris Wilson wrote:
>>>> On Mon, Feb 29, 2016 at 11:40:37AM +0000, Tvrtko Ursulin wrote:
>>>>>
>>>>>
>>>>> On 29/02/16 11:13, Chris Wilson wrote:
>>>>>> On Mon, Feb 29, 2016 at 11:01:49AM +0000, Tvrtko Ursulin wrote:
>>>>>>>
>>>>>>> On 29/02/16 10:53, Chris Wilson wrote:
>>>>>>>> On Mon, Feb 29, 2016 at 10:45:34AM +0000, Tvrtko Ursulin wrote:
>>>>>>>>> This ok?
>>>>>>>>>
>>>>>>>>> """
>>>>>>>>> One unexplained result is with "gem_latency -n 0" (dispatching
>>>>>>>>> empty batches) which shows 5% more throughput, 8% less CPU time,
>>>>>>>>> 25% better producer and consumer latencies, but 15% higher
>>>>>>>>> dispatch latency which looks like a possible measuring artifact.
>>>>>>>>> """
>>>>>>>>
>>>>>>>> I doubt it is a measuring artefact since throughput = 1/(dispatch +
>>>>>>>> wakeup latency + test overhead), and the dispatch latency here is
>>>>>>>> larger than the wakeup latency and so has a greater impact on
>>>>>>>> throughput in this scenario.
>>>>>>>
>>>>>>> I don't follow you: if dispatch latency has the larger effect on
>>>>>>> throughput, how do you explain the increase and the still better
>>>>>>> throughput?
>>>>>>>
>>>>>>> I see in gem_latency this block:
>>>>>>>
>>>>>>>     measure_latency(p, &p->latency);
>>>>>>>     igt_stats_push(&p->dispatch, *p->last_timestamp - start);
>>>>>>>
>>>>>>> measure_latency waits for the batch to complete, and then dispatch
>>>>>>> latency uses p->last_timestamp, which is something written by the
>>>>>>> GPU and not a CPU view of the latency?
>>>>>>
>>>>>> Exactly, measurements are entirely made from the running engine clock
>>>>>> (which is a ~80ns clock, and should be verified during init). The
>>>>>> register is read before dispatch, inside the batch and then at wakeup,
>>>>>> but the information is presented as dispatch = batch - start and
>>>>>> wakeup = end - batch, so to get the duration (end - start) we need to
>>>>>> add the two together. Throughput will also include some overhead from
>>>>>> the test iteration (that will mainly be scheduler interference).
>>>>>>
>>>>>> My comment about dispatch having a greater effect is in terms of its
>>>>>> higher absolute value (so the relative % means a larger change wrt
>>>>>> throughput).
>>>>>
>>>>> Change to this then?
>>>>>
>>>>> """
>>>>>      One unexplained result is with "gem_latency -n 0" (dispatching
>>>>>      empty batches) which shows 5% more throughput, 8% less CPU time,
>>>>>      25% better producer and consumer latencies, but 15% higher
>>>>>      dispatch latency which looks like an amplified effect of test
>>>>>      overhead.
>>>>> """
>>>>
>>>> No. Dispatch latency is important and this attempts to pass the change
>>>> off as a test effect when, to the best of my knowledge, it is a valid
>>>> external observation of the system.
>>>
>>> I just don't understand how it can be valid when we have executed more
>>> empty batches than before in a unit of time.
>>>
>>> Because even though dispatch + wakeup latency is worse, throughput is
>>> still better.
>>>
>>> Sounds impossible to me, so it must be the effect of using two different
>>> time sources: the CPU side to measure throughput and the GPU side to
>>> measure dispatch latency. I don't know; could you suggest a paragraph to
>>> add to the commit message so we can close on this?
>>
>> Happy with simply leaving out any attempts of explaining the oddity like:
>>
>> """
>> One odd result is with "gem_latency -n 0" (dispatching empty
>> batches) which shows 5% more throughput, 8% less CPU time, 25%
>> better producer and consumer latencies, but 15% higher dispatch
>> latency which is yet unexplained.
>> """
>
> Yes!

Thanks! Patch merged.
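
For reference, my reading of the measurement scheme described above, as a
rough sketch; the helper names here are made up and not the actual
gem_latency code:

    /* All three timestamps come from the same ~80ns engine clock. */
    uint32_t start = read_engine_timestamp();   /* read before dispatch */
    submit_empty_batch();                       /* the batch records its own
                                                 * timestamp, "batch" */
    wait_for_batch_completion();
    uint32_t batch = *last_timestamp;           /* value written by the GPU */
    uint32_t end = read_engine_timestamp();     /* read at wakeup */

    uint32_t dispatch = batch - start;          /* submit until batch runs */
    uint32_t wakeup = end - batch;              /* batch done until CPU wakes */
    /*
     * duration = dispatch + wakeup; throughput additionally includes the
     * per-iteration test overhead (mainly scheduler interference).
     */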

I'll try reading the CSB into a temporary buffer outside the execlists 
lock to see if that helps any.
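
Something along these lines, as a sketch only; the head/tail bookkeeping
and the accessors below are placeholders, not the actual register helpers:

    u32 buf[2 * CSB_ENTRIES];       /* status dword + context id per entry */
    unsigned int count = 0, i;

    /* Snapshot the CSB entries with the MMIO reads done outside the lock. */
    while (head != tail) {
            buf[count++] = read_csb_status(engine, head);
            buf[count++] = read_csb_context_id(engine, head);
            head = (head + 1) % CSB_ENTRIES;
    }

    /* Only the processing of the local snapshot runs under the lock. */
    spin_lock(&engine->execlist_lock);
    for (i = 0; i < count; i += 2)
            process_csb_entry(engine, buf[i], buf[i + 1]);
    spin_unlock(&engine->execlist_lock);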

What about your patch to move it all to a bottom-half handler? Are we 
going to progress that one?

Regards,

Tvrtko

