[Intel-gfx] [PATCH 1/2] igt/gem_exec_nop: add burst submission to parallel execution test

John Harrison John.C.Harrison at Intel.com
Thu Aug 18 12:01:25 UTC 2016


On 03/08/2016 17:05, Dave Gordon wrote:
> On 03/08/16 16:45, Chris Wilson wrote:
>> On Wed, Aug 03, 2016 at 04:36:46PM +0100, Dave Gordon wrote:
>>> The parallel execution test in gem_exec_nop chooses a pessimal
>>> distribution of work to multiple engines; specifically, it
>>> round-robins one batch to each engine in turn. As the workloads
>>> are trivial (NOPs), this results in each engine becoming idle
>>> between batches. Hence parallel submission is seen to take LONGER
>>> than the same number of batches executed sequentially.
>>>
>>> If on the other hand we send enough work to each engine to keep
>>> it busy until the next time we add to its queue (i.e. round-robin
>>> some larger number of batches to each engine in turn), then we can
>>> get true parallel execution and should find that it is FASTER than
>>> sequential execution.
>>>
>>> By experiment, burst sizes of between 8 and 256 are sufficient to
>>> keep multiple engines loaded, with the optimum (for this trivial
>>> workload) being around 64. This is expected to be lower (possibly
>>> as low as one) for more realistic (heavier) workloads.
>>
>> Quite funny. The driver submission overhead of A...A vs ABAB... engines
>> is nearly identical, at least as far as the analysis presented here goes.
>> -Chris
>
> Correct; but because the workloads are so trivial, if we hand out jobs 
> one at a time to each engine, the first will have finished the one 
> batch it's been given before we get round to giving it a second one 
> (even in execlist mode). If there are N engines, submitting a single 
> batch takes S seconds, and the workload takes W seconds to execute, 
> then if W < N*S the engine will be idle between batches. For example, 
> if N is 4, W is 2us, and S is 1us, then the engine will be idle some 
> 50% of the time.
>
> This wouldn't be an issue for more realistic workloads, where W >> S.
> It only looks problematic because of the trivial nature of the work.
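
As a rough sketch of the arithmetic in the quoted model (not code from the patch or from gem_exec_nop), the idle fraction for one engine under one-batch-at-a-time round-robin can be written out as below. The function name and parameter names are hypothetical, and the model assumes that submission cost S is constant and that an engine starts each batch as soon as it arrives.

```python
def idle_fraction(n_engines: int, submit_us: float, work_us: float) -> float:
    """Fraction of time a single engine sits idle under round-robin submission.

    Hypothetical helper illustrating the quoted model: each engine receives
    one batch every n_engines * submit_us, and is busy for work_us of that.
    """
    # Time between successive batches arriving at the same engine (N * S).
    period_us = n_engines * submit_us
    if work_us >= period_us:
        # W >= N*S: the engine still has work when the next batch arrives.
        return 0.0
    return 1.0 - work_us / period_us


if __name__ == "__main__":
    # The example from the quoted text: N = 4, S = 1us, W = 2us.
    print(idle_fraction(4, 1.0, 2.0))  # 0.5
```

With W much larger than N*S the function returns 0.0, which matches the remark that realistic workloads would not show the effect.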

Can you post the numbers that you get?

I seem to get massive variability on my BDW. The render ring always 
gives me around 2.9us/batch, but the other rings sometimes give me in the 
region of 1.2us and sometimes 7-8us.


>
> .Dave.
