[Intel-gfx] [PATCH 1/2] igt/gem_exec_nop: add burst submission to parallel execution test

Dave Gordon david.s.gordon at intel.com
Thu Aug 18 15:54:25 UTC 2016


On 18/08/16 16:36, Dave Gordon wrote:
> On 18/08/16 16:27, Dave Gordon wrote:
>
> [snip]
>
>> Note that SKL GuC firmware 6.1 didn't support dual submission or lite
>> restore, whereas the next version (8.11) does. Therefore, with that
>> firmware we don't see the same slowdown when going to 1-at-a-time
>> round-robin. I have a different (new) test that shows this more clearly.
>
> This is with GuC version 6.1:
>
> skylake# ./intel-gpu-tools/tests/gem_exec_paranop | fgrep -v SUCCESS
>
> Time to exec 8-byte batch:      3.428µs (ring=render)
> Time to exec 8-byte batch:      2.444µs (ring=bsd)
> Time to exec 8-byte batch:      2.394µs (ring=blt)
> Time to exec 8-byte batch:      2.615µs (ring=vebox)
> Time to exec 8-byte batch:      2.625µs (ring=all, sequential)
> Time to exec 8-byte batch:     12.701µs (ring=all, parallel/1) ***
> Time to exec 8-byte batch:      7.259µs (ring=all, parallel/2)
> Time to exec 8-byte batch:      4.336µs (ring=all, parallel/4)
> Time to exec 8-byte batch:      2.937µs (ring=all, parallel/8)
> Time to exec 8-byte batch:      2.661µs (ring=all, parallel/16)
> Time to exec 8-byte batch:      2.245µs (ring=all, parallel/32)
> Time to exec 8-byte batch:      1.626µs (ring=all, parallel/64)
> Time to exec 8-byte batch:      2.170µs (ring=all, parallel/128)
> Time to exec 8-byte batch:      1.804µs (ring=all, parallel/256)
> Time to exec 8-byte batch:      2.602µs (ring=all, parallel/512)
> Time to exec 8-byte batch:      2.602µs (ring=all, parallel/1024)
> Time to exec 8-byte batch:      2.607µs (ring=all, parallel/2048)

And for comparison, here are the figures with v8.11:

# ./intel-gpu-tools/tests/gem_exec_paranop | fgrep -v SUCCESS

Time to exec 8-byte batch:	  3.458µs (ring=render)
Time to exec 8-byte batch:	  2.154µs (ring=bsd)
Time to exec 8-byte batch:	  2.156µs (ring=blt)
Time to exec 8-byte batch:	  2.156µs (ring=vebox)
Time to exec 8-byte batch:	  2.388µs (ring=all, sequential)
Time to exec 8-byte batch:	  5.897µs (ring=all, parallel/1)
Time to exec 8-byte batch:	  4.669µs (ring=all, parallel/2)
Time to exec 8-byte batch:	  4.278µs (ring=all, parallel/4)
Time to exec 8-byte batch:	  2.410µs (ring=all, parallel/8)
Time to exec 8-byte batch:	  2.165µs (ring=all, parallel/16)
Time to exec 8-byte batch:	  2.158µs (ring=all, parallel/32)
Time to exec 8-byte batch:	  1.594µs (ring=all, parallel/64)
Time to exec 8-byte batch:	  1.583µs (ring=all, parallel/128)
Time to exec 8-byte batch:	  2.473µs (ring=all, parallel/256)
Time to exec 8-byte batch:	  2.264µs (ring=all, parallel/512)
Time to exec 8-byte batch:	  2.357µs (ring=all, parallel/1024)
Time to exec 8-byte batch:	  2.382µs (ring=all, parallel/2048)

Most timings are slightly faster, but parallel/1 is approximately twice as
fast (12.701µs down to 5.897µs), while parallel/64 is virtually unchanged,
as are the timings for the large burst sizes.
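
As a rough cross-check of those ratios, here is a small Python sketch. The
figures are copied from the two runs above; the dict and function names are
made up purely for illustration and are not part of the test itself:

```python
# Per-batch times in microseconds, copied from the two runs above.
# Only a handful of the parallel/N rows are included to keep this short.
GUC_6_1 = {
    "parallel/1": 12.701,
    "parallel/2": 7.259,
    "parallel/4": 4.336,
    "parallel/64": 1.626,
    "parallel/2048": 2.607,
}
GUC_8_11 = {
    "parallel/1": 5.897,
    "parallel/2": 4.669,
    "parallel/4": 4.278,
    "parallel/64": 1.594,
    "parallel/2048": 2.382,
}


def speedup(old_us, new_us):
    """Return how many times faster the new run is (old time / new time)."""
    return old_us / new_us


# Hypothetical helper table: ratio > 1 means v8.11 is faster than v6.1.
ratios = {name: speedup(GUC_6_1[name], GUC_8_11[name]) for name in GUC_6_1}

for name, ratio in ratios.items():
    print(f"{name:>14}: {ratio:.2f}x")
```

A ratio near 1.0 means the firmware change made no real difference for that
burst size; only the smallest bursts should stand out.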

.Dave.

