[Intel-gfx] [PATCH 1/2] igt/gem_exec_nop: add burst submission to parallel execution test

Dave Gordon david.s.gordon at intel.com
Thu Aug 18 15:36:11 UTC 2016


On 18/08/16 16:27, Dave Gordon wrote:

[snip]

> Note that SKL GuC firmware 6.1 didn't support dual submission or lite
> restore, whereas the next version (8.11) does. Therefore, with that
> firmware we don't see the same slowdown when going to 1-at-a-time
> round-robin. I have a different (new) test that shows this more clearly.

This is with GuC version 6.1:

skylake# ./intel-gpu-tools/tests/gem_exec_paranop | fgrep -v SUCCESS

Time to exec 8-byte batch:	  3.428µs (ring=render)
Time to exec 8-byte batch:	  2.444µs (ring=bsd)
Time to exec 8-byte batch:	  2.394µs (ring=blt)
Time to exec 8-byte batch:	  2.615µs (ring=vebox)
Time to exec 8-byte batch:	  2.625µs (ring=all, sequential)
Time to exec 8-byte batch:	 12.701µs (ring=all, parallel/1) ***
Time to exec 8-byte batch:	  7.259µs (ring=all, parallel/2)
Time to exec 8-byte batch:	  4.336µs (ring=all, parallel/4)
Time to exec 8-byte batch:	  2.937µs (ring=all, parallel/8)
Time to exec 8-byte batch:	  2.661µs (ring=all, parallel/16)
Time to exec 8-byte batch:	  2.245µs (ring=all, parallel/32)
Time to exec 8-byte batch:	  1.626µs (ring=all, parallel/64)
Time to exec 8-byte batch:	  2.170µs (ring=all, parallel/128)
Time to exec 8-byte batch:	  1.804µs (ring=all, parallel/256)
Time to exec 8-byte batch:	  2.602µs (ring=all, parallel/512)
Time to exec 8-byte batch:	  2.602µs (ring=all, parallel/1024)
Time to exec 8-byte batch:	  2.607µs (ring=all, parallel/2048)

Time to exec 4Kbyte batch:	 14.835µs (ring=render)
Time to exec 4Kbyte batch:	 11.787µs (ring=bsd)
Time to exec 4Kbyte batch:	 11.533µs (ring=blt)
Time to exec 4Kbyte batch:	 11.991µs (ring=vebox)
Time to exec 4Kbyte batch:	 12.444µs (ring=all, sequential)
Time to exec 4Kbyte batch:	 16.211µs (ring=all, parallel/1)
Time to exec 4Kbyte batch:	 13.943µs (ring=all, parallel/2)
Time to exec 4Kbyte batch:	 13.878µs (ring=all, parallel/4)
Time to exec 4Kbyte batch:	 13.841µs (ring=all, parallel/8)
Time to exec 4Kbyte batch:	 14.188µs (ring=all, parallel/16)
Time to exec 4Kbyte batch:	 13.747µs (ring=all, parallel/32)
Time to exec 4Kbyte batch:	 13.734µs (ring=all, parallel/64)
Time to exec 4Kbyte batch:	 13.727µs (ring=all, parallel/128)
Time to exec 4Kbyte batch:	 13.947µs (ring=all, parallel/256)
Time to exec 4Kbyte batch:	 12.230µs (ring=all, parallel/512)
Time to exec 4Kbyte batch:	 12.147µs (ring=all, parallel/1024)
Time to exec 4Kbyte batch:	 12.617µs (ring=all, parallel/2048)

What this shows is that the submission overhead is ~3µs, which is
comparable with the execution time of a trivial (8-byte) batch but
insignificant compared with the time to execute a 4Kbyte batch. The
burst size therefore makes very little difference to the larger batches.
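
As a rough check on that reading, here is a minimal Python sketch that parses the "ring=all, parallel/N" rows quoted above and reports, for each batch size, the ratio of the slowest burst size to the fastest. The helper names (parse, spread) and the regular expression are illustrative assumptions, not part of gem_exec_paranop or intel-gpu-tools; the numbers are copied from the run above.

```python
import re

# Figures copied from the GuC 6.1 run above ("ring=all, parallel/N" rows only).
SAMPLE = """\
Time to exec 8-byte batch:   12.701µs (ring=all, parallel/1) ***
Time to exec 8-byte batch:    7.259µs (ring=all, parallel/2)
Time to exec 8-byte batch:    4.336µs (ring=all, parallel/4)
Time to exec 8-byte batch:    2.937µs (ring=all, parallel/8)
Time to exec 8-byte batch:    2.661µs (ring=all, parallel/16)
Time to exec 8-byte batch:    2.245µs (ring=all, parallel/32)
Time to exec 8-byte batch:    1.626µs (ring=all, parallel/64)
Time to exec 8-byte batch:    2.170µs (ring=all, parallel/128)
Time to exec 8-byte batch:    1.804µs (ring=all, parallel/256)
Time to exec 8-byte batch:    2.602µs (ring=all, parallel/512)
Time to exec 8-byte batch:    2.602µs (ring=all, parallel/1024)
Time to exec 8-byte batch:    2.607µs (ring=all, parallel/2048)
Time to exec 4Kbyte batch:   16.211µs (ring=all, parallel/1)
Time to exec 4Kbyte batch:   13.943µs (ring=all, parallel/2)
Time to exec 4Kbyte batch:   13.878µs (ring=all, parallel/4)
Time to exec 4Kbyte batch:   13.841µs (ring=all, parallel/8)
Time to exec 4Kbyte batch:   14.188µs (ring=all, parallel/16)
Time to exec 4Kbyte batch:   13.747µs (ring=all, parallel/32)
Time to exec 4Kbyte batch:   13.734µs (ring=all, parallel/64)
Time to exec 4Kbyte batch:   13.727µs (ring=all, parallel/128)
Time to exec 4Kbyte batch:   13.947µs (ring=all, parallel/256)
Time to exec 4Kbyte batch:   12.230µs (ring=all, parallel/512)
Time to exec 4Kbyte batch:   12.147µs (ring=all, parallel/1024)
Time to exec 4Kbyte batch:   12.617µs (ring=all, parallel/2048)
"""

# Matches e.g. "Time to exec 8-byte batch:  7.259µs (ring=all, parallel/2)".
# The pattern is an assumption based on the output format shown above.
ROW = re.compile(
    r"Time to exec (\S+) batch:\s+([\d.]+)µs \(ring=all, parallel/(\d+)\)"
)


def parse(text):
    """Return {batch_label: {burst_size: time_in_us}} for the parallel rows."""
    times = {}
    for line in text.splitlines():
        m = ROW.search(line)
        if m:
            label, t, burst = m.group(1), float(m.group(2)), int(m.group(3))
            times.setdefault(label, {})[burst] = t
    return times


def spread(by_burst):
    """Ratio of the slowest to the fastest burst size for one batch size."""
    return max(by_burst.values()) / min(by_burst.values())


if __name__ == "__main__":
    for label, by_burst in parse(SAMPLE).items():
        print(f"{label}: slowest/fastest burst = {spread(by_burst):.2f}x")
```

On these figures the 8-byte batch varies by a factor of roughly 7.8 across burst sizes, while the 4Kbyte batch varies by only about 1.3, which is consistent with the burst size mattering much less for the larger batches.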

.Dave.

