[Intel-gfx] [PATCH 1/2] igt/gem_exec_nop: add burst submission to parallel execution test
Dave Gordon
david.s.gordon at intel.com
Thu Aug 18 15:36:11 UTC 2016
On 18/08/16 16:27, Dave Gordon wrote:
[snip]
> Note that SKL GuC firmware 6.1 didn't support dual submission or lite
> restore, whereas the next version (8.11) does. Therefore, with that
> firmware we don't see the same slowdown when going to 1-at-a-time
> round-robin. I have a different (new) test that shows this more clearly.
This is with GuC version 6.1:
skylake# ./intel-gpu-tools/tests/gem_exec_paranop | fgrep -v SUCCESS
Time to exec 8-byte batch: 3.428µs (ring=render)
Time to exec 8-byte batch: 2.444µs (ring=bsd)
Time to exec 8-byte batch: 2.394µs (ring=blt)
Time to exec 8-byte batch: 2.615µs (ring=vebox)
Time to exec 8-byte batch: 2.625µs (ring=all, sequential)
Time to exec 8-byte batch: 12.701µs (ring=all, parallel/1) ***
Time to exec 8-byte batch: 7.259µs (ring=all, parallel/2)
Time to exec 8-byte batch: 4.336µs (ring=all, parallel/4)
Time to exec 8-byte batch: 2.937µs (ring=all, parallel/8)
Time to exec 8-byte batch: 2.661µs (ring=all, parallel/16)
Time to exec 8-byte batch: 2.245µs (ring=all, parallel/32)
Time to exec 8-byte batch: 1.626µs (ring=all, parallel/64)
Time to exec 8-byte batch: 2.170µs (ring=all, parallel/128)
Time to exec 8-byte batch: 1.804µs (ring=all, parallel/256)
Time to exec 8-byte batch: 2.602µs (ring=all, parallel/512)
Time to exec 8-byte batch: 2.602µs (ring=all, parallel/1024)
Time to exec 8-byte batch: 2.607µs (ring=all, parallel/2048)
Time to exec 4Kbyte batch: 14.835µs (ring=render)
Time to exec 4Kbyte batch: 11.787µs (ring=bsd)
Time to exec 4Kbyte batch: 11.533µs (ring=blt)
Time to exec 4Kbyte batch: 11.991µs (ring=vebox)
Time to exec 4Kbyte batch: 12.444µs (ring=all, sequential)
Time to exec 4Kbyte batch: 16.211µs (ring=all, parallel/1)
Time to exec 4Kbyte batch: 13.943µs (ring=all, parallel/2)
Time to exec 4Kbyte batch: 13.878µs (ring=all, parallel/4)
Time to exec 4Kbyte batch: 13.841µs (ring=all, parallel/8)
Time to exec 4Kbyte batch: 14.188µs (ring=all, parallel/16)
Time to exec 4Kbyte batch: 13.747µs (ring=all, parallel/32)
Time to exec 4Kbyte batch: 13.734µs (ring=all, parallel/64)
Time to exec 4Kbyte batch: 13.727µs (ring=all, parallel/128)
Time to exec 4Kbyte batch: 13.947µs (ring=all, parallel/256)
Time to exec 4Kbyte batch: 12.230µs (ring=all, parallel/512)
Time to exec 4Kbyte batch: 12.147µs (ring=all, parallel/1024)
Time to exec 4Kbyte batch: 12.617µs (ring=all, parallel/2048)
What this shows is that the submission overhead is ~3µs, which is
comparable with the execution time of a trivial (8-byte) batch but
insignificant compared with the time to execute the 4Kbyte batch. The
burst size therefore makes very little difference to the larger batches.
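
As a rough cross-check, the relative cost of 1-at-a-time round-robin can be
recomputed from the figures printed above. This is only a minimal sketch in
Python: the numbers are copied by hand from the GuC 6.1 output, and the
dictionary layout and the slowdown() helper are illustrative names, not part
of intel-gpu-tools.

```python
# Per-batch times in microseconds, copied from the gem_exec_paranop
# output above (GuC 6.1). Only three rows per batch size are used here.
results = {
    "8-byte": {"sequential": 2.625, "parallel/1": 12.701, "parallel/2048": 2.607},
    "4Kbyte": {"sequential": 12.444, "parallel/1": 16.211, "parallel/2048": 12.617},
}


def slowdown(times, burst="parallel/1"):
    """Ratio of the per-batch time at the given burst size to the sequential time."""
    return times[burst] / times["sequential"]


for size, times in results.items():
    print(f"{size}: parallel/1 is {slowdown(times):.2f}x sequential, "
          f"parallel/2048 is {slowdown(times, 'parallel/2048'):.2f}x sequential")
```

With these figures, parallel/1 comes out at roughly 4.8x the sequential time
for the 8-byte batch but only about 1.3x for the 4Kbyte batch, and at a burst
size of 2048 both are within a couple of percent of sequential.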
.Dave.