[Intel-gfx] [PATCH 1/2] igt/gem_exec_nop: add burst submission to parallel execution test
Dave Gordon
david.s.gordon at intel.com
Thu Aug 18 15:54:25 UTC 2016
On 18/08/16 16:36, Dave Gordon wrote:
> On 18/08/16 16:27, Dave Gordon wrote:
>
> [snip]
>
>> Note that SKL GuC firmware 6.1 didn't support dual submission or lite
>> restore, whereas the next version (8.11) does. Therefore, with that
>> firmware we don't see the same slowdown when going to 1-at-a-time
>> round-robin. I have a different (new) test that shows this more clearly.
>
> This is with GuC version 6.1:
>
> skylake# ./intel-gpu-tools/tests/gem_exec_paranop | fgrep -v SUCCESS
>
> Time to exec 8-byte batch: 3.428µs (ring=render)
> Time to exec 8-byte batch: 2.444µs (ring=bsd)
> Time to exec 8-byte batch: 2.394µs (ring=blt)
> Time to exec 8-byte batch: 2.615µs (ring=vebox)
> Time to exec 8-byte batch: 2.625µs (ring=all, sequential)
> Time to exec 8-byte batch: 12.701µs (ring=all, parallel/1) ***
> Time to exec 8-byte batch: 7.259µs (ring=all, parallel/2)
> Time to exec 8-byte batch: 4.336µs (ring=all, parallel/4)
> Time to exec 8-byte batch: 2.937µs (ring=all, parallel/8)
> Time to exec 8-byte batch: 2.661µs (ring=all, parallel/16)
> Time to exec 8-byte batch: 2.245µs (ring=all, parallel/32)
> Time to exec 8-byte batch: 1.626µs (ring=all, parallel/64)
> Time to exec 8-byte batch: 2.170µs (ring=all, parallel/128)
> Time to exec 8-byte batch: 1.804µs (ring=all, parallel/256)
> Time to exec 8-byte batch: 2.602µs (ring=all, parallel/512)
> Time to exec 8-byte batch: 2.602µs (ring=all, parallel/1024)
> Time to exec 8-byte batch: 2.607µs (ring=all, parallel/2048)
And for comparison, here are the figures with v8.11:
# ./intel-gpu-tools/tests/gem_exec_paranop | fgrep -v SUCCESS
Time to exec 8-byte batch: 3.458µs (ring=render)
Time to exec 8-byte batch: 2.154µs (ring=bsd)
Time to exec 8-byte batch: 2.156µs (ring=blt)
Time to exec 8-byte batch: 2.156µs (ring=vebox)
Time to exec 8-byte batch: 2.388µs (ring=all, sequential)
Time to exec 8-byte batch: 5.897µs (ring=all, parallel/1)
Time to exec 8-byte batch: 4.669µs (ring=all, parallel/2)
Time to exec 8-byte batch: 4.278µs (ring=all, parallel/4)
Time to exec 8-byte batch: 2.410µs (ring=all, parallel/8)
Time to exec 8-byte batch: 2.165µs (ring=all, parallel/16)
Time to exec 8-byte batch: 2.158µs (ring=all, parallel/32)
Time to exec 8-byte batch: 1.594µs (ring=all, parallel/64)
Time to exec 8-byte batch: 1.583µs (ring=all, parallel/128)
Time to exec 8-byte batch: 2.473µs (ring=all, parallel/256)
Time to exec 8-byte batch: 2.264µs (ring=all, parallel/512)
Time to exec 8-byte batch: 2.357µs (ring=all, parallel/1024)
Time to exec 8-byte batch: 2.382µs (ring=all, parallel/2048)
Most timings are slightly faster, but parallel/1 is more than twice as
fast (12.701µs down to 5.897µs), while parallel/64 is virtually unchanged
(1.626µs vs 1.594µs), as are the timings for large bursts.
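
For anyone wanting to compare two runs without eyeballing the columns, here is
a rough sketch of a script that parses the test's output and prints the ratio
per line. It is not part of the patch or of intel-gpu-tools; the names
parse_timings and speedups are made up for illustration, and only a few of
the lines from the two runs are included to keep it short.

```python
import re

# A few lines copied from the two runs in this mail (GuC 6.1, then GuC 8.11).
V6_1 = """\
Time to exec 8-byte batch: 2.625µs (ring=all, sequential)
Time to exec 8-byte batch: 12.701µs (ring=all, parallel/1)
Time to exec 8-byte batch: 1.626µs (ring=all, parallel/64)
Time to exec 8-byte batch: 2.607µs (ring=all, parallel/2048)
"""
V8_11 = """\
Time to exec 8-byte batch: 2.388µs (ring=all, sequential)
Time to exec 8-byte batch: 5.897µs (ring=all, parallel/1)
Time to exec 8-byte batch: 1.594µs (ring=all, parallel/64)
Time to exec 8-byte batch: 2.382µs (ring=all, parallel/2048)
"""

# Matches one result line: captures the time in microseconds and the ring label.
LINE_RE = re.compile(r"Time to exec 8-byte batch:\s*([\d.]+)µs \(ring=(.+)\)")


def parse_timings(text):
    """Map each ring label (e.g. 'all, parallel/1') to its time in µs."""
    return {m.group(2): float(m.group(1)) for m in LINE_RE.finditer(text)}


def speedups(old, new):
    """Return old/new time ratio for every label present in both runs."""
    return {label: old[label] / new[label] for label in old if label in new}


if __name__ == "__main__":
    ratios = speedups(parse_timings(V6_1), parse_timings(V8_11))
    for label, ratio in ratios.items():
        print(f"{label:22s} {ratio:.2f}x")
```

With the figures above, the parallel/1 ratio comes out at roughly 2.15 and
parallel/64 at roughly 1.02. To use it on real runs, replace the two strings
with the captured output of each run.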
.Dave.