[Intel-gfx] [PATCH 1/2] igt/gem_exec_nop: add burst submission to parallel execution test

Wed Aug 3 15:36:46 UTC 2016

The parallel execution test in gem_exec_nop chooses a pessimal
distribution of work to multiple engines; specifically, it
round-robins one batch to each engine in turn. As the workloads
are trivial (NOPs), this results in each engine becoming idle
between batches. Hence parallel submission is seen to take LONGER
than the same number of batches executed sequentially.

If, on the other hand, we send enough work to each engine to keep
it busy until the next time we add to its queue (i.e. round-robin
some larger number of batches to each engine in turn), then we can
get true parallel execution and should find that it is FASTER than
sequential execution.

By experiment, burst sizes of between 8 and 256 are sufficient to
keep multiple engines loaded, with the optimum (for this trivial
workload) being around 64. The optimum burst size is expected to be
lower (possibly as low as one) for more realistic (heavier) workloads.

Signed-off-by: Dave Gordon <david.s.gordon at intel.com>
---
 tests/gem_exec_nop.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/tests/gem_exec_nop.c b/tests/gem_exec_nop.c
index 9b89260..c2bd472 100644
--- a/tests/gem_exec_nop.c
+++ b/tests/gem_exec_nop.c
@@ -166,14 +166,17 @@ static void all(int fd, uint32_t handle, int timeout)
 	gem_sync(fd, handle);
 	intel_detect_and_clear_missed_interrupts(fd);
 
+#define	BURST	64
+
 	count = 0;
 	clock_gettime(CLOCK_MONOTONIC, &start);
 	do {
-		for (int loop = 0; loop < 1024; loop++) {
+		for (int loop = 0; loop < 1024/BURST; loop++) {
 			for (int n = 0; n < nengine; n++) {
 				execbuf.flags &= ~ENGINE_FLAGS;
 				execbuf.flags |= engines[n];
-				gem_execbuf(fd, &execbuf);
+				for (int b = 0; b < BURST; ++b)
+					gem_execbuf(fd, &execbuf);
 			}
 		}
 		count += nengine * 1024;
-- 
1.9.1