[Intel-gfx] [PATCH 2/2] igt/gem_exec_nop: clarify & extend output from parallel execution test

Mon Aug 22 14:39:17 UTC 2016

On 03/08/2016 16:36, Dave Gordon wrote:
> To make sense of the output of the parallel execution test (preferably
> without reading the source!), we need to see the various measurements
> that it makes, specifically: time/batch on each engine separately, total
> time across all engines sequentially, and the time/batch when the work
> is distributed over all engines in parallel.
>
> Since we know the per-batch time on the slowest engine (which will
> determine the minimum possible execution time of any equal-split
> parallel test), we can also calculate a new figure representing the
> degree to which work on the faster engines is overlapped with that on
> the slowest engine, and therefore does not contribute to the total time.
> Here we choose to present it as a percentage, with
> parallel-time==serial-time giving 0% overlap, up to
> parallel-time==slowest-engine-time/n_engines giving 100%. Note that
> negative values are possible; values greater than 100% may also be
> possible, although less likely.
>
> Signed-off-by: Dave Gordon <david.s.gordon at intel.com>
> ---
>   tests/gem_exec_nop.c | 15 ++++++++++-----
>   1 file changed, 10 insertions(+), 5 deletions(-)
>
> diff --git a/tests/gem_exec_nop.c b/tests/gem_exec_nop.c
> index c2bd472..05aa383 100644
> --- a/tests/gem_exec_nop.c
> +++ b/tests/gem_exec_nop.c
> @@ -137,7 +137,9 @@ static void all(int fd, uint32_t handle, int timeout)
>   		if (ignore_engine(fd, engine))
>   			continue;
>   
> -		time = nop_on_ring(fd, handle, engine, 1, &count) / count;
> +		time = nop_on_ring(fd, handle, engine, 2, &count) / count;
> +		igt_info("%s: %'lu cycles: %.3fus/batch\n",
> +			 e__->name, count, time*1e6);
>   		if (time > max) {
>   			name = e__->name;
>   			max = time;
> @@ -148,8 +150,9 @@ static void all(int fd, uint32_t handle, int timeout)
>   		engines[nengine++] = engine;
>   	}
>   	igt_require(nengine);
> -	igt_info("Maximum execution latency on %s, %.3fus, total %.3fus per cycle\n",
> -		 name, max*1e6, sum*1e6);
> +	igt_info("Slowest engine was %s, %.3fus/batch\n", name, max*1e6);
> +	igt_info("Total for all %d engines is %.3fus per cycle, average %.3fus/batch\n",
> +		 nengine, sum*1e6, sum*1e6/nengine);
>   
>   	memset(&obj, 0, sizeof(obj));
>   	obj.handle = handle;
> @@ -187,8 +190,10 @@ static void all(int fd, uint32_t handle, int timeout)
>   	igt_assert_eq(intel_detect_and_clear_missed_interrupts(fd), 0);
>   
>   	time = elapsed(&start, &now) / count;
> -	igt_info("All (%d engines): %'lu cycles, average %.3fus per cycle\n",
> -		 nengine, count, 1e6*time);
> +	igt_info("All %d engines (parallel/%d): %'lu cycles, "
> +		 "average %.3fus/batch, overlap %.1f%%\n",
> +		 nengine, BURST, count,
> +		 1e6*time, 100*(sum-time)/(sum-(max/nengine)));
>   
>   	/* The rate limiting step is how fast the slowest engine can
>   	 * its queue of requests, if we wait upon a full ring all dispatch

I'm not entirely convinced about the overlap calculation. The other info 
is definitely useful though.

Reviewed-by: John Harrison <john.c.harrison at intel.com>
