[Intel-gfx] [PATCH 2/2] igt/gem_exec_nop: clarify & extend output from parallel execution test

Mon Aug 22 14:42:18 UTC 2016

On 22/08/2016 15:39, John Harrison wrote:
> On 03/08/2016 16:36, Dave Gordon wrote:
>> To make sense of the output of the parallel execution test (preferably
>> without reading the source!), we need to see the various measurements
>> that it makes, specifically: time/batch on each engine separately, total
>> time across all engines sequentially, and the time/batch when the work
>> is distributed over all engines in parallel.
>>
>> Since we know the per-batch time on the slowest engine (which will
>> determine the minimum possible execution time of any equal-split
>> parallel test), we can also calculate a new figure representing the
>> degree to which work on the faster engines is overlapped with that on
>> the slowest engine, and therefore does not contribute to the total time.
>> Here we choose to present it as a percentage, with parallel-time==serial
>> time giving 0% overlap, up to parallel-time==slowest-engine-
>> time/n_engines being 100%. Note that negative values are possible;
>> values greater than 100% may also be possible, although less likely.
>>
>> Signed-off-by: Dave Gordon<david.s.gordon at intel.com>
>> ---
>>   tests/gem_exec_nop.c | 15 ++++++++++-----
>>   1 file changed, 10 insertions(+), 5 deletions(-)
>>
>> diff --git a/tests/gem_exec_nop.c b/tests/gem_exec_nop.c
>> index c2bd472..05aa383 100644
>> --- a/tests/gem_exec_nop.c
>> +++ b/tests/gem_exec_nop.c
>> @@ -137,7 +137,9 @@ static void all(int fd, uint32_t handle, int timeout)
>>   		if (ignore_engine(fd, engine))
>>   			continue;
>>   
>> -		time = nop_on_ring(fd, handle, engine, 1, &count) / count;
>> +		time = nop_on_ring(fd, handle, engine, 2, &count) / count;
>> +		igt_info("%s: %'lu cycles: %.3fus/batch\n",
>> +			 e__->name, count, time*1e6);
>>   		if (time > max) {
>>   			name = e__->name;
>>   			max = time;
>> @@ -148,8 +150,9 @@ static void all(int fd, uint32_t handle, int timeout)
>>   		engines[nengine++] = engine;
>>   	}
>>   	igt_require(nengine);
>> -	igt_info("Maximum execution latency on %s, %.3fus, total %.3fus per cycle\n",
>> -		 name, max*1e6, sum*1e6);
>> +	igt_info("Slowest engine was %s, %.3fus/batch\n", name, max*1e6);
>> +	igt_info("Total for all %d engines is %.3fus per cycle, average %.3fus/batch\n",
>> +		 nengine, sum*1e6, sum*1e6/nengine);
>>   
>>   	memset(&obj, 0, sizeof(obj));
>>   	obj.handle = handle;
>> @@ -187,8 +190,10 @@ static void all(int fd, uint32_t handle, int timeout)
>>   	igt_assert_eq(intel_detect_and_clear_missed_interrupts(fd), 0);
>>   
>>   	time = elapsed(&start, &now) / count;
>> -	igt_info("All (%d engines): %'lu cycles, average %.3fus per cycle\n",
>> -		 nengine, count, 1e6*time);
>> +	igt_info("All %d engines (parallel/%d): %'lu cycles, "
>> +		 "average %.3fus/batch, overlap %.1f%\n",
PS: As mentioned in person, the above format string is should end with 
'%%\n' not '%\n'. But with that fixed, review-by as below.

>> +		 nengine, BURST, count,
>> +		 1e6*time, 100*(sum-time)/(sum-(max/nengine)));
>>   
>>   	/* The rate limiting step is how fast the slowest engine can
>>   	 * its queue of requests, if we wait upon a full ring all dispatch
>
> I'm not entirely convinced about the overlap calculation. The other 
> info is definitely useful though.
>
> Reviewed-by: John Harrison <john.c.harrison at intel.com>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/intel-gfx/attachments/20160822/28c5e312/attachment-0001.html>