[igt-dev] [PATCH i-g-t 1/3] tests/perf_pmu: Tighten busy measurement

Tvrtko Ursulin tvrtko.ursulin at linux.intel.com
Thu Feb 1 16:26:58 UTC 2018


On 01/02/2018 12:57, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-02-01 12:47:44)
>> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>
>> In cases where we manually terminate the busy batch, we always want to
>> sample busyness while the batch is running, just before we will
>> terminate it, and not the other way around. This way we make the window
>> for unwanted idleness getting sampled smaller.
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>> ---
>>   tests/perf_pmu.c | 28 +++++++++++++---------------
>>   1 file changed, 13 insertions(+), 15 deletions(-)
>>
>> diff --git a/tests/perf_pmu.c b/tests/perf_pmu.c
>> index 2f7d33414a53..bf16e5e8b1f9 100644
>> --- a/tests/perf_pmu.c
>> +++ b/tests/perf_pmu.c
>> @@ -146,10 +146,9 @@ single(int gem_fd, const struct intel_execution_engine2 *e, bool busy)
>>                  spin = NULL;
>>   
>>          slept = measured_usleep(batch_duration_ns / 1000);
>> -       igt_spin_batch_end(spin);
>> -
>>          val = pmu_read_single(fd);
>>   
>> +       igt_spin_batch_end(spin);
> 
> But that's the wrong way round as we are measuring busyness, and the
> sampling should terminate as soon as the spin-batch ends, before we even
> read the PMU sample? For the timer sampler, it's lost in the noise.
> 
> So the idea was to cancel the busyness asap so that the sampler stops
> updating before we even have to cross into the kernel for the PMU read.

I don't follow the problem statement. This is how the code used to look 
in many places:

  	slept = measured_usleep(batch_duration_ns / 1000);
  	igt_spin_batch_end(spin);

  	pmu_read_multi(fd[0], num_engines, val);

The problem here is that there is a non-deterministic time gap, depending 
on test execution speed, between requesting the batch to end and reading 
the counter. This can add a random amount of idleness to the read value.

Another source of non-determinism is how long it takes for the batch end 
request to get picked up by the GPU. If that is slower in some cases, the 
counter will drift from the expected value in the other direction, 
overestimating busyness relative to the sleep duration.
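
To spell out where the slack is in the old ordering (the comments just 
mark the two windows, the helpers are the existing ones from perf_pmu.c):

  	slept = measured_usleep(batch_duration_ns / 1000);
  	igt_spin_batch_end(spin);	/* GPU picks the end request up
  					 * after an unpredictable delay */

  	/* unpredictable CPU scheduling gap before the read */
  	pmu_read_multi(fd[0], num_engines, val);

Both windows land between the end of the sleep and the counter read, so 
the sampled value can drift either way relative to the slept time.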

The attempted improvement was simply to reverse the last two lines, so we 
read the counter while we know the engine is busy (after the sleep), and 
only then request batch termination (see the sketch below).

This only leaves the scheduling delay between the end of the sleep and 
the counter read, which is smaller than end of sleep -> batch end -> 
counter read.
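
Roughly, the reordered sequence then looks like this (a sketch only, 
using the same variables as the snippet above):

  	slept = measured_usleep(batch_duration_ns / 1000);
  	pmu_read_multi(fd[0], num_engines, val);	/* sample while the
  							 * batch still spins */
  	igt_spin_batch_end(spin);			/* only then end it */

which is the same reordering the diff above does for the single-engine 
case with pmu_read_single().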

These tests are not testing for edge conditions, just that the busy 
engines are reported as busy, in various combinations, so that sounded 
like a reasonable change.

I hope I did not get confused here, it wouldn't be the first time in 
these tests...

Regards,

Tvrtko


