[Intel-gfx] [PATCH igt] igt/perf_pmu: Bump batch_duration for legacy sampling inaccuracy

Thu Nov 23 07:40:01 UTC 2017

On 23/11/2017 07:35, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2017-11-23 07:14:13)
>>
>> On 23/11/2017 00:08, Chris Wilson wrote:
>>> Since the legacy ringbuffer uses a sampling technique, it is limited to
>>> an accuracy based on a 200Hz timer, or 5ms. We assert that measurements
>>> are within 5%, so with a 100ms duration that gives us no room for the
>>> systematic error in our sampling. Bump the duration to 500ms to give us
>>> plenty of safety margin; if it then fails, it should not be due to the
>>> sampling.
>>>
>>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>> ---
>>>    tests/perf_pmu.c | 2 +-
>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/tests/perf_pmu.c b/tests/perf_pmu.c
>>> index 61da224e..50ca7895 100644
>>> --- a/tests/perf_pmu.c
>>> +++ b/tests/perf_pmu.c
>>> @@ -44,7 +44,7 @@
>>>    IGT_TEST_DESCRIPTION("Test the i915 pmu perf interface");
>>>    
>>>    const double tolerance = 0.05f;
>>> -const unsigned long batch_duration_ns = 100e6;
>>> +const unsigned long batch_duration_ns = 500e6;
>>>    
>>>    static int open_pmu(uint64_t config)
>>>    {
>>>
>>
>> Hm, it is definitely too short in sampling mode as you describe in the
>> commit.
>>
>> I am only a bit unhappy that a 5x increase makes the total test run much
>> longer. Would embedding knowledge in the test of which counters are
>> sampled and which are not be too bad?
>>
>> Or perhaps a compromise on those by extending the batch duration a
>> little bit less, and increasing the tolerance a bit?
> 
> My rough estimate is that with the current tolerance we need a minimum
> 300ms batch to hide the sampling inaccuracy (liberal use of Nyquist plus
> error accumulation), and then 500ms to give enough slack to be sure it's
> not a systematic error from sampling.
> 
> Increasing tolerance is a bit harder to sell, I think. You do want some
> notion of accuracy and 5% is a "happy" value.
> 
>> That would mean adding variables like sampling_batch_duration_ns and
>> sampling_tolerance and busyness based tests would also pick based on gen.
>>
>> If you would be happy with that I'll implement it.
> 
> If you want something more complicated, go for it. Personally, even with .5s
> batch duration total runtime wasn't an issue for me. (It's the pauses on
> frequency, interrupts and rc6 that start to get me worried!)
> 
> Total runtime with .5s is just under 40s.

You are right, it has a much smaller effect than I assumed.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>

Regards,

Tvrtko