[igt-dev] [PATCH i-g-t 2/3] tests/perf_pmu: More busy measurement tightening

Tvrtko Ursulin tvrtko.ursulin at linux.intel.com
Thu Feb 1 17:02:18 UTC 2018


On 01/02/2018 16:48, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-02-01 16:37:29)
>>
>> On 01/02/2018 12:59, Chris Wilson wrote:
>>> Quoting Tvrtko Ursulin (2018-02-01 12:47:45)
>>>> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>>>
>>>> Where we use measured sleeps, take PMU samples immediately before and
>>>> after and look at their delta in order to minimize the effect of any
>>>> test setup delays.
>>>
>>> The system and pmu were meant to be idle at the start of the test,
>>> right? So val should always be zero?
>>
>> Yes, but there is a time delay between starting the counters and
>> applying busyness. For instance, busy-check-all, current version:
>>
>>          ... pmu open somewhere before ...
>>
>>          spin = igt_spin_batch_new(gem_fd, 0, e2ring(gem_fd, e), 0);
>>          slept = measured_usleep(batch_duration_ns / 1000);
>>          pmu_read_multi(fd[0], num_engines, val);
>>
>> In this case the slept value vs the read busyness will miss a tiny bit
>> between igt_spin_batch_new and measured_usleep. Probably minimal indeed,
>> but I thought, just for extra safety, to take an explicit initial read
>> just before the sleep, so:
>>
>>          spin = igt_spin_batch_new(gem_fd, 0, e2ring(gem_fd, e), 0);
>>          pmu_read_multi(fd[0], num_engines, tval[0]);
> 
> Time gap here as well. How do we know this is better than before?

It is theoretically marginally better on the leading edge in this 
example. Whether it is practically significant I don't know. At least it 
isn't worse. I can compare the numbers before and after once the results 
get in. I suspect Apollo Lake will be the interesting machine to look at.
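
For reference, with a sample taken on each side of the sleep, the check 
itself becomes a pure delta over exactly the window we slept for, 
something like this for the single-event case (a sketch only; the 
assert_within_epsilon/tolerance names are illustrative rather than 
exactly what perf_pmu.c ends up using):

         before = pmu_read_single(fd);
         slept = measured_usleep(batch_duration_ns / 1000);
         after = pmu_read_single(fd);

         /* Only the bracketed window is compared against the measured
          * sleep, so setup cost outside it no longer skews the result. */
         assert_within_epsilon(after - before, slept, tolerance);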

>>          slept = measured_usleep(batch_duration_ns / 1000);
>>          pmu_read_multi(fd[0], num_engines, tval[1]);
>>
>> More importantly, it is a potentially larger time delta in tests which
>> open multiple counters after starting the spinner, like
>> most_busy_check_all for instance:
>>
>>          ... start spin batch...
>>
>>          for (i = 0; i < num_engines; i++)
>>                  fd[i] = open_group(val[i], fd[0]);
>>
>>          slept = measured_usleep(batch_duration_ns / 1000);
>>          pmu_read_multi(fd[0], num_engines, val);
>>
>> So the counter value relative to the slept value will include the time
>> spent opening num_engines events. Once again, the change to take an
>> explicit initial value just before the sleep looked reasonable to me.
> 
> I was working on the assumption that open is a minimal delay and
> insignificant. I have no idea what the relative costs are. That would
> be nice to know.
> 
> The issue I have is that the scheduler can preempt us at any time (so,
> other than the argument that one is quicker and so gives less systematic
> error in the ideal case), we are at the mercy of the scheduler, which can
> inject unknown sleeps at any point. I fear we may need RT, mlocking
> and more(?), but would much rather avoid it.

Yep, that was exactly my thinking. Let's avoid using big ugly hammers 
(RT) unless there is no other way. With this series applied, if there 
are still random unexplained discrepancies, I think there will be 
nothing left to try but to go RT.
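
For reference, the hammer in question would be along these lines (just a 
sketch of what going RT plus mlocking would mean for the test process; 
not something this series proposes):

         #include <sched.h>
         #include <sys/mman.h>

         /* Give the test an RT priority and lock its pages so neither
          * the scheduler nor paging can inject sleeps into the measured
          * window. Needs root / CAP_SYS_NICE. */
         static void go_realtime(void)
         {
                 struct sched_param param = { .sched_priority = 1 };

                 igt_assert(sched_setscheduler(0, SCHED_FIFO, &param) == 0);
                 igt_assert(mlockall(MCL_CURRENT | MCL_FUTURE) == 0);
         }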

> Reviewed-by: Chris Wilson <chris at chris-wilson.co.uk>
> 
> (in exchange for a small test to benchmark open_(single|group),
> read_(single|group), pretty please :)

Okay, marking the email as TODO.
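
Roughly something like this, I suppose (a very rough sketch; open_pmu(), 
config and the loop count are placeholders rather than the final 
helpers):

         struct timespec t0;
         uint64_t t_open = 0, t_read = 0, data[2];
         unsigned int i;
         int fd;

         for (i = 0; i < 1000; i++) {
                 /* igt_nsec_elapsed() on a zeroed timespec records the
                  * start; the next call returns the elapsed nanoseconds. */
                 t0 = (struct timespec){ 0 };
                 igt_nsec_elapsed(&t0);
                 fd = open_pmu(config); /* e.g. an engine busy event */
                 t_open += igt_nsec_elapsed(&t0);

                 t0 = (struct timespec){ 0 };
                 igt_nsec_elapsed(&t0);
                 read(fd, data, sizeof(data));
                 t_read += igt_nsec_elapsed(&t0);

                 close(fd);
         }

         igt_info("avg open %.2fus, avg read %.2fus\n",
                  t_open / 1000 / 1000.0, t_read / 1000 / 1000.0);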

Regards,

Tvrtko



