[igt-dev] [PATCH i-g-t 1/3] tests/perf_pmu: Tighten busy measurement
Chris Wilson
chris at chris-wilson.co.uk
Thu Feb 1 16:39:47 UTC 2018
Quoting Tvrtko Ursulin (2018-02-01 16:26:58)
>
> On 01/02/2018 12:57, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2018-02-01 12:47:44)
> >> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> >>
> >> In cases where we manually terminate the busy batch, we always want to
> >> sample busyness while the batch is running, just before we terminate
> >> it, and not the other way around. This way we make the window in which
> >> unwanted idleness can be sampled smaller.
> >>
> >> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> >> ---
> >> tests/perf_pmu.c | 28 +++++++++++++---------------
> >> 1 file changed, 13 insertions(+), 15 deletions(-)
> >>
> >> diff --git a/tests/perf_pmu.c b/tests/perf_pmu.c
> >> index 2f7d33414a53..bf16e5e8b1f9 100644
> >> --- a/tests/perf_pmu.c
> >> +++ b/tests/perf_pmu.c
> >> @@ -146,10 +146,9 @@ single(int gem_fd, const struct intel_execution_engine2 *e, bool busy)
> >> spin = NULL;
> >>
> >> slept = measured_usleep(batch_duration_ns / 1000);
> >> - igt_spin_batch_end(spin);
> >> -
> >> val = pmu_read_single(fd);
> >>
> >> + igt_spin_batch_end(spin);
> >
> > But that's the wrong way round as we are measuring busyness, and the
> > sampling should terminate as soon as the spin-batch ends, before we even
> > read the PMU sample? For the timer sampler, it's lost in the noise.
> >
> > So the idea was to cancel the busyness asap so that the sampler stops
> > updating before we even have to cross into the kernel for the PMU read.
>
> I don't follow the problem statement. This is how the code used to look
> in many places:
>
> slept = measured_usleep(batch_duration_ns / 1000);
> igt_spin_batch_end(spin);
>
> pmu_read_multi(fd[0], num_engines, val);
>
> The problem here is that there is a nondeterministic time gap, depending
> on test execution speed, between requesting the batch to end and reading
> the counter. This can add a random amount of idleness to the read value.
But we are not measuring idleness, we are measuring busyness. The batch will
end a few tens of nanoseconds after the write hits memory. The desire is that
that gap is zero, so that the sleep corresponds exactly with the interval the
batch was spinning (busy).
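A minimal sketch of the ordering I mean, reusing the helpers already in
single() (the comments are only my reading of the timing):

    slept = measured_usleep(batch_duration_ns / 1000);

    /* A single memory write; the CS should stop the spinner, and with
     * it the busy accounting, within a few tens of nanoseconds. */
    igt_spin_batch_end(spin);

    /* By the time we cross into the kernel here, the busy counter has
     * (ideally) already stopped, so the value read corresponds to the
     * interval the batch was spinning, i.e. the sleep. */
    val = pmu_read_single(fd);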
> And there is another source of nondeterminism in how long it takes for
> the batch-end request to be picked up by the GPU. If that is slower in
> some cases, the counter will drift from the expected value in the other
> direction, overestimating busyness relative to the sleep duration.
>
> The attempted improvement was simply to reverse the last two lines, so
> we read the counter while we know it is busy (after the sleep), and only
> then request batch termination.
>
> This leaves only the scheduling delay between the end of the sleep and
> the counter read, which is smaller than end of sleep -> batch end ->
> counter read.
>
> These tests are not testing for edge conditions, just that the busy
> engines are reported as busy, in various combinations, so that sounded
> like a reasonable change.
>
> I hope I did not get confused here, it wouldn't be the first time in
> these tests...
Aiui, the PMU measures busyness, which is the duration of the spin-batch
(or from the first enabling of the PMU event). The choice is between a
context switch into the kernel to stop the counter, or a single write to
memory to stop the batch.
My bet is that the write to memory will turn off the counter within 100ns,
worst case, including the interrupt processing. Stopping via the PMU event
I estimate at 100ns best case (including the kernel context switch; don't
ask how KPTI perturbs that switch, as I don't know offhand how much worse
it will get).
Now, the write to memory is asynchronous to the PMU event stopping. So
if you write first, whichever is processed first (the kernel context
switch + event stop, or the CS interrupt) causes the busy counter to
cease. So best of both worlds?
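A sketch of the race as I picture it (same helpers as above; the comments
are my guesses at the ordering, not measured latencies):

    /* Asynchronous: the CS ends the batch shortly after the write lands
     * in memory, raising the interrupt that stops busy accounting. */
    igt_spin_batch_end(spin);

    /* Synchronous: the kernel samples the event during the read.
     * Whichever of the two is processed first bounds the busyness
     * that gets reported. */
    val = pmu_read_single(fd);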
I too may be wrong...
-Chris