[Intel-gfx] [PATCH i-g-t 2/2] tests/perf_pmu: Exercise busy stats and lite-restore

Chris Wilson chris at chris-wilson.co.uk
Wed Jan 10 13:45:07 UTC 2018


Quoting Tvrtko Ursulin (2018-01-10 10:37:51)
> 
> On 09/01/2018 21:39, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2018-01-09 16:16:21)
> >> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> >>
> >> While developing a fix for an accounting hole in busy stats we realized
> >> that lite-restore is a potential edge case which would be interesting to
> >> check is handled properly.
> >>
> >> It is unfortunately quite timing sensitive to hit lite-restore in the
> >> fashion the test needs, so the downside of this test is that it suffers
> >> from a high rate of false negatives.
> >>
> >> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> >> ---
> >>   tests/perf_pmu.c | 85 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>   1 file changed, 85 insertions(+)
> >>
> >> diff --git a/tests/perf_pmu.c b/tests/perf_pmu.c
> >> index e1f449d48808..7db5059d202e 100644
> >> --- a/tests/perf_pmu.c
> >> +++ b/tests/perf_pmu.c
> >> @@ -192,6 +192,84 @@ busy_start(int gem_fd, const struct intel_execution_engine2 *e)
> >>          gem_quiescent_gpu(gem_fd);
> >>   }
> >>   
> >> +/*
> >> + * This test has a potentially low rate of catching the issue it is trying
> >> + * to catch; in other words, a quite high rate of false negative successes.
> >> + * We will depend on the CI systems running it a lot to detect issues.
> >> + */
> >> +static void
> >> +busy_double_start(int gem_fd, const struct intel_execution_engine2 *e)
> >> +{
> >> +       unsigned long slept;
> >> +       igt_spin_t *spin[2];
> >> +       uint64_t val, val2;
> >> +       bool execlists;
> >> +       uint32_t ctx;
> >> +       int fd;
> >> +
> >> +       ctx = gem_context_create(gem_fd);
> >> +
> >> +       if (gem_has_execlists(gem_fd)) {
> >> +               /*
> >> +                * Defeat the busy stats delayed disable, we need to guarantee
> >> +                * we are the first user.
> >> +                */
> >> +               execlists = true;
> >> +               sleep(2);
> >> +       } else {
> >> +               execlists = false;
> >> +       }
> >> +
> >> +       /*
> >> +        * Submit two contexts, with a pause in between targeting the ELSP
> >> +        * re-submission in execlists mode. Make sure busyness is correctly
> >> +        * reported with the engine busy, and after the engine went idle.
> >> +        */
> >> +       spin[0] = __igt_spin_batch_new(gem_fd, 0, e2ring(gem_fd, e), 0);
> >> +       usleep(500000);
> >> +       spin[1] = __igt_spin_batch_new(gem_fd, ctx, e2ring(gem_fd, e), 0);
> > 
> > If you want to avoid the second spinner, you can resubmit the first with
> > a second context.
> 
> I know you like that but I prefer the pedestrian way. :)
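
For the record, something along these lines is what the suggestion amounts
to -- a rough, untested sketch only: the hypothetical resubmit_spin() helper
is not in the patch, it assumes spin->handle is the spinner's batch object,
and it assumes that batch stays valid at the same GPU address when executed
from the second context (on full ppGTT the spinner's relocations would need
to be replayed as well):

static void
resubmit_spin(int gem_fd, const igt_spin_t *spin,
	      uint32_t ctx, unsigned int flags)
{
	/* Relies on the test's existing igt.h / i915_drm.h includes. */
	struct drm_i915_gem_exec_object2 obj = {
		.handle = spin->handle,	/* reuse the spinning batch */
	};
	struct drm_i915_gem_execbuffer2 eb = {
		.buffers_ptr = to_user_pointer(&obj),
		.buffer_count = 1,
		.flags = flags,		/* e.g. e2ring(gem_fd, e) */
		.rsvd1 = ctx,		/* execute under the second context */
	};

	gem_execbuf(gem_fd, &eb);
}

i.e. the second __igt_spin_batch_new() would become
resubmit_spin(gem_fd, spin[0], ctx, e2ring(gem_fd, e)), saving the second
spinner object.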
> 
> >> +       /*
> >> +        * Open PMU as fast as possible after the second spin batch in attempt
> >> +        * to be faster than the driver handling lite-restore.
> >> +        */
> >> +       fd = open_pmu(I915_PMU_ENGINE_BUSY(e->class, e->instance));
> >> +
> >> +       slept = measured_usleep(batch_duration_ns / 1000);
> >> +       val = pmu_read_single(fd);
> >> +
> >> +       igt_spin_batch_end(spin[0]);
> >> +       igt_spin_batch_end(spin[1]);
> >> +
> >> +       gem_sync(gem_fd, spin[0]->handle);
> >> +       gem_sync(gem_fd, spin[1]->handle);
> >> +
> >> +       /* Wait for context complete to avoid accounting residual busyness. */
> >> +       if (execlists)
> >> +               usleep(100000);
> > 
> > Then you want a gem_quiescent_gpu() instead of the gem_sync()? Though
> > I'm scratching my head here as to what you mean. If there's an issue
> > here, won't we see it in our other measurements as well?
> 
> Primarily because we want to measure idle after busy here and not in
> the other tests, *and*, with execlists, the time between the user
> interrupt and the context save counts as busyness. So without this
> delay it is possible to observe a few microseconds of engine busyness
> in the check below.

Ok, so with a gem_quiescent_gpu() we shouldn't need an arbitrary sleep,
as we are stating the gpu is then idle (ELSP drained etc).

> >> +       val2 = pmu_read_single(fd);
> >> +       usleep(batch_duration_ns / 1000);
> >> +       val2 = pmu_read_single(fd) - val2;
> > 
> > But the engine is idle at this point? Why sleep for execlists and not
> > legacy? Isn't the execlists case where we know that just by waiting for
> > idle (gem_quiescent_gpu() not an arbitrary usleep) then the pmu counter
> > is idle. It's legacy where we would have to wait for an sample interval
> > following gpu idle to be sure the pmu is idle.
> 
> Hm, legacy and sampling... I think we don't have to wait, since there
> is no context save there; the idleness comes straight with the user
> interrupt, or no?

Hmm. Right. Since the sample after completion doesn't add any time to
the counter, you only need to ensure that pmu_read_single() happens after
the seqno completion (i.e. gem_sync) to assert it shouldn't increment any
more.

Ok. I think you can just replace the

> >> +       gem_sync(gem_fd, spin[0]->handle);
> >> +       gem_sync(gem_fd, spin[1]->handle);
> >> +
> >> +       /* Wait for context complete to avoid accounting residual busyness. */
> >> +       if (execlists)
> >> +               usleep(100000);

with a gem_quiescent_gpu() and everybody is happy.
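
I.e. the tail of the test would look roughly like this (just a sketch on
top of the variables in the patch, not something I've actually run):

	igt_spin_batch_end(spin[0]);
	igt_spin_batch_end(spin[1]);

	/*
	 * Wait for everything, ELSP included, to drain so that no
	 * residual busyness can leak into the idle measurement below.
	 */
	gem_quiescent_gpu(gem_fd);

	val2 = pmu_read_single(fd);
	usleep(batch_duration_ns / 1000);
	val2 = pmu_read_single(fd) - val2;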

With that tweak,
Reviewed-by: Chris Wilson <chris at chris-wilson.co.uk>
though I'd be happy if the initial sleep(2) was unconditional (pending
suitable information exported from the kernel).
-Chris

