[igt-dev] [PATCH i-g-t] igt: Add gem_ctx_freq to exercise requesting freq on a ctx

Wed Apr 11 11:12:31 UTC 2018

Quoting Tvrtko Ursulin (2018-04-11 11:42:12)
> 
> On 10/04/2018 21:53, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2018-04-10 17:17:13)
> >>
> >> On 16/03/2018 11:06, Arkadiusz Hiler wrote:
> >>> From: Chris Wilson <chris at chris-wilson.co.uk>
> >>>
> >>> Exercise some new API that allows applications to request that
> >>> individual contexts are executed within a desired frequency range.
> 
> [snip]
> 
> >>> +static void inflight(int fd, const struct intel_execution_engine *e)
> >>> +{
> >>> +     const unsigned int engine = e->exec_id | e->flags;
> >>> +     uint32_t ctx, min, max, freq, discard;
> >>> +     double measured;
> >>> +     igt_spin_t *plug, *work[2];
> >>> +     int pmu;
> >>> +
> >>> +     pmu = perf_i915_open(I915_PMU_REQUESTED_FREQUENCY);
> >>> +     igt_require(pmu >= 0);
> >>> +
> >>> +     ctx = gem_context_create(fd);
> >>> +     get_freq(fd, ctx, &min, &max);
> >>> +     set_freq(fd, ctx, min, min);
> >>> +
> >>> +     igt_info("Min freq: %dMHz; Max freq: %dMHz\n", min, max);
> >>> +
> >>> +     gem_quiescent_gpu(fd);
> >>> +     plug = igt_spin_batch_new(fd, ctx, engine, 0);
> >>> +     gem_context_destroy(fd, ctx);
> >>> +     for (int n = 0; n < 16; n++) {
> >>> +             struct drm_i915_gem_exec_object2 obj = {
> >>> +                     .handle = plug->handle,
> >>> +             };
> >>> +             struct drm_i915_gem_execbuffer2 eb = {
> >>> +                     .buffer_count = 1,
> >>> +                     .buffers_ptr = to_user_pointer(&obj),
> >>> +                     .flags = engine,
> >>> +                     .rsvd1 = gem_context_create(fd),
> >>> +             };
> >>> +             set_freq(fd, eb.rsvd1, min, min);
> >>> +             gem_execbuf(fd, &eb);
> >>> +             gem_context_destroy(fd, eb.rsvd1);
> >>> +     }
> >>> +     measured = measure_frequency(pmu, SAMPLE_PERIOD);
> >>> +     igt_debugfs_dump(fd, "i915_rps_boost_info");
> >>> +     igt_info("%s(plug): Measured %.1fMHz, expected %dMhz\n",
> >>> +              e->name, measured, min);
> >>> +     pmu_assert(measured, min);
> >>
> >> I would expect queued contexts would have different limits so here you
> >> can check that the limit from the currently executing one remains in
> >> force? Like it is I am not sure what it's testing?
> > 
> > The challenge here is to detect that setting a frequency is done on a
> > lite-restore, i.e. that the updated context frequency is used for the
> > next batch even though the context is currently live.
> 
> I don't follow. Here you have ctx0 with freq set to min/min, then you 
> queue ctx1..16 all with freq set to min/min, and assert min is measured. 
> I don't see lite-restore or still the point of the test. If ctx1..16 
> would have freq set to something else you could assert the currently 
> running ctx did not have it's limit overwritten.

This is just prechecks, the test is below.

> >>> +     work[0] = __igt_spin_batch_new(fd, ctx, engine, 0);
> >>> +
> >>> +     /* work is now queued but not executing */
> >>> +     freq = (max + min) / 2;
> >>> +     set_freq(fd, ctx, freq, freq);
> >>> +     get_freq(fd, ctx, &freq, &discard);
> >>> +     gem_context_destroy(fd, ctx);
> >>> +
> >>> +     ctx = gem_context_create(fd);
> >>> +     set_freq(fd, ctx, max, max);
> >>> +     work[1] = __igt_spin_batch_new(fd, ctx, engine, 0);
> >>> +     gem_context_destroy(fd, ctx);
> >>> +
> >>> +     igt_spin_batch_end(plug);
> >>> +     do
> >>> +             usleep(10000);
> >>> +     while (gem_bo_busy(fd, plug->handle));
> >>
> >> gem_sync wouldn't cut it? Something internal could fool us here as the
> >> queued up batches proceed to start and complete?
> > 
> > gem_sync -> waitboost. I was writing this to avoid triggering
> > waitboosts, and also to cancel out unwanted boosts between loops.
> 
> Okay, so helper with a comment would be good.
> 
> >>
> >>> +     igt_spin_batch_free(fd, plug);
> >>> +
> >>> +     /* Now work will execute */
> >>> +     measured = measure_frequency(pmu, SAMPLE_PERIOD);
> >>> +     igt_debugfs_dump(fd, "i915_engine_info");
> >>
> >> Need this?
> > 
> > Need it? Have you tried debugging it without it? :)
> 
> Is it a debugging leftover is where I was trying to get at. :)

Leftover wrt to debugging failures? No! It is "I need this information
to debug, so if CI either trips up, make sure we have it".

> >>> +static void sysfs_clamp(int fd, const struct intel_execution_engine *e)
> >>> +{
> >>> +#define N_STEPS 10
> >>> +     const unsigned int engine = e->exec_id | e->flags;
> >>> +     uint32_t ctx = gem_context_create(fd);
> >>> +     uint32_t sys_min, sys_max;
> >>> +     uint32_t min, max;
> >>> +     double measured;
> >>> +     igt_spin_t *spin;
> >>> +     int pmu;
> >>> +
> >>> +     get_sysfs_freq(&sys_min, &sys_max);
> >>> +     igt_info("System min freq: %dMHz; max freq: %dMHz\n", sys_min, sys_max);
> >>> +
> >>> +     get_freq(fd, ctx, &min, &max);
> >>> +     igt_info("Context min freq: %dMHz; max freq: %dMHz\n", min, max);
> >>
> >> Assert it matches sysfs values?
> > 
> > It doesn't. The context itself starts at hw limits, as the sysfs limits
> > may change over time; and the whole shebang of limits is applied at the
> > end to decide who wins.
> 
> But at this point how can a pristine context have different limits and 
> system global?

How do you the external state is pristine? More to the point, how does
the test know that the sysfs limits won't interfere...

There is no tie between context range and sysfs range. It just happens
that the limits on both are given by the HW.

> >>> +                     int ifrac = inner > N_STEPS ? 2*N_STEPS - inner : inner;
> >>> +                     uint32_t ifreq = min + (max - min) * ifrac / N_STEPS;
> >>> +
> >>> +                     set_freq(fd, ctx, ifreq, ifreq);
> >>> +
> >>> +                     gem_quiescent_gpu(fd);
> >>> +                     spin = __igt_spin_batch_new(fd, ctx, engine, 0);
> >>> +                     usleep(10000);
> >>> +
> >>> +                     set_sysfs_freq(ofreq, ofreq);
> >>> +                     get_sysfs_freq(&cur, &discard);
> >>
> >> Read-back ctx freq and check is correct?
> > 
> > Define correct. We spent many test cycles to verify these are accurate
> > at the points we can decide they are. Duplicating that fuzzy logic here,
> > I did not want to do as it is not the test we are focusing on.
> > 
> > It just happens that those basic API tests are after the behaviour
> > tests. (Boring tests came after the tests to check the code was
> > behaving.)
> 
> I was assuming that setting sysfs limits to something other than what 
> the context has set, the reading back the context limits, would read 
> back the updated values if they are more restrictive.
> 
> Maybe it is not the case?

No. That is tricky user ABI (no idempotence at all, rather the current
rounding error).

ctx sets A, but reads back X, Y, Z depending on the phase of the moon.
User sets second context derived from first, sets any of X, Y, Z and
fails to get A.

> So ctx freq cannot be set outside the _current_ global limits, but 
> global limits can be changed afterwards which query on ctx will not 
> reflect? Probably so, because otherwise you would have to walk all 
> contexts on sysfs writes.

ctx freq cannot be set outside of HW limits, but the RPS frequency
cannot be adjusted to outside of sysfs limits.

> >>> +                     measured = measure_frequency(pmu, SAMPLE_PERIOD);
> >>> +                     igt_debugfs_dump(fd, "i915_rps_boost_info");
> >>> +
> >>> +                     set_sysfs_freq(sys_min, sys_max);
> >>
> >> Just so we do not exit with random min/max set?
> > 
> > Yes, as much as possible try to ensure we only assert and end the
> > subtest on known state.
> 
> There is an exit handler already but it doesn't harm I guess.

Exit handlers are not run between subtests. (igt is not a good test
runner /rant)

> >>> +             range[0] = min + (rand() % (max - min) / 2);
> >>> +             range[1] = max - (rand() % (max - min) / 2);
> >>> +
> >>> +             set_freq(fd, ctx, range[0], range[1]);
> >>> +             get_freq(fd, ctx, &range[0], &range[1]);
> >>> +
> >>> +             spin = __igt_spin_batch_new(fd, ctx, engine, 0);
> >>> +
> >>> +             usleep(10000);
> >>> +             measured = measure_frequency(pmu, SAMPLE_PERIOD);
> >>> +             igt_spin_batch_free(fd, spin);
> >>> +
> >>> +             igt_assert(measured >= range[0] - PMU_TOLERANCE &&
> >>> +                        measured <= range[1] + PMU_TOLERANCE);
> >>> +     }
> >>> +
> >>> +     gem_context_destroy(fd, ctx);
> >>> +     close(pmu);
> >>> +}
> >>> +
> >>> +static void sandwich(int fd, int timeout)
> >>> +{
> >>> +     unsigned int engine;
> >>> +
> >>> +     for_each_physical_engine(fd, engine) {
> >>> +             igt_fork(child, 1)
> >>> +                     sandwich_engine(fd, engine, timeout);
> >>> +     }
> >>
> >> Hmm.. so engines run in parallel - but how come all can assert they got
> >> exactly the limits they wanted? Surely someone must get a different
> >> range depending on what's running on another engine?
> > 
> > That's the setup with the overlapping frequency ranges, it ensures that
> > we can satisfy all the requests simultaneously. If the ranges are
> > disjoint, then at least one engine's request cannot be satisfied.
> 
> Okay, but if one engine wants 200-800, and another 600-1000, how can 
> both assert that what they measure is in the range of what they asked 
> for? What will actually be the resulting range here? 200-1000 or 
> 600-800? Hm I guess if 600-800 the assert works.

The resultant HW range will be [600-800] which intersects both engines,
and therefore both will be happy that the frequency is within their
target parameters.

If one engine requested [200, 200] and the other [1000, 1000], then one
will be disappointed. Hence we make sure that the ranges overlap for all
the requests such that the result is always intersects each.

Now what happens for the disjoint set of requests, I'm not going to tie
down as I think that is outside of the uABI of a single context. It is,
however, exercised by selftests.

> >>> +     for (unsigned int n = 0; n < nengine; n++)
> >>> +             ctx[n] = gem_context_create(fd);
> >>> +
> >>> +     do {
> >>> +             igt_spin_t *spin;
> >>> +             struct {
> >>> +                     uint32_t engine;
> >>> +                     uint32_t min;
> >>> +                     uint32_t max;
> >>> +             } req;
> >>> +
> >>> +             while (read(link, &req, sizeof(req)) > 0) {
> >>> +                     if ((req.engine | req.min | req.max) == 0)
> >>> +                             goto out;
> >>> +
> >>> +                     igt_assert(req.engine < nengine);
> >>> +                     set_freq(fd, ctx[req.engine], req.min, req.max);
> >>> +             }
> >>> +
> >>> +             /* Create a 20% load using busy spinners */
> >>> +             spin = __igt_spin_batch_new(fd, ctx[0], engines[0], 0);
> >>> +             for (unsigned int n = 1; n < nengine; n++) {
> >>> +                     struct drm_i915_gem_exec_object2 obj = {
> >>> +                             .handle = spin->handle,
> >>> +                     };
> >>> +                     struct drm_i915_gem_execbuffer2 eb = {
> >>> +                             .buffer_count = 1,
> >>> +                             .buffers_ptr = to_user_pointer(&obj),
> >>> +                             .flags = engines[n],
> >>> +                             .rsvd1 = ctx[n],
> >>> +                     };
> >>> +                     gem_execbuf(fd, &eb);
> >>> +             }
> >>> +             usleep(100);
> >>> +             igt_spin_batch_end(spin);
> >>> +
> >>> +             do
> >>> +                     usleep(10);
> >>> +             while (gem_bo_busy(fd, spin->handle));
> >>
> >> This loop should be moved to a helper since it is repeated a few times.
> >> (If it is required in this form.)
> >>
> >>> +             igt_spin_batch_free(fd, spin);
> >>> +             usleep(400);
> >>
> >> 100us/400us is I think too short if you want to get close to target
> >> ratio. Will see later whether you actualy depend on it.
> > 
> > The target ratio here is misleading, busyness is summed over engines, so
> > a 5x pulses have a weird effect. I've no idea what the load should look
> > like, the challenge is just to ensure that we do not have a 100% saturated
> > load and so allow the GPU to downclock. To allow downclocking we have
> > to be less than 65% busy (worst case). Other than by watching and
> > confirming we up/downclock i.e. the loading was good enough, I don't
> > think the pulses are the best method.
> 
> Why it is important to allow downclocking?

It's hardly a smoke test if on every tick we pick max. The load should
ideally oscillate enough to hit both bounds.

> >>> +     } while (1);
> >>> +
> >>> +out:
> >>> +     for (unsigned int n = 0; n < nengine; n++)
> >>> +             gem_context_destroy(fd, ctx[n]);
> >>> +}
> >>> +
> >>> +static void smoketest(int fd, int timeout)
> >>> +{
> >>> +     unsigned int engines[16];
> >>> +     unsigned int nengine;
> >>> +     unsigned int engine;
> >>> +     uint32_t min[16], max[16];
> >>> +     int pmu, link[2];
> >>> +
> >>> +     get_freq(fd, 0, &min[0], &max[0]);
> >>> +
> >>> +     nengine = 0;
> >>> +     for_each_physical_engine(fd, engine) {
> >>> +             if (nengine == ARRAY_SIZE(engines) - 1)
> >>> +                     break;
> >>> +
> >>> +             min[nengine] = min[0];
> >>> +             max[nengine] = max[0];
> >>> +             engines[nengine] = engine;
> >>> +             nengine++;
> >>> +     }
> >>> +     igt_require(nengine);
> >>> +
> >>> +     igt_assert(pipe(link) == 0);
> >>> +     igt_fork(child, 1)
> >>> +             pwm(fd, engines, nengine, link[0]);
> >>> +     close(link[0]);
> >>> +
> >>> +     pmu = perf_i915_open(I915_PMU_REQUESTED_FREQUENCY);
> >>> +     igt_require(pmu >= 0);
> >>> +
> >>> +     igt_until_timeout(timeout) {
> >>> +             struct {
> >>> +                     uint32_t engine;
> >>> +                     uint32_t min;
> >>> +                     uint32_t max;
> >>> +             } req;
> >>> +             double measured;
> >>> +             uint32_t ctx;
> >>> +
> >>> +             req.engine = rand() % nengine;
> >>> +
> >>> +             ctx = gem_context_create(fd);
> >>> +             get_freq(fd, ctx, &req.min, &req.max);
> >>> +             req.min = rand() % (req.max - req.min) + req.min;
> >>> +             req.max = rand() % (req.max - req.min) + req.min;
> >>> +             set_freq(fd, ctx, req.min, req.max);
> >>> +             get_freq(fd, ctx, &req.min, &req.max);
> >>> +
> >>> +             igt_debug("Replacing (%d, %d) on engine %x with (%d, %d)\n",
> >>> +                       min[req.engine], max[req.engine], req.engine,
> >>> +                       req.min, req.max);
> >>> +             igt_assert(write(link[1], &req, sizeof(req)) == sizeof(req));
> >>> +             gem_context_destroy(fd, ctx);
> >>> +
> >>> +             min[req.engine] = req.min;
> >>> +             max[req.engine] = req.max;
> >>> +
> >>> +             for (unsigned int n = 0; n < nengine; n++) {
> >>> +                     igt_debug("[%d]: [%d, %d]\n", n, min[n], max[n]);
> >>> +                     if (min[n] < req.min)
> >>> +                             req.min = min[n];
> >>> +                     if (max[n] > req.max)
> >>> +                             req.max = max[n];
> >>> +             }
> >>> +             igt_assert(req.max >= req.min);
> >>> +
> >>> +             usleep(50000);
> >>
> >> Have bi-directional comms so you know when child has processed the
> >> updated request?
> > 
> > Not just updated request, but the kernel rps worker to do something.
> > A bit hazy.
> 
> Ties with the previous question and needs some big fat comments at least.

But remember it is still a smoketest, an emulation of a live system to
check nothing explodes. Precision here is less paramount than a varied
workload trying to tickle the branches.

> And what I forgot to say in the first pass - some of these subtests need 
> comments to explain the general idea of what they are about to do.

/* noise(); set freq; noise(); execute; read freq under load; compare */
;)
-Chris