[igt-dev] [PATCH i-g-t] igt: Add gem_ctx_freq to exercise requesting freq on a ctx

Tvrtko Ursulin tvrtko.ursulin at linux.intel.com
Wed Apr 11 10:42:12 UTC 2018


On 10/04/2018 21:53, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-04-10 17:17:13)
>>
>> On 16/03/2018 11:06, Arkadiusz Hiler wrote:
>>> From: Chris Wilson <chris at chris-wilson.co.uk>
>>>
>>> Exercise some new API that allows applications to request that
>>> individual contexts are executed within a desired frequency range.

[snip]

>>> +static void inflight(int fd, const struct intel_execution_engine *e)
>>> +{
>>> +     const unsigned int engine = e->exec_id | e->flags;
>>> +     uint32_t ctx, min, max, freq, discard;
>>> +     double measured;
>>> +     igt_spin_t *plug, *work[2];
>>> +     int pmu;
>>> +
>>> +     pmu = perf_i915_open(I915_PMU_REQUESTED_FREQUENCY);
>>> +     igt_require(pmu >= 0);
>>> +
>>> +     ctx = gem_context_create(fd);
>>> +     get_freq(fd, ctx, &min, &max);
>>> +     set_freq(fd, ctx, min, min);
>>> +
>>> +     igt_info("Min freq: %dMHz; Max freq: %dMHz\n", min, max);
>>> +
>>> +     gem_quiescent_gpu(fd);
>>> +     plug = igt_spin_batch_new(fd, ctx, engine, 0);
>>> +     gem_context_destroy(fd, ctx);
>>> +     for (int n = 0; n < 16; n++) {
>>> +             struct drm_i915_gem_exec_object2 obj = {
>>> +                     .handle = plug->handle,
>>> +             };
>>> +             struct drm_i915_gem_execbuffer2 eb = {
>>> +                     .buffer_count = 1,
>>> +                     .buffers_ptr = to_user_pointer(&obj),
>>> +                     .flags = engine,
>>> +                     .rsvd1 = gem_context_create(fd),
>>> +             };
>>> +             set_freq(fd, eb.rsvd1, min, min);
>>> +             gem_execbuf(fd, &eb);
>>> +             gem_context_destroy(fd, eb.rsvd1);
>>> +     }
>>> +     measured = measure_frequency(pmu, SAMPLE_PERIOD);
>>> +     igt_debugfs_dump(fd, "i915_rps_boost_info");
>>> +     igt_info("%s(plug): Measured %.1fMHz, expected %dMhz\n",
>>> +              e->name, measured, min);
>>> +     pmu_assert(measured, min);
>>
>> I would expect queued contexts would have different limits so here you
>> can check that the limit from the currently executing one remains in
>> force? Like it is I am not sure what it's testing?
> 
> The challenge here is to detect that setting a frequency is done on a
> lite-restore, i.e. that the updated context frequency is used for the
> next batch even though the context is currently live.

I don't follow. Here you have ctx0 with freq set to min/min, then you 
queue ctx1..16, all with freq set to min/min, and assert that min is 
measured. I don't see the lite-restore, or indeed the point of the test. 
If ctx1..16 had their freq set to something else, you could assert that 
the currently running ctx did not have its limit overwritten.
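
To make the question concrete, what I had in mind is something like the 
below (untested sketch, reusing the helpers from the patch): keep ctx0, 
the spinner that is actually executing, at min/min, but request max/max 
on every queued context. If the limit of the currently running context 
stays in force, the measurement should still come out at min.

	set_freq(fd, ctx, min, min);
	plug = igt_spin_batch_new(fd, ctx, engine, 0);
	for (int n = 0; n < 16; n++) {
		struct drm_i915_gem_exec_object2 obj = {
			.handle = plug->handle,
		};
		struct drm_i915_gem_execbuffer2 eb = {
			.buffer_count = 1,
			.buffers_ptr = to_user_pointer(&obj),
			.flags = engine,
			.rsvd1 = gem_context_create(fd),
		};
		/* queued contexts ask for the opposite end of the range */
		set_freq(fd, eb.rsvd1, max, max);
		gem_execbuf(fd, &eb);
		gem_context_destroy(fd, eb.rsvd1);
	}
	measured = measure_frequency(pmu, SAMPLE_PERIOD);
	/* the executing ctx should keep its min/min limit */
	pmu_assert(measured, min);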

>>> +     work[0] = __igt_spin_batch_new(fd, ctx, engine, 0);
>>> +
>>> +     /* work is now queued but not executing */
>>> +     freq = (max + min) / 2;
>>> +     set_freq(fd, ctx, freq, freq);
>>> +     get_freq(fd, ctx, &freq, &discard);
>>> +     gem_context_destroy(fd, ctx);
>>> +
>>> +     ctx = gem_context_create(fd);
>>> +     set_freq(fd, ctx, max, max);
>>> +     work[1] = __igt_spin_batch_new(fd, ctx, engine, 0);
>>> +     gem_context_destroy(fd, ctx);
>>> +
>>> +     igt_spin_batch_end(plug);
>>> +     do
>>> +             usleep(10000);
>>> +     while (gem_bo_busy(fd, plug->handle));
>>
>> gem_sync wouldn't cut it? Something internal could fool us here as the
>> queued up batches proceed to start and complete?
> 
> gem_sync -> waitboost. I was writing this to avoid triggering
> waitboosts, and also to cancel out unwanted boosts between loops.

Okay, so a helper with a comment would be good.
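
Something along these lines is what I meant (untested sketch, the helper 
name is just a suggestion):

static void wait_for_spin_retire(int fd, igt_spin_t *spin)
{
	/*
	 * Poll busyness instead of calling gem_sync() so that waiting
	 * for the spinner to retire does not trigger a waitboost and
	 * skew the frequency measurement.
	 */
	do
		usleep(10000);
	while (gem_bo_busy(fd, spin->handle));
}

so the call sites become igt_spin_batch_end() followed by 
wait_for_spin_retire() and igt_spin_batch_free().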

>>
>>> +     igt_spin_batch_free(fd, plug);
>>> +
>>> +     /* Now work will execute */
>>> +     measured = measure_frequency(pmu, SAMPLE_PERIOD);
>>> +     igt_debugfs_dump(fd, "i915_engine_info");
>>
>> Need this?
> 
> Need it? Have you tried debugging it without it? :)

What I was trying to get at is whether it is a debugging leftover. :)

> 
>>> +     igt_debugfs_dump(fd, "i915_rps_boost_info");
>>> +     igt_info("%s(work0): Measured %.1fMHz, expected %dMhz\n",
>>> +              e->name, measured, freq);
>>> +     pmu_assert(measured, freq);
>>> +
>>> +     igt_spin_batch_end(work[0]);
>>> +     do
>>> +             usleep(10000);
>>> +     while (gem_bo_busy(fd, work[0]->handle));
>>> +     igt_spin_batch_free(fd, work[0]);
>>> +
>>> +     measured = measure_frequency(pmu, SAMPLE_PERIOD);
>>> +     igt_debugfs_dump(fd, "i915_engine_info");
>>> +     igt_debugfs_dump(fd, "i915_rps_boost_info");
>>> +     igt_info("%s(work1): Measured %.1fMHz, expected %dMhz\n",
>>> +              e->name, measured, max);
>>> +     pmu_assert(measured, max);
>>> +
>>> +     igt_spin_batch_free(fd, work[1]);
>>> +     close(pmu);
>>> +     gem_quiescent_gpu(fd);
>>> +}
>>> +
>>> +static void set_sysfs_freq(uint32_t min, uint32_t max)
>>> +{
>>> +     igt_sysfs_printf(sysfs, "gt_min_freq_mhz", "%u", min);
>>> +     igt_sysfs_printf(sysfs, "gt_max_freq_mhz", "%u", max);
>>> +}
>>> +
>>> +static void get_sysfs_freq(uint32_t *min, uint32_t *max)
>>> +{
>>> +     igt_sysfs_scanf(sysfs, "gt_min_freq_mhz", "%u", min);
>>> +     igt_sysfs_scanf(sysfs, "gt_max_freq_mhz", "%u", max);
>>> +}
>>> +
>>> +static void sysfs_clamp(int fd, const struct intel_execution_engine *e)
>>> +{
>>> +#define N_STEPS 10
>>> +     const unsigned int engine = e->exec_id | e->flags;
>>> +     uint32_t ctx = gem_context_create(fd);
>>> +     uint32_t sys_min, sys_max;
>>> +     uint32_t min, max;
>>> +     double measured;
>>> +     igt_spin_t *spin;
>>> +     int pmu;
>>> +
>>> +     get_sysfs_freq(&sys_min, &sys_max);
>>> +     igt_info("System min freq: %dMHz; max freq: %dMHz\n", sys_min, sys_max);
>>> +
>>> +     get_freq(fd, ctx, &min, &max);
>>> +     igt_info("Context min freq: %dMHz; max freq: %dMHz\n", min, max);
>>
>> Assert it matches sysfs values?
> 
> It doesn't. The context itself starts at hw limits, as the sysfs limits
> may change over time; and the whole shebang of limits is applied at the
> end to decide who wins.

But at this point, how can a pristine context have limits different from 
the system-wide ones?

>>> +                     int ifrac = inner > N_STEPS ? 2*N_STEPS - inner : inner;
>>> +                     uint32_t ifreq = min + (max - min) * ifrac / N_STEPS;
>>> +
>>> +                     set_freq(fd, ctx, ifreq, ifreq);
>>> +
>>> +                     gem_quiescent_gpu(fd);
>>> +                     spin = __igt_spin_batch_new(fd, ctx, engine, 0);
>>> +                     usleep(10000);
>>> +
>>> +                     set_sysfs_freq(ofreq, ofreq);
>>> +                     get_sysfs_freq(&cur, &discard);
>>
>> Read-back ctx freq and check is correct?
> 
> Define correct. We spent many test cycles to verify these are accurate
> at the points we can decide they are. Duplicating that fuzzy logic here,
> I did not want to do as it is not the test we are focusing on.
> 
> It just happens that those basic API tests are after the behaviour
> tests. (Boring tests came after the tests to check the code was
> behaving.)

I was assuming that setting the sysfs limits to something other than what 
the context has set, and then reading back the context limits, would 
return the updated values if they are more restrictive.

Maybe it is not the case?

So a ctx freq cannot be set outside the _current_ global limits, but the 
global limits can be changed afterwards, and a query on the ctx will not 
reflect that? Probably so, because otherwise you would have to walk all 
contexts on sysfs writes.
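
To spell out my expectation, I was assuming something like the below 
would hold (untested, and from what you are saying it does not):

	uint32_t ctx_min, ctx_max;

	set_freq(fd, ctx, min, max);
	/* tighten the global cap below the ctx max */
	set_sysfs_freq(min, (min + max) / 2);
	get_freq(fd, ctx, &ctx_min, &ctx_max);
	/* expected the read-back to reflect the more restrictive clamp */
	igt_assert_lte(ctx_max, (min + max) / 2);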

>   
>>> +
>>> +                     measured = measure_frequency(pmu, SAMPLE_PERIOD);
>>> +                     igt_debugfs_dump(fd, "i915_rps_boost_info");
>>> +
>>> +                     set_sysfs_freq(sys_min, sys_max);
>>
>> Just so we do not exit with random min/max set?
> 
> Yes, as much as possible try to ensure we only assert and end the
> subtest on known state.

There is an exit handler already, but it doesn't hurt I guess.

>>
>>> +             range[0] = min + (rand() % (max - min) / 2);
>>> +             range[1] = max - (rand() % (max - min) / 2);
>>> +
>>> +             set_freq(fd, ctx, range[0], range[1]);
>>> +             get_freq(fd, ctx, &range[0], &range[1]);
>>> +
>>> +             spin = __igt_spin_batch_new(fd, ctx, engine, 0);
>>> +
>>> +             usleep(10000);
>>> +             measured = measure_frequency(pmu, SAMPLE_PERIOD);
>>> +             igt_spin_batch_free(fd, spin);
>>> +
>>> +             igt_assert(measured >= range[0] - PMU_TOLERANCE &&
>>> +                        measured <= range[1] + PMU_TOLERANCE);
>>> +     }
>>> +
>>> +     gem_context_destroy(fd, ctx);
>>> +     close(pmu);
>>> +}
>>> +
>>> +static void sandwich(int fd, int timeout)
>>> +{
>>> +     unsigned int engine;
>>> +
>>> +     for_each_physical_engine(fd, engine) {
>>> +             igt_fork(child, 1)
>>> +                     sandwich_engine(fd, engine, timeout);
>>> +     }
>>
>> Hmm.. so engines run in parallel - but how come all can assert they got
>> exactly the limits they wanted? Surely someone must get a different
>> range depending on what's running on another engine?
> 
> That's the setup with the overlapping frequency ranges, it ensures that
> we can satisfy all the requests simultaneously. If the ranges are
> disjoint, then at least one engine's request cannot be satisfied.

Okay, but if one engine wants 200-800, and another 600-1000, how can 
both assert that what they measure is in the range of what they asked 
for? What will actually be the resulting range here, 200-1000 or 
600-800? I guess if it is the intersection, 600-800, both asserts hold; 
if it were 200-1000, the measurement could land outside one engine's 
requested range.

>>> +
>>> +     for (unsigned int n = 0; n < nengine; n++)
>>> +             ctx[n] = gem_context_create(fd);
>>> +
>>> +     do {
>>> +             igt_spin_t *spin;
>>> +             struct {
>>> +                     uint32_t engine;
>>> +                     uint32_t min;
>>> +                     uint32_t max;
>>> +             } req;
>>> +
>>> +             while (read(link, &req, sizeof(req)) > 0) {
>>> +                     if ((req.engine | req.min | req.max) == 0)
>>> +                             goto out;
>>> +
>>> +                     igt_assert(req.engine < nengine);
>>> +                     set_freq(fd, ctx[req.engine], req.min, req.max);
>>> +             }
>>> +
>>> +             /* Create a 20% load using busy spinners */
>>> +             spin = __igt_spin_batch_new(fd, ctx[0], engines[0], 0);
>>> +             for (unsigned int n = 1; n < nengine; n++) {
>>> +                     struct drm_i915_gem_exec_object2 obj = {
>>> +                             .handle = spin->handle,
>>> +                     };
>>> +                     struct drm_i915_gem_execbuffer2 eb = {
>>> +                             .buffer_count = 1,
>>> +                             .buffers_ptr = to_user_pointer(&obj),
>>> +                             .flags = engines[n],
>>> +                             .rsvd1 = ctx[n],
>>> +                     };
>>> +                     gem_execbuf(fd, &eb);
>>> +             }
>>> +             usleep(100);
>>> +             igt_spin_batch_end(spin);
>>> +
>>> +             do
>>> +                     usleep(10);
>>> +             while (gem_bo_busy(fd, spin->handle));
>>
>> This loop should be moved to a helper since it is repeated a few times.
>> (If it is required in this form.)
>>
>>> +             igt_spin_batch_free(fd, spin);
>>> +             usleep(400);
>>
>> 100us/400us is I think too short if you want to get close to target
>> ratio. Will see later whether you actualy depend on it.
> 
> The target ratio here is misleading, busyness is summed over engines, so
> a 5x pulses have a weird effect. I've no idea what the load should look
> like, the challenge is just to ensure that we do not have a 100% saturated
> load and so allow the GPU to downclock. To allow downclocking we have
> to be less than 65% busy (worst case). Other than by watching and
> confirming we up/downclock i.e. the loading was good enough, I don't
> think the pulses are the best method.

Why is it important to allow downclocking?

> 
>>> +     } while (1);
>>> +
>>> +out:
>>> +     for (unsigned int n = 0; n < nengine; n++)
>>> +             gem_context_destroy(fd, ctx[n]);
>>> +}
>>> +
>>> +static void smoketest(int fd, int timeout)
>>> +{
>>> +     unsigned int engines[16];
>>> +     unsigned int nengine;
>>> +     unsigned int engine;
>>> +     uint32_t min[16], max[16];
>>> +     int pmu, link[2];
>>> +
>>> +     get_freq(fd, 0, &min[0], &max[0]);
>>> +
>>> +     nengine = 0;
>>> +     for_each_physical_engine(fd, engine) {
>>> +             if (nengine == ARRAY_SIZE(engines) - 1)
>>> +                     break;
>>> +
>>> +             min[nengine] = min[0];
>>> +             max[nengine] = max[0];
>>> +             engines[nengine] = engine;
>>> +             nengine++;
>>> +     }
>>> +     igt_require(nengine);
>>> +
>>> +     igt_assert(pipe(link) == 0);
>>> +     igt_fork(child, 1)
>>> +             pwm(fd, engines, nengine, link[0]);
>>> +     close(link[0]);
>>> +
>>> +     pmu = perf_i915_open(I915_PMU_REQUESTED_FREQUENCY);
>>> +     igt_require(pmu >= 0);
>>> +
>>> +     igt_until_timeout(timeout) {
>>> +             struct {
>>> +                     uint32_t engine;
>>> +                     uint32_t min;
>>> +                     uint32_t max;
>>> +             } req;
>>> +             double measured;
>>> +             uint32_t ctx;
>>> +
>>> +             req.engine = rand() % nengine;
>>> +
>>> +             ctx = gem_context_create(fd);
>>> +             get_freq(fd, ctx, &req.min, &req.max);
>>> +             req.min = rand() % (req.max - req.min) + req.min;
>>> +             req.max = rand() % (req.max - req.min) + req.min;
>>> +             set_freq(fd, ctx, req.min, req.max);
>>> +             get_freq(fd, ctx, &req.min, &req.max);
>>> +
>>> +             igt_debug("Replacing (%d, %d) on engine %x with (%d, %d)\n",
>>> +                       min[req.engine], max[req.engine], req.engine,
>>> +                       req.min, req.max);
>>> +             igt_assert(write(link[1], &req, sizeof(req)) == sizeof(req));
>>> +             gem_context_destroy(fd, ctx);
>>> +
>>> +             min[req.engine] = req.min;
>>> +             max[req.engine] = req.max;
>>> +
>>> +             for (unsigned int n = 0; n < nengine; n++) {
>>> +                     igt_debug("[%d]: [%d, %d]\n", n, min[n], max[n]);
>>> +                     if (min[n] < req.min)
>>> +                             req.min = min[n];
>>> +                     if (max[n] > req.max)
>>> +                             req.max = max[n];
>>> +             }
>>> +             igt_assert(req.max >= req.min);
>>> +
>>> +             usleep(50000);
>>
>> Have bi-directional comms so you know when child has processed the
>> updated request?
> 
> Not just updated request, but the kernel rps worker to do something.
> A bit hazy.

That ties in with the previous question and needs some big fat comments 
at least.
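
For the request half of it, an ack over a second pipe would be enough 
(rough untested sketch; "ack" / ack_fd are made-up names for a second 
pipe created alongside "link"):

	/* child side (pwm): acknowledge each request once applied */
	while (read(link, &req, sizeof(req)) > 0) {
		if ((req.engine | req.min | req.max) == 0)
			goto out;

		igt_assert(req.engine < nengine);
		set_freq(fd, ctx[req.engine], req.min, req.max);
		igt_assert(write(ack_fd, &req.engine, sizeof(req.engine)) ==
			   sizeof(req.engine));
	}

	/* parent side (smoketest): block on the ack before sampling;
	 * the rps worker latency would still need the sleep, or some
	 * other synchronisation, on top of this. */
	igt_assert(write(link[1], &req, sizeof(req)) == sizeof(req));
	igt_assert(read(ack[0], &req.engine, sizeof(req.engine)) ==
		   sizeof(req.engine));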

And what I forgot to say in the first pass: some of these subtests need 
comments explaining the general idea of what they are trying to do.

Regards,

Tvrtko

