[igt-dev] [PATCH i-g-t v4] i915/i915_pm_rps: Check impact of fence ordering on waitboosting

Chris Wilson chris at chris-wilson.co.uk
Thu Jul 28 14:12:02 UTC 2022


Quoting Kamil Konieczny (2022-07-28 06:29:31)
> Hi,
> 
> On 2022-07-27 at 14:27:02 +0200, Karolina Drobnik wrote:
> > Hi,
> > 
> > On 26.07.2022 12:49, Kamil Konieczny wrote:
> > > Hi Karolina,
> > > 
> > > few more nits, see below.
> > > 
> > > On 2022-07-13 at 12:53:42 +0200, Karolina Drobnik wrote:
> > > > From: Chris Wilson <chris at chris-wilson.co.uk>
> > > > 
> > > > Currently the wait boost heuristic is evaluated at the start of each
> > > > fence wait for a series within dma-resv. There is no strict ordering of
> > > > fences defined by dma-resv, and so it turns out that the same operation
> > > > under different circumstances can result in different heuristics being
> > > > applied, and dramatic performance variations in user applications.
> > > > 
> > > > Link: https://gitlab.freedesktop.org/drm/intel/-/issues/6284
> > > > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > > > Signed-off-by: Karolina Drobnik <karolina.drobnik at intel.com>
> > > > ---
> > > > v4:
> > > >    - Added test descriptions with igt_describe()
> > > > v3:
> > > >    - Removed unnecessary calls to gem_write
> > > > v2:
> > > >    - Introduced batch_create() with arbitration points to prevent hangs
> > > >    - Added engine-order subtest
> > > >    - Added reporting of GPU frequency
> > > >    - Changed runtimes and switched prints from us to ms
> > > > 
> > > >   tests/i915/i915_pm_rps.c | 261 ++++++++++++++++++++++++++++++++++++++-
> > > >   1 file changed, 255 insertions(+), 6 deletions(-)
> > > > 
> > > > diff --git a/tests/i915/i915_pm_rps.c b/tests/i915/i915_pm_rps.c
> > > > index b37f15c2..6125ea60 100644
> > > > --- a/tests/i915/i915_pm_rps.c
> > > > +++ b/tests/i915/i915_pm_rps.c
> > > > @@ -37,8 +37,10 @@
> > > >   #include <sys/wait.h>
> > > > 
> > > >   #include "i915/gem.h"
> > > > +#include "i915/gem_create.h"
> > > >   #include "igt.h"
> > > >   #include "igt_dummyload.h"
> > > > +#include "igt_perf.h"
> > > >   #include "igt_sysfs.h"
> > > > 
> > > >   IGT_TEST_DESCRIPTION("Render P-States tests - verify GPU frequency changes");
> > > > @@ -614,14 +616,253 @@ static void waitboost(int fd, bool reset)
> > > >           igt_assert_lt(post_freqs[CUR], post_freqs[MAX]);
> > > >   }
> > > > 
> > > > +static uint32_t batch_create(int i915, uint64_t sz)
> > > > +{
> > > > + const uint32_t bbe = MI_BATCH_BUFFER_END;
> > > > + const uint32_t chk = 0x5 << 23;
> > > > + uint32_t handle = gem_create(i915, sz);
> > > > +
> > > > + for (uint64_t pg = 4096; pg + 4096 < sz; pg += 4096)
> > > > +         gem_write(i915, handle, pg, &chk, sizeof(chk));
> > > > +
> > > > + gem_write(i915, handle, sz - sizeof(bbe), &bbe, sizeof(bbe));
> > > > +
> > > > + return handle;
> > > > +}
> > > > +
> > > > +static uint64_t __fence_order(int i915,
> > > > +                       struct drm_i915_gem_exec_object2 *obj,
> > > > +                       struct drm_i915_gem_execbuffer2 *eb,
> > > > +                       uint64_t flags0, uint64_t flags1,
> > > > +                       double *outf)
> > > > +{
> > > > + uint64_t before[2], after[2];
> > > > + struct timespec tv;
> > > > + int fd;
> > > > +
> > > > + gem_quiescent_gpu(i915);
> > > > + fd = perf_i915_open(i915, I915_PMU_ACTUAL_FREQUENCY);
> > > > +
> > > > + igt_gettime(&tv);
> > > > +
> > > > + obj->flags = flags0;
> > > > + gem_execbuf(i915, eb);
> > > > +
> > > > + obj->flags = flags1;
> > > > + gem_execbuf(i915, eb);
> > > > +
> > > > + read(fd, before, sizeof(before));
> > > > + gem_sync(i915, obj->handle);
> > > > + read(fd, after, sizeof(after));
> > > > + close(fd);
> > > > +
> > > > + after[0] -= before[0];
> > > > + after[1] -= before[1];
> > > 
> > > Put igt_assert here (check for zero before divide).
> > 
> > Hmm, isn't that more of a belts and suspenders approach? It's unlikely we'd
> > get zero here. Also, don't we put asserts towards the end of the test
> > whenever possible?
> 
> It is userspace so we can afford some extra checks, so maybe
>         if (after[1] == 0)
>                 return 0;
> > 
> > > > +
> > > > + *outf = 1e9 * after[0] / after[1];
> > > > + return igt_nsec_elapsed(&tv);
> > > > +}
> > > > +
> > > > +static void fence_order(int i915)
> > > > +{
> > > > + const uint64_t sz = 512ull << 20;
> > > > + struct drm_i915_gem_exec_object2 obj[2] = {
> > > > +         { .handle = gem_create(i915, 4096) },
> > > > +         { .handle = batch_create(i915, sz) },
> > > > + };
> > > > + struct drm_i915_gem_execbuffer2 execbuf = {
> > > > +         .buffers_ptr = to_user_pointer(obj),
> > > > +         .buffer_count = ARRAY_SIZE(obj),
> > > > + };
> > > > + uint64_t wr, rw;
> > > > + int min, max;
> > > > + double freq;
> > > > + int sysfs;
> > > > +
> > > > + /*
> > > > +  * Check the order of fences found during GEM_WAIT does not affect
> > > > +  * waitboosting.
> > > > +  *
> > > > +  * Internally, implicit fences are tracked within a dma-resv which
> > > > +  * imposes no order on the individually fences tracked within. Since
> > > > +  * there is no defined order, the sequence of waits (and the associated
> > > > +  * waitboosts) is also undefined, undermining the consistency of the
> > > > +  * waitboost heuristic.
> > > > +  *
> > > > +  * In particular, we can influence the sequence of fence storage
> > > > +  * within dma-resv by mixing read/write semantics for implicit fences.
> > > > +  * We can exploit this property of dma-resv to exercise that no matter
> > > > +  * the stored order, the heuristic is applied consistently for the
> > > > +  * user's GEM_WAIT ioctl.
> > > > +  */
> > > > +
> > > > + sysfs = igt_sysfs_open(i915);
> > > > + min = igt_sysfs_get_u32(sysfs, "gt_RPn_freq_mhz");
> > > > + max = igt_sysfs_get_u32(sysfs, "gt_RP0_freq_mhz");
> > > > + igt_require(max > min);
> > > > +
> > > > + /* Only allow ourselves to upclock via waitboosting */
> > > > + igt_sysfs_printf(sysfs, "gt_min_freq_mhz", "%d", min);
> > > > + igt_sysfs_printf(sysfs, "gt_max_freq_mhz", "%d", min);
> > > > + igt_sysfs_printf(sysfs, "gt_boost_freq_mhz", "%d", max);
> > > > +
> > > > + /* Warm up to bind the vma */
> > > > + __fence_order(i915, &obj[0], &execbuf, 0, 0, &freq);
> > > > +
> > > > + wr = __fence_order(i915, &obj[0], &execbuf, EXEC_OBJECT_WRITE, 0, &freq);
> > > > + igt_info("Write-then-read: %.2fms @ %.3fMHz\n", wr * 1e-6, freq);
> > > > +
> > > > + rw = __fence_order(i915, &obj[0], &execbuf, 0, EXEC_OBJECT_WRITE, &freq);
> > > > + igt_info("Read-then-write: %.2fms @ %.3fMHz\n", rw * 1e-6, freq);
> > > > +
> > > > + gem_close(i915, obj[0].handle);
> > > > + gem_close(i915, obj[1].handle);
> > > 
> > > Close sysfs here.
> > 
> > Good catch. I'll close it before asserts (as agreed).
> > 
> 
> My bad, this close should be after resetting max freq, two lines
> below. Btw why not reuse code from other tests, reading old freqs
> at beginnig of test or reusing freqs readed at fixture in
> igt_main ?
> 
> > > > +
> > > > + igt_sysfs_printf(sysfs, "gt_min_freq_mhz", "%d", min);
> > > > + igt_sysfs_printf(sysfs, "gt_max_freq_mhz", "%d", max);
> > > 
> > > Here we should put previous values, not those readed from RPx,
> > > but if they are always the same it is ok.
> > 
> > We restore the values we read from sysfs
> > 
> 
> But we readed RPx values, not gt_min|gt_max freq ones, so here
> you can use origfreqs[MIN] and origfreqs[MAX] or declare in test

Not required. igt owns the device. We completely take over the device,
restoring default properties, all previous state will be lost. Anything
that pretends otherwise is misleading.
-Chris


More information about the igt-dev mailing list