[igt-dev] [PATCH i-g-t 2/6] igt/gem_sync: Alternate stress for nop+sync

Tue Jun 19 13:39:51 UTC 2018

Quoting Joonas Lahtinen (2018-06-19 14:36:42)
> Quoting Chris Wilson (2018-06-19 13:49:16)
> > Apply a different sort of stress by timing how long it takes to sync a
> > second nop batch in the pipeline. We first start a spinner on the
> > engine, then when we know the GPU is active, we submit the second nop;
> > start timing as we then release the spinner and wait for the nop to
> > complete.
> > 
> > As with every other gem_sync test, it serves two roles. The first is
> > that it checks that we do not miss a wakeup under common stressful
> > conditions (the more conditions we check, the happier we will be that
> > they do not occur in practice). The second role it fulfils is that
> > it provides a very crude estimate for how long it takes for a nop to
> > execute from a running start (we already have a complementary estimate
> > for an idle start).
> > 
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> 
> <SNIP>
> 
> > +static void
> > +wakeup_ring(int fd, unsigned ring, int timeout)
> > +{
> 
> <SNIP>
> 
> > +       intel_detect_and_clear_missed_interrupts(fd);
> > +       igt_fork(child, num_engines) {
> > +               const uint32_t bbe = MI_BATCH_BUFFER_END;
> > +               struct drm_i915_gem_exec_object2 object;
> > +               struct drm_i915_gem_execbuffer2 execbuf;
> > +               double end, this, elapsed, now;
> > +               unsigned long cycles;
> > +               uint32_t cmd;
> > +               igt_spin_t *spin;
> > +
> > +               memset(&object, 0, sizeof(object));
> > +               object.handle = gem_create(fd, 4096);
> > +               gem_write(fd, object.handle, 0, &bbe, sizeof(bbe));
> > +
> > +               memset(&execbuf, 0, sizeof(execbuf));
> > +               execbuf.buffers_ptr = to_user_pointer(&object);
> > +               execbuf.buffer_count = 1;
> > +               execbuf.flags = engines[child % num_engines];
> > +
> > +               spin = __igt_spin_batch_new_poll(fd, 0, execbuf.flags);
> > +               igt_assert(spin->running);
> > +               cmd = *spin->batch;
> > +
> > +               gem_execbuf(fd, &execbuf);
> > +
> > +               igt_spin_batch_end(spin);
> > +               gem_sync(fd, object.handle);
> > +
> > +               end = gettime() + timeout;
> > +               elapsed = 0;
> > +               cycles = 0;
> > +               do {
> > +                       *spin->batch = cmd;
> > +                       *spin->running = 0;
> 
> igt_spin_batch_reset/resume/whatever...

And here you see why Tvrtko and I never formalised that part of the
API.

Anyway, I just found why this was underperforming. Those 250 MI_NOOPs we
insert to stop the GPU eating itself cost around 5us, which, considering
the target here is say 1us, is quite huge.
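
For a rough feel of those numbers, here is a back-of-the-envelope sketch.
It assumes the padding cost scales linearly with the number of MI_NOOPs,
which is only an assumption, and both helper names are hypothetical (they
are not part of igt). It uses nothing beyond the figures quoted above.

```c
/*
 * Back-of-the-envelope arithmetic for the MI_NOOP padding overhead.
 * Assumption: cost scales linearly with the number of MI_NOOPs.
 * Both helpers are hypothetical and not part of the igt library.
 */

/* Average cost of a single MI_NOOP, in nanoseconds. */
static double per_noop_ns(double padding_us, unsigned int count)
{
	return padding_us * 1000.0 / count;
}

/* How many times larger the padding is than the latency we want to measure. */
static double overhead_ratio(double padding_us, double target_us)
{
	return padding_us / target_us;
}
```

With the figures from this mail, per_noop_ns(5.0, 250) comes out at 20ns
per MI_NOOP, and overhead_ratio(5.0, 1.0) says the padding alone is about
five times the ~1us we are hoping to measure.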
-Chris