[Intel-gfx] [igt-dev] [PATCH i-g-t] i915/gem_exec_balancer: Check for scheduling bonded-pairs on the same engine
Chris Wilson
chris at chris-wilson.co.uk
Mon Sep 23 15:43:43 UTC 2019
Quoting Tvrtko Ursulin (2019-09-23 15:29:11)
>
> On 20/09/2019 23:26, Chris Wilson wrote:
> > The expectation for bonded submission is that the requests are run
> > concurrently, in parallel on multiple engines. However, the lack of
> > constraints in the scheduler's selection, combined with timeslicing,
> > means that the bonded requests could be run in opposite order on the
> > same engine. With just the right pair of requests, this can cause a GPU
> > hang (or at least trigger hangcheck); the best (worst) case would be
> > execution running several times slower than ideal.
>
> I don't see any bonding being set up?
>
> [comes back later]
>
> Oh you used only the submit fence and not actually bonds. But you also
> don't use the virtual engine at all?
A is using either of the 2 real engines, B is using the virtual engine
to select the other available engine. Bonding in this case is just that
the requests are bonded together to run in parallel.
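
To make the failure mode concrete, here is a toy model of the A/B pair. It
is only a sketch, not i915 or IGT code: the function name, the tick loop and
the "timeslice" step are all invented for illustration. It assumes A can be
timesliced off a shared engine while B, once running, never yields.

```c
#include <assert.h>
#include <stdbool.h>

enum { MAX_TICKS = 1000 }; /* arbitrary bound standing in for "hang" */

/*
 * Toy model (hypothetical, not kernel code):
 *   A: waits for B's "running" ack, then cancels B.
 *   B: unpreemptable spinner; once on an engine it stays there.
 * Returns true if B gets cancelled within MAX_TICKS.
 */
static bool bonded_pair_completes(int engine_a, int engine_b, bool a_first)
{
	bool same = engine_a == engine_b;
	bool a_on_hw = !same || a_first;  /* A has an engine to itself, or won the shared one */
	bool b_on_hw = !same || !a_first; /* likewise for B */
	bool b_started = false;
	bool b_cancelled = false;

	for (int tick = 0; tick < MAX_TICKS; tick++) {
		if (b_on_hw)
			b_started = true; /* B writes its running ack */

		if (a_on_hw && b_started) {
			b_cancelled = true; /* A's wait passes, its store ends B */
			break;
		}

		/* Timeslice expiry on a shared engine: A yields to B, B never yields back. */
		if (same && a_on_hw && !b_on_hw) {
			a_on_hw = false;
			b_on_hw = true;
		}
	}

	return b_cancelled;
}
```

With distinct engines the pair completes on the first tick; on a shared
engine neither submission order lets A observe B running while A itself is
on the hardware.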
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> > ---
> > tests/i915/gem_exec_balancer.c | 151 +++++++++++++++++++++++++++++++++
> > 1 file changed, 151 insertions(+)
> >
> > diff --git a/tests/i915/gem_exec_balancer.c b/tests/i915/gem_exec_balancer.c
> > index 407dc0eca..e4fe75747 100644
> > --- a/tests/i915/gem_exec_balancer.c
> > +++ b/tests/i915/gem_exec_balancer.c
> > @@ -30,6 +30,15 @@
> >
> > IGT_TEST_DESCRIPTION("Exercise in-kernel load-balancing");
> >
> > +#define MI_SEMAPHORE_WAIT (0x1c << 23)
> > +#define MI_SEMAPHORE_POLL (1 << 15)
> > +#define MI_SEMAPHORE_SAD_GT_SDD (0 << 12)
> > +#define MI_SEMAPHORE_SAD_GTE_SDD (1 << 12)
> > +#define MI_SEMAPHORE_SAD_LT_SDD (2 << 12)
> > +#define MI_SEMAPHORE_SAD_LTE_SDD (3 << 12)
> > +#define MI_SEMAPHORE_SAD_EQ_SDD (4 << 12)
> > +#define MI_SEMAPHORE_SAD_NEQ_SDD (5 << 12)
> > +
> > #define INSTANCE_COUNT (1 << I915_PMU_SAMPLE_INSTANCE_BITS)
> >
> > static size_t sizeof_load_balance(int count)
> > @@ -694,6 +703,145 @@ static void bonded(int i915, unsigned int flags)
> > gem_context_destroy(i915, master);
> > }
> >
> > +static unsigned int offset_in_page(void *addr)
> > +{
> > + return (uintptr_t)addr & 4095;
> > +}
> > +
> > +static uint32_t create_semaphore_to_spinner(int i915, igt_spin_t *spin)
> > +{
> > + uint32_t *cs, *map;
> > + uint32_t handle;
> > +
> > + handle = gem_create(i915, 4096);
> > + cs = map = gem_mmap__cpu(i915, handle, 0, 4096, PROT_WRITE);
> > +
> > + /* Wait until the spinner is running */
> > + *cs++ = MI_SEMAPHORE_WAIT |
> > + MI_SEMAPHORE_POLL |
> > + MI_SEMAPHORE_SAD_NEQ_SDD |
> > + (4 - 2);
> > + *cs++ = 0;
> > + *cs++ = spin->obj[0].offset + 4 * SPIN_POLL_START_IDX;
> > + *cs++ = 0;
> > +
> > + /* Then cancel the spinner */
> > + *cs++ = MI_STORE_DWORD_IMM;
> > + *cs++ = spin->obj[IGT_SPIN_BATCH].offset +
> > + offset_in_page(spin->condition);
> > + *cs++ = 0;
> > + *cs++ = MI_BATCH_BUFFER_END;
> > +
> > + *cs++ = MI_BATCH_BUFFER_END;
> > + munmap(map, 4096);
> > +
> > + return handle;
> > +}
> > +
> > +static void bonded_slice(int i915)
> > +{
> > + uint32_t ctx;
> > + int *stop;
> > +
> > + igt_require(gem_scheduler_has_semaphores(i915));
> > +
> > + stop = mmap(0, 4096, PROT_WRITE, MAP_SHARED | MAP_ANON, -1, 0);
> > + igt_assert(stop != MAP_FAILED);
> > +
> > + ctx = gem_context_create(i915);
> > +
> > + for (int class = 0; class < 32; class++) {
> > + struct i915_engine_class_instance *siblings;
> > + struct drm_i915_gem_exec_object2 obj[3] = {};
> > + struct drm_i915_gem_execbuffer2 eb = {};
> > + unsigned int count;
> > + igt_spin_t *spin;
> > +
> > + siblings = list_engines(i915, 1u << class, &count);
> > + if (!siblings)
> > + continue;
> > +
> > + if (count < 2) {
> > + free(siblings);
> > + continue;
> > + }
> > +
> > + /*
> > + * A: semaphore wait on spinner; cancel spinner
> > + * B: unpreemptable spinner
> > + *
> > + * A waits for running ack from B, if scheduled on the same
> > + * engine -> hang.
> > + *
> > + * C+: background load across engines
> > + */
> > +
> > + set_load_balancer(i915, ctx, siblings, count, NULL);
> > +
> > + spin = __igt_spin_new(i915,
> > + .ctx = ctx,
> > + .flags = (IGT_SPIN_NO_PREEMPTION |
> > + IGT_SPIN_POLL_RUN));
> > + igt_spin_end(spin); /* we just want its address for later */
> > + gem_sync(i915, spin->handle);
> > + igt_spin_reset(spin);
> > +
> > + obj[0] = spin->obj[0];
> > + obj[1] = spin->obj[1];
> > + obj[2].handle = create_semaphore_to_spinner(i915, spin);
> > +
> > + eb.buffers_ptr = to_user_pointer(obj);
> > + eb.rsvd1 = ctx;
> > +
> > + *stop = 0;
> > + igt_fork(child, count + 1) {
> > + igt_list_del(&spin->link);
> > +
> > + ctx = gem_context_clone(i915, ctx,
> > + I915_CONTEXT_CLONE_ENGINES, 0);
> > +
> > + while (!READ_ONCE(*stop)) {
> > + spin = igt_spin_new(i915,
> > + .ctx = ctx,
> > + .engine = (1 + rand() % count),
>
> With "count + 1" children and rand, load may end up uneven across
> engines - are you happy with that?
It's using rand, it's going to be uneven. count + 1 isn't significant in
any way. I stopped as soon as I had the test reliably hitting the issue.
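
As a rough illustration of the unevenness, the sketch below tallies one
random pick per child using the same "1 + rand() % count" expression as the
patch. The helper names are hypothetical, and a single pick per child is a
simplification, since each child in the test keeps picking in a loop. The
offset of 1 presumably skips slot 0 of the engine map, which is an
assumption here.

```c
#include <assert.h>
#include <stdlib.h>

/* Same selection expression as the patch: an index in 1..count. */
static unsigned int pick_engine(unsigned int count)
{
	return 1 + rand() % count;
}

/*
 * Tally `samples` picks into hits[1..count]; hits[0] stays zero.
 * The caller provides an array of at least count + 1 entries.
 */
static void tally_picks(unsigned int count, unsigned int samples,
			unsigned int *hits)
{
	for (unsigned int i = 0; i <= count; i++)
		hits[i] = 0;

	for (unsigned int n = 0; n < samples; n++)
		hits[pick_engine(count)]++;
}
```

With count = 3 and count + 1 = 4 picks, the tallies cannot all be equal
whatever rand() returns, since 4 is not a multiple of 3.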
> > + .flags = IGT_SPIN_POLL_RUN);
> > + igt_spin_busywait_until_started(spin);
> > + usleep(50000);
>
> 50ms, hm, ideally there should be a pipe signal before parent starts the
> test to know children have started. Otherwise parent can finish before
> they even start, no?
The children are just to provide noise. The requirement is that we have
enough load across the system to cause timeslicing to kick in.
> > + igt_spin_free(i915, spin);
> > + }
> > +
> > + gem_context_destroy(i915, ctx);
> > + }
> > +
> > + igt_until_timeout(5) {
> > + igt_spin_reset(spin);
>
> What is the reset for?
We are reusing the spinner inside the loop.
> > +
> > + /* A: Submit the semaphore wait */
> > + eb.buffer_count = 3;
> > + eb.flags = (1 + rand() % count) | I915_EXEC_FENCE_OUT;
> > + gem_execbuf_wr(i915, &eb);
> > +
> > + /* B: Submit the spinner (in parallel) */
>
> How in parallel when it is the same context so they are implicitly in order?
Different engines with different timelines, using the submit to request
parallel execution.
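
For readers unfamiliar with the submit fence, a minimal sketch of the
hand-off follows. It is not the remainder of the patch: the struct and the
helper are hypothetical stand-ins for drm_i915_gem_execbuffer2, and the
TOY_ constants mirror what I believe the uapi values of
I915_EXEC_FENCE_OUT and I915_EXEC_FENCE_SUBMIT to be, so check them against
i915_drm.h. It assumes the out-fence fd comes back in the upper 32 bits of
rsvd2 and the input fence is read from the lower 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match the i915 uapi flag values; verify against i915_drm.h. */
#define TOY_EXEC_FENCE_OUT    (1u << 17)
#define TOY_EXEC_FENCE_SUBMIT (1u << 20)

/* Hypothetical stand-in for the two execbuf fields used here. */
struct toy_execbuf {
	uint64_t flags;
	uint64_t rsvd2;
};

/*
 * Prepare B so that it is submitted together with A rather than after A
 * completes: A's out-fence becomes B's submit fence.
 */
static void chain_submit_fence(const struct toy_execbuf *a,
			       struct toy_execbuf *b,
			       unsigned int engine)
{
	b->rsvd2 = a->rsvd2 >> 32;                 /* out-fence fd -> input fence slot */
	b->flags = engine | TOY_EXEC_FENCE_SUBMIT; /* wait for A's submission, not its completion */
}
```

A submit fence only orders the start of B against the start of A, so the
two requests remain free to execute at the same time on different engines.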
-Chris