[Intel-gfx] [igt-dev] [PATCH i-g-t] i915/gem_ctx_exec: Exercise execution along context while closing it
Chris Wilson
chris at chris-wilson.co.uk
Fri Dec 4 11:13:06 UTC 2020
Quoting Tvrtko Ursulin (2020-12-04 10:52:23)
>
> On 03/12/2020 09:59, Chris Wilson wrote:
> > Race the execution and interrupt handlers along a context, while
> > closing it at a random time.
> >
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> > ---
> > tests/i915/gem_ctx_exec.c | 60 +++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 60 insertions(+)
> >
> > diff --git a/tests/i915/gem_ctx_exec.c b/tests/i915/gem_ctx_exec.c
> > index 194191def..18d5d1217 100644
> > --- a/tests/i915/gem_ctx_exec.c
> > +++ b/tests/i915/gem_ctx_exec.c
> > @@ -336,6 +336,63 @@ static void nohangcheck_hostile(int i915)
> >  	close(i915);
> >  }
> >
> > +static void close_race(int i915)
> > +{
> > +	const int ncpus = sysconf(_SC_NPROCESSORS_ONLN);
> > +	uint32_t *contexts;
> > +
> > +	contexts = mmap(NULL, 4096, PROT_WRITE, MAP_SHARED | MAP_ANON, -1, 0);
> > +	igt_assert(contexts != MAP_FAILED);
> > +
> > +	for (int child = 0; child < ncpus; child++)
> > +		contexts[child] = gem_context_clone_with_engines(i915, 0);
> > +
> > +	igt_fork(child, ncpus) {
> > +		igt_spin_t *spin;
> > +
> > +		spin = igt_spin_new(i915, .flags = IGT_SPIN_POLL_RUN);
> > +		igt_spin_end(spin);
> > +		gem_sync(i915, spin->handle);
> > +
> > +		while (!READ_ONCE(contexts[ncpus])) {
> > +			int64_t timeout = 1;
> > +
> > +			igt_spin_reset(spin);
> > +			igt_assert(!igt_spin_has_started(spin));
> > +
> > +			spin->execbuf.rsvd1 = READ_ONCE(contexts[child]);
> > +			if (__gem_execbuf(i915, &spin->execbuf))
> > +				continue;
> > +
> > +			igt_assert(gem_bo_busy(i915, spin->handle));
>
> I've seen this line fail in CI results - any idea how that can happen?
Yes. The presumption in this test is that if gem_execbuf succeeds, the
request will execute.

However, see

drm/i915/gem: Propagate error from cancelled submit due to context closure
https://patchwork.freedesktop.org/patch/405392/?series=84531&rev=1

where currently we cancel the request if we detect that the context was
closed before the request was submitted, but still return success (with
an async error).

I think the test makes a fair assumption, and it is easier than I
thought to return the error for the closed context, which makes
gem_execbuf solid against this race.
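To spell out the assumption, here is the same loop body again, annotated
(just a sketch of the hunk above with comments, nothing new):

	spin->execbuf.rsvd1 = READ_ONCE(contexts[child]);
	if (__gem_execbuf(i915, &spin->execbuf))
		continue; /* lost the race: the context was already closed */

	/*
	 * A successful execbuf must mean the request will be executed,
	 * so the batch must report busy here. Before the kernel fix
	 * above, the request could still be cancelled asynchronously
	 * after execbuf returned 0, tripping this assert.
	 */
	igt_assert(gem_bo_busy(i915, spin->handle));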
> > +			gem_wait(i915, spin->handle, &timeout); /* prime irq */
>
> Does this depend on implementation-specific behaviour, namely that we
> leave the irq enabled after the waiter has exited?
It's the best I can do. Nothing in the uAPI should ever govern the exact
HW details.

And since this trick can be used to lock the machine up under irq
pressure... maybe not the best.
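For reference, what that line relies on (a sketch only; as said above,
nothing in the uAPI guarantees this behaviour):

	int64_t timeout = 1; /* ~1ns, certain to expire on a spinner */

	/*
	 * The wait is expected to time out, and its return value is
	 * deliberately ignored; the point is the side effect of arming
	 * the interrupt for this request, so that the busywait below
	 * observes the spinner starting promptly.
	 */
	gem_wait(i915, spin->handle, &timeout); /* prime irq */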
> > +			igt_spin_busywait_until_started(spin);
> > +
> > +			igt_spin_end(spin);
> > +			gem_sync(i915, spin->handle);
> > +		}
> > +
> > +		igt_spin_free(i915, spin);
> > +	}
> > +
> > +	igt_until_timeout(5) {
> > +		for (int child = 0; child < ncpus; child++) {
> > +			gem_context_destroy(i915, contexts[child]);
> > +			contexts[child] =
> > +				gem_context_clone_with_engines(i915, 0);
>
> Right, so this is a deliberate attempt to occasionally make a child use
> a closed context. Presumably, or rather according to the CI results, it
> does manage to hit the race consistently, which surprises me a bit. A
> comment here would be good.
>
> > +		}
> > +		usleep(1000);
>
> Maybe add some randomness here? Or even a random busy loop within the
> child loop? I haven't looked at the i915 patch yet, so I don't know
> where the race actually is...
The CPU scheduler is great at providing randomness :)
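Though if explicit jitter were wanted, a minimal sketch (hypothetical,
plain libc rather than an IGT helper; seed once per process with
srand(getpid()) before the loop):

	usleep(500 + rand() % 1000); /* 0.5ms - 1.5ms between closures */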
-Chris