[Intel-gfx] [PATCH v2] drm/i915/gem: Don't leak non-persistent requests on changing engines
Chris Wilson
chris at chris-wilson.co.uk
Wed Feb 5 18:39:55 UTC 2020
Quoting Tvrtko Ursulin (2020-02-05 18:33:57)
>
> On 05/02/2020 16:44, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2020-02-05 16:22:34)
> >> On 05/02/2020 12:13, Chris Wilson wrote:
> >>> If we have a set of active engines marked as being non-persistent, we
> >>> lose track of those if the user replaces those engines with
> >>> I915_CONTEXT_PARAM_ENGINES. As part of our uABI contract is that
> >>> non-persistent requests are terminated if they are no longer being
> >>> tracked by the user's context (in order to prevent a lost request
> >>> causing an untracked and so unstoppable GPU hang), we need to apply the
> >>> same context cancellation upon changing engines.
> >>>
> >>> Fixes: a0e047156cde ("drm/i915/gem: Make context persistence optional")
> >>> Testcase: XXX
> >>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> >>> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> >>> ---
> >>> drivers/gpu/drm/i915/gem/i915_gem_context.c | 7 +++++++
> >>> 1 file changed, 7 insertions(+)
> >>>
> >>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> >>> index 52a749691a8d..20f1d3e0221f 100644
> >>> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> >>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> >>> @@ -1624,11 +1624,18 @@ set_engines(struct i915_gem_context *ctx,
> >>>
> >>> replace:
> >>> mutex_lock(&ctx->engines_mutex);
> >>> +
> >>> + /* Flush stale requests off the old engines if required */
> >>> + if (!i915_gem_context_is_persistent(ctx) ||
> >>> + !i915_modparams.enable_hangcheck)
> >>> + kill_context(ctx);
> >>
> >> Is the negative effect of this is legit contexts can't keep submitting
> >> and changing the map? Only if PREEMPT_TIMEOUT is disabled I think but
> >> still. Might break legitimate userspace. Not that I offer solutions.. :(
> >> Banning changing engines once context went non-persistent? That too can
> >> break someone.
> >
> > It closes the hole we have. To do otherwise, we need to keep track of
> > the old engines. Not an impossible task, certainly inconvenient.
> >
> > struct old_engines {
> > struct i915_active active;
> > struct list_head link;
> > struct i915_gem_context *ctx;
> > void *engines;
> > int num_engines;
> > };
> >
> > With a list+spinlock in the ctx that we can work in kill_context.
> >
> > The biggest catch there is actually worrying about attaching the active
> > to already executing request, and making sure the coupling doesn't bug
> > on a concurrent completion. Hmm, it's just a completion callback, but
> > more convenient to use a ready made one.
>
> What would you do with old engines? We don't have a mechanism to mark
> intel_context closed. Hm, right, it would get unreachable by definition.
> But how to terminate it if it doesn't play nicely?
Wait 30 minutes (if it passes the tests) and you'll find out :)
-Chris
More information about the Intel-gfx
mailing list