[Intel-gfx] [PATCH 20/30] drm/i915: Cancel non-persistent contexts on close
Chris Wilson
chris at chris-wilson.co.uk
Wed Oct 2 14:23:46 UTC 2019
Quoting Bloomfield, Jon (2019-10-02 14:52:32)
>
>
> > -----Original Message-----
> > From: Chris Wilson <chris at chris-wilson.co.uk>
> > Sent: Wednesday, October 2, 2019 4:20 AM
> > To: intel-gfx at lists.freedesktop.org
> > Cc: Chris Wilson <chris at chris-wilson.co.uk>; Joonas Lahtinen
> > <joonas.lahtinen at linux.intel.com>; Winiarski, Michal
> > <michal.winiarski at intel.com>; Bloomfield, Jon <jon.bloomfield at intel.com>
> > Subject: [PATCH 20/30] drm/i915: Cancel non-persistent contexts on close
> >
> > Normally, we rely on our hangcheck to prevent persistent batches from
> > hogging the GPU. However, if the user disables hangcheck, this mechanism
> > breaks down. Despite our insistence that this is unsafe, the users are
> > equally insistent that they want to use endless batches and will disable
> > the hangcheck mechanism. We are looking at perhaps replacing hangcheck
> > with a softer mechanism, that sends a pulse down the engine to check if
> > it is well. We can use the same preemptive pulse to flush an active
> > persistent context off the GPU upon context close, preventing resources
> > being lost and unkillable requests remaining on the GPU after process
> > termination. To avoid changing the ABI and accidentally breaking
> > existing userspace, we make the persistence of a context explicit and
> > enable it by default (matching current ABI). Userspace can opt out of
> > persistent mode (forcing requests to be cancelled when the context is
> > closed by process termination or explicitly) by a context parameter. To
> > facilitate existing use-cases of disabling hangcheck, if the modparam is
> > disabled (i915.enable_hangcheck=0), we disable persistence mode by
> > default. (Note, one of the outcomes for supporting endless mode will be
> > the removal of hangchecking, at which point opting into persistent mode
> > will be mandatory, or maybe the default perhaps controlled by cgroups.)
> >
> > Testcase: igt/gem_ctx_persistence
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> > Cc: MichaĆ Winiarski <michal.winiarski at intel.com>
> > Cc: Jon Bloomfield <jon.bloomfield at intel.com>
> > Reviewed-by: Jon Bloomfield <jon.bloomfield at intel.com>
> > ---
> > drivers/gpu/drm/i915/Makefile | 1 +
> > drivers/gpu/drm/i915/gem/i915_gem_context.c | 122 ++++++++++++++++++
> > drivers/gpu/drm/i915/gem/i915_gem_context.h | 15 +++
> > .../gpu/drm/i915/gem/i915_gem_context_types.h | 1 +
> > .../gpu/drm/i915/gem/selftests/mock_context.c | 2 +
> > .../gpu/drm/i915/gt/intel_engine_heartbeat.c | 56 ++++++++
> > .../gpu/drm/i915/gt/intel_engine_heartbeat.h | 14 ++
> > drivers/gpu/drm/i915/gt/intel_engine_pm.c | 2 +-
> > drivers/gpu/drm/i915/i915_priolist_types.h | 1 +
> > include/uapi/drm/i915_drm.h | 15 +++
> > 10 files changed, 228 insertions(+), 1 deletion(-)
> > create mode 100644 drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
> > create mode 100644 drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h
> >
> > diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> > index 1f87c70a2842..561fa2bb3006 100644
> > --- a/drivers/gpu/drm/i915/Makefile
> > +++ b/drivers/gpu/drm/i915/Makefile
> > @@ -76,6 +76,7 @@ gt-y += \
> > gt/intel_breadcrumbs.o \
> > gt/intel_context.o \
> > gt/intel_engine_cs.o \
> > + gt/intel_engine_heartbeat.o \
> > gt/intel_engine_pm.o \
> > gt/intel_engine_pool.o \
> > gt/intel_engine_sysfs.o \
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > index 0ab416887fc2..e832238a72e5 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
>
> <SNIP>
>
> > +static int
> > +set_persistence(struct i915_gem_context *ctx,
> > + const struct drm_i915_gem_context_param *args)
> > +{
> > + if (args->size)
> > + return -EINVAL;
> > +
> > + if (args->value) {
> > + i915_gem_context_set_persistence(ctx);
> > + return 0;
> > + }
> > +
> > + /* To cancel a context we use "preempt-to-idle" */
> > + if (!(ctx->i915->caps.scheduler &
> > I915_SCHEDULER_CAP_PREEMPTION))
> > + return -ENODEV;
> > +
> > + i915_gem_context_clear_persistence(ctx);
> > + return 0;
> > +}
>
> Hmmn. Given that disabling hangcheck is an explicit operation, and we already change the default setting, can't we make it a hard requirement that persistence requires hangcheck? You should not really be able to opt back in to persistence if hangcheck is disabled. In fact you could just test for hangcheck when deciding whether to kill the context, and force-kill if it is off - that way if hangcheck is disabled after a context starts it will get cleaned up.
Just great, now I got to update the igt to treat i915.enable_hangcheck
as API!
-Chris
More information about the Intel-gfx
mailing list