[Intel-gfx] [PATCH] drm/i915: Restart all engines atomically
Chris Wilson
chris at chris-wilson.co.uk
Tue Feb 7 17:14:38 UTC 2017
On Tue, Feb 07, 2017 at 05:58:58PM +0200, Mika Kuoppala wrote:
> Chris Wilson <chris at chris-wilson.co.uk> writes:
>
> > When we restart the engines, and we have active requests, a request on
> > the first engine may complete and queue a request to the second engine
> > before we try to restart the second engine. That queueing of the
> > request may itself cause the engine to restart, and so the subsequent
> > handling by engine->init_hw() will corrupt the current state.
> >
> > If we restart all the engines under stop_machine(), we will not process
> > any interrupts as we restart the engines, and all the engines will
> > appear to start simultaneously from the snapshot of state.
> >
> > Fixes: 821ed7df6e2a ("drm/i915: Update reset path to fix incomplete requests")
> > Testcase: igt/gem_exec_fence/await-hang
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> > Cc: Mika Kuoppala <mika.kuoppala at intel.com>
> > ---
> > drivers/gpu/drm/i915/i915_gem.c | 34 +++++++++++++++++++++++++++-------
> > 1 file changed, 27 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index 8a510c7f6828..65651a889813 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -4369,11 +4369,30 @@ static void init_unused_rings(struct drm_i915_private *dev_priv)
> > }
> > }
> >
> > -int
> > -i915_gem_init_hw(struct drm_i915_private *dev_priv)
> > +static int __i915_gem_restart_engines(void *data)
> > {
> > + struct drm_i915_private *i915 = data;
> > struct intel_engine_cs *engine;
> > enum intel_engine_id id;
> > + int err;
> > +
> > + /* We want all the engines to restart from the same snapshot; the
> > + * restart has to appear simultaneous (i.e. atomic). If one
> > + * request on the first engine completes and queues a request for
> > + * a secodn engine, *before* we call init_hw on that second engine,
> s/secodn/second.
>
> > + * we may corrupt state.
> > + */
> > + for_each_engine(engine, i915, id) {
> > + err = engine->init_hw(engine);
> > + if (err)
> > + return err;
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +int i915_gem_init_hw(struct drm_i915_private *dev_priv)
> > +{
> > int ret;
> >
> > dev_priv->gt.last_init_time = ktime_get();
> > @@ -4419,11 +4438,12 @@ i915_gem_init_hw(struct drm_i915_private *dev_priv)
> > }
> >
> > /* Need to do basic initialisation of all rings first: */
> > - for_each_engine(engine, dev_priv, id) {
> > - ret = engine->init_hw(engine);
> > - if (ret)
> > - goto out;
> > - }
> > + if (dev_priv->gt.active_requests)
>
> This should only increment while the mutex is held, so
>
> Reviewed-by: Mika Kuoppala <mika.kuoppala at intel.com>
>
> > + ret = stop_machine(__i915_gem_restart_engines, dev_priv, NULL);
The important thing here is preventing concurrent running of
engine->init_hw() with the tasklet - as I hope nothing will happen if we
write to ELSP whilst the engine is not in execlists mode. For that,
stop_machine() may be overkill, and I wonder if just playing around with
the tasklet will be enough. Let me try a second patch...
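
For illustration only (a hypothetical sketch, not the actual second
patch): quiescing the execlists tasklets around init_hw() could look
something like this, assuming the engine->irq_tasklet of this vintage:

	static int __i915_gem_restart_engines(struct drm_i915_private *i915)
	{
		struct intel_engine_cs *engine;
		enum intel_engine_id id;
		int err = 0;

		/* tasklet_disable() also waits for a running tasklet */
		for_each_engine(engine, i915, id)
			tasklet_disable(&engine->irq_tasklet);

		/* No submission can now race with the restart */
		for_each_engine(engine, i915, id) {
			err = engine->init_hw(engine);
			if (err)
				break;
		}

		for_each_engine(engine, i915, id)
			tasklet_enable(&engine->irq_tasklet);

		return err;
	}

Unlike stop_machine(), interrupts stay enabled here; only the execlists
bottom halves are held off, which should be sufficient since, in this
code base, the ELSP writes are issued from the tasklet.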
-Chris
--
Chris Wilson, Intel Open Source Technology Centre