[Intel-gfx] [PATCH 6/6] drm/i915: Only call tasklet_kill() on the first prepare_reset

Chris Wilson chris at chris-wilson.co.uk
Fri Mar 9 14:22:58 UTC 2018


Quoting Chris Wilson (2018-03-09 14:10:34)
> Quoting Chris Wilson (2018-03-07 13:42:26)
> > tasklet_kill() will spin waiting for the current tasklet to be executed.
> > However, if tasklet_disable() has been called, then the tasklet is never
> > executed but permanently put back onto the runlist until
> > tasklet_enable() is called. Ergo, we cannot use tasklet_kill() inside a
> > disable/enable pair. This is the case when we call set-wedge from inside
> > i915_reset(), and another request was submitted to us concurrent to the
> > reset.
> > 
> > Fixes: 963ddd63c314 ("drm/i915: Suspend submission tasklets around wedging")
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_gem.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index 3b44952e089f..e75af06904b7 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -2941,7 +2941,8 @@ i915_gem_reset_prepare_engine(struct intel_engine_cs *engine)
> >          * Turning off the execlists->tasklet until the reset is over
> >          * prevents the race.
> >          */
> > -       tasklet_kill(&engine->execlists.tasklet);
> > +       if (!atomic_read(&engine->execlists.tasklet.count))
> > +               tasklet_kill(&engine->execlists.tasklet);
> 
> Note this is racy; two separate atomic operations. The only race we have
> is with set-wedge vs reset, which is a rare and already contentious
> issue. The upside of preventing the lockup is that we don't lose the
> machine.
> 
> +        * Note that this needs to be a single atomic operation on the
> +        * tasklet (flush existing tasks, prevent new tasks) to prevent
> +        * a race between reset and set-wedged. It is not, so we do the best
> +        * we can atm and make sure we don't lock the machine up in the more
> +        * common case of recursing calling set-wedged from inside i915_reset.

The alternative is to not try and do tasklet_kill() here. But that has
the side-effect of leaving it spinning if it was running at the same
time as the suspend. Hmm, for the moment I'll go with the comment and
see how inspired we are once gem_eio stops dieing.
-Chris


More information about the Intel-gfx mailing list