[Intel-gfx] [PATCH] drm/i915: Prevent lock-cycles between GPU waits and GPU resets

Chris Wilson chris at chris-wilson.co.uk
Wed Jun 12 11:04:46 UTC 2019


Quoting Mika Kuoppala (2019-06-12 12:00:07)
> Chris Wilson <chris at chris-wilson.co.uk> writes:
> 
> > We cannot allow ourselves to wait on the GPU while holding any lock we
> 
> s/we/as we?
> 
> My english parser is not strong.
> 
> > may need to reset the GPU. While there is not an explicit lock between
> > the two operations, lockdep cannot detect the dependency. So let's tell
> > lockdep about the wait/reset dependency with an explicit lockmap.
> >
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> > ---
> > This is *annoyingly* good at detecting lock cycles in GPU reset.
> > -Chris
> > ---
> >  drivers/gpu/drm/i915/gt/intel_reset.c            | 5 ++++-
> >  drivers/gpu/drm/i915/i915_drv.h                  | 8 ++++++++
> >  drivers/gpu/drm/i915/i915_gem.c                  | 3 +++
> >  drivers/gpu/drm/i915/i915_request.c              | 2 ++
> >  drivers/gpu/drm/i915/selftests/mock_gem_device.c | 2 ++
> >  5 files changed, 19 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
> > index 60d24110af80..6368b37f26d1 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_reset.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
> > @@ -978,10 +978,11 @@ void i915_reset(struct drm_i915_private *i915,
> >  
> >       might_sleep();
> >       GEM_BUG_ON(!test_bit(I915_RESET_BACKOFF, &error->flags));
> > +     lock_map_acquire(&i915->gt.reset_lockmap);
> >  
> >       /* Clear any previous failed attempts at recovery. Time to try again. */
> >       if (!__i915_gem_unset_wedged(i915))
> > -             return;
> > +             goto unlock;
> >  
> >       if (reason)
> >               dev_notice(i915->drm.dev, "Resetting chip for %s\n", reason);
> > @@ -1029,6 +1030,8 @@ void i915_reset(struct drm_i915_private *i915,
> >  
> >  finish:
> >       reset_finish(i915);
> > +unlock:
> > +     lock_map_release(&i915->gt.reset_lockmap);
> >       return;
> 
> The return patter in this function is rather unorthodox. Might be
> even that I reviewed it. Very close that I fell into trap of thinking
> that you return with lock held.

Sssh. It's a one-off unorthodoxy. Exception to the rule type of thing.

> >  taint:
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > index 0ea7f78ae227..9cfa9500fcc4 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -1919,6 +1919,14 @@ struct drm_i915_private {
> >               ktime_t last_init_time;
> >  
> >               struct i915_vma *scratch;
> > +
> > +             /*
> > +              * We must never wait on the GPU while holding a lock we may
> 
> My english parser still expected 'as' somewhere in there.

Both fixes required, thanks.
-Chris


More information about the Intel-gfx mailing list