[Intel-gfx] [PATCH v4 02/38] drm/i915: Explicit power enable during deferred context initialisation

Tue Jan 12 08:11:00 PST 2016

On Tue, Jan 12, 2016 at 05:59:46PM +0200, Imre Deak wrote:
> On ti, 2016-01-12 at 16:35 +0100, Daniel Vetter wrote:
> > On Tue, Jan 12, 2016 at 02:21:51PM +0000, John Harrison wrote:
> > > On 12/01/2016 14:04, Daniel Vetter wrote:
> > > > On Tue, Jan 12, 2016 at 11:50:34AM +0000, John Harrison wrote:
> > > > > On 12/01/2016 11:28, Chris Wilson wrote:
> > > > > > On Tue, Jan 12, 2016 at 11:11:20AM +0000, John Harrison
> > > > > > wrote:
> > > > > > > On 12/01/2016 00:20, Chris Wilson wrote:
> > > > > > > > On Mon, Jan 11, 2016 at 06:42:31PM +0000, John.C.Harrison
> > > > > > > > @Intel.com wrote:
> > > > > > > > > From: John Harrison <John.C.Harrison at Intel.com>
> > > > > > > > > 
> > > > > > > > > A later patch in this series re-organises the batch
> > > > > > > > > buffer submission
> > > > > > > > > code. Part of that is to reduce the scope of a
> > > > > > > > > pm_get/put pair.
> > > > > > > > > Specifically, they previously wrapped the entire
> > > > > > > > > submission path from
> > > > > > > > > the very start to the very end, now they only wrap the
> > > > > > > > > actual hardware
> > > > > > > > > submission part in the back half.
> > > > > > > > However, as you haven't fixed the ordering issue that
> > > > > > > > requires rpm_get
> > > > > > > > before struct_mutex, this is broken.
> > > > > > > Why does 'intel_runtime_pm_get' require the struct mutex to
> > > > > > > be held?
> > > > > > > It has certainly not complained at me about trying to do
> > > > > > > stuff
> > > > > > > without it.
> > > > > > Because it depends upon the struct_mutex and rpm doesn't have
> > > > > > sufficient
> > > > > > lockdep integration to be able to warn about using rpm from
> > > > > > the
> > > > > > incorrect contexts.
> > > > > Where? What does the 'pm_runtime_get_sync' call turn into?
> > > > > There are already
> > > > > other places in the driver which call intel_runtime_pm_get()
> > > > > immediately
> > > > > after grabbing the mutex lock. Also, the description comment
> > > > > for _pm_get()
> > > > > does not mention anything about mutexes at all.
> > > > If you nest rpm_get within dev->struct_mutex that's a bug and
> > > > could
> > > > deadlock. Where does this happen? And for any such place we need
> > > > a new
> > > > subtest in pm_rpm.
> > > 
> > > The first two hits when grepping the driver are in
> > > 'i915_gem_seqno_info()'
> > > and 'i915_interrupt_info()' in i915_debugfs.c. Both say:
> > >     ret = mutex_lock_interruptible(&dev->struct_mutex);
> > >     if (ret)
> > >         return ret;
> > >     intel_runtime_pm_get(dev_priv);
> > 
> > Yeah that's totally bonkers and will deadlock if the device is
> > actually
> > suspend.
> 
> It won't deadlock, runtime resume doesn't need struct mutex. Runtime
> suspend needs it, but we return -EAGAIN from there if it's already
> held. That in turn will delay the suspend.

You're right, why do I always mix this up. But this just got me thinking
about possible lockdep annotations. We could place a
might_lock(rpm_lockdep_key) within rpm_get. And fake-acquire that around
the actual ->resume callback. We could even do this in the rpm core ...

That would put all my worries on this topic at easy I think, and I could
finally stop confusing everyone ;-)

Or did I again miss the obvious?
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch