[Intel-gfx] [PATCH v2 3/4] drm/i915: make assert_device_not_suspended more precise

Wed Nov 18 07:47:10 PST 2015

On Wed, Nov 18, 2015 at 05:11:03PM +0200, Imre Deak wrote:
> On ke, 2015-11-18 at 16:01 +0100, Daniel Vetter wrote:
> > On Wed, Nov 18, 2015 at 04:58:46PM +0200, Imre Deak wrote:
> > > On ke, 2015-11-18 at 16:44 +0200, Imre Deak wrote:
> > > > On ke, 2015-11-18 at 15:37 +0100, Daniel Vetter wrote:
> > > > > On Mon, Nov 09, 2015 at 09:13:45PM +0200, Imre Deak wrote:
> > > > > > Atm, we assert that the device is not suspended after the point
> > > > > > when the
> > > > > > HW is truly put to a suspended state. This is fine, but we can
> > > > > > catch
> > > > > > more problems if we check the RPM refcount. After that one drops
> > > > > > to
> > > > > > zero
> > > > > > we shouldn't access the HW any more, although the actual suspend
> > > > > > may be
> > > > > > delayed. The only complication is that we want to avoid asserts
> > > > > > while
> > > > > > the suspend handler itself is running, so add a flag to handle
> > > > > > this
> > > > > > case.
> > > > > 
> > > > > Why do we want to avoid asserts firing while we go through the
> > > > > suspend
> > > > > handler? Calling assert_device_not_suspended from within rpm
> > > > > suspend/resume code sounds like a bug. Where/why does this happen?
> > > > 
> > > > Yea, disable_rpm_asserts() is misnamed. Should be
> > > > disable_rpm_wakelock_asserts(). Will change that in the next
> > > > iteration.
> > > 
> > > Ok, misunderstood your question. assert_device_not_suspended() is
> > > called during runtime suspend since we're accessing the HW until the
> > > point we set dev_priv->pm.suspended = true. Atm this wouldn't trigger a
> > > WARN, since assert_device_not_suspended() only checks pm.suspended and
> > > that will check out fine, but once we start to check HW accesses
> > > against the actual RPM refcount we want to disable the asserts on those
> > > in the handlers, since there the refcount is zero. Hence disabling it
> > > explicitly around the handlers, but we would still keep checking
> > > pm.suspended.
> > 
> > That seems like we're mixing up 2 asserts:
> > - assert_device_not_suspended: To be used in runtime_suspend code.
> > - assert_holding_rpm_wakelock (or whatever, I'm bad at names): check the
> >   count.
> 
> We call this assert (atm assert_device_not_suspended()) from low level
> register access helpers, so we can't distinguish between calling one or
> the other assert depending on whether we are on the rpm suspend path or
> not. What this patch does is to switch all the places where call
> assert_device_not_suspended() to assert_rpm_wakelock_held(), since that
> one provides a bigger coverage. Since this change will also affect the
> low level reg access functions which are called during rpm suspend, we
> need to disable part of the assert that checks for the refcount which
> is known to be zero there.
> 
> Otherwise assert_rpm_wakelock_held() also includes
> assert_device_not_suspended(), since that should be true in all other
> cases.

Ok, that makes sense. Should be in the commit message ;-)

Instead of cooking our own, what about checking
pci_dev->base.power.runtim_status == PM_SUSPENDING plus a comment?
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch