[PATCH] drm/xe/pm: Also avoid missing outer rpm warning on system suspend

Imre Deak imre.deak at intel.com
Fri Dec 20 16:20:06 UTC 2024


On Fri, Dec 20, 2024 at 09:55:04AM -0500, Rodrigo Vivi wrote:
> On Wed, Dec 18, 2024 at 04:33:09PM +0200, Imre Deak wrote:
> > On Tue, Dec 17, 2024 at 06:05:47PM -0500, Rodrigo Vivi wrote:
> > > We have some cases where display is releasing power domains at
> > > release_async_put_domains() where intel_runtime_pm_get_noresume()
> > > is called, but no outer protection. In Xe this will trigger our
> > > traditional warning.
> > 
> > I suppose by outer protection you mean an RPM reference that is
> > guaranteed to be held at the point (that is right before)
> > release_async_put_domains() calls intel_runtime_pm_get_noresume(). This
> > is guaranteed, i.e. such an RPM reference is held by definition (by the
> > power domain reference that is being put).
> 
> not actually.
> The outer rpm reference needs to be a reference on the outer bounds
> that ensures the device is awake. _noresume calls should only be used
> in inner places where you know there's something already ensuring
> that the device is awake but you don't want to take the risk of that
> reference being lost while you are in the middle of your sequence,
> so you call the 'noresume' as an extra thing to ensure that you can
> go to the end without device getting suspended because the other
> reference got dropped.

Yes, that is what I meant. In case of release_async_put_domains() it is
sure that the device is awake and hence there is no runtime resume
needed. The power domain reference being put holds a runtime PM
reference. So the "no outer protection" reasoning in the commit log is
not correct.

The reason for the WARN that this patch fixes is simply that
pm_runtime_get_if_in_use() used by xe to check for an outer RPM
reference fails if it is called either during runtime suspend/resume or
system suspend/resume. The existing code took this already into account
for the runtime suspend/resume case, but it didn't take it into account
for system suspend/resume. After this patch the outer protection check
will work the same way for both the runtime and system s/r case,
removing the WARN in the latter case.

> > Instead, the actual reason for triggering the warn - IIUC - is that
> > intel_runtime_pm_get_if_in_use() called from
> > xe_pm_runtime_get_noresume() (probably for the exact reason to check if
> > an outer RPM is held) fails if it is called while system suspending /
> > resuming. This is the same scenario as when
> > intel_runtime_pm_get_if_in_use() would fail if called during runtime
> > suspending / resuming and - worked around earlier I assume - by
> > suppressing the warning in this case using xe_pm_suspending_or_resuming().
> 
> The get_if_in_use is only the choice inside our _noresume so we can
> properly check if the device was really awake and warn that we have
> an unprotected case that we need to handle properly. If we were sure
> to have all the outer protections in place already, we could safely
> just use the _noresume option from the rpm directly.
>
> > So in this fix the above workaround to suppress the warning is just
> > extended to the system suspend/resume case.
> > 
> > > However, this case should be safe because it is triggered from the
> > > system suspend path, where we certainly won't be transitioning to rpm
> > > suspend.
> > > 
> > > This wouldn't happen if the display pm sequences, including
> > > all irq flow was in sync between i915 and xe. So, while we
> > > don't get there, let's not raise warnings when we are in this
> > > system suspend path.
> > 
> > I think the issue fixed in this patch is just a consequence of how the
> > outer RPM check works using xe_pm_suspending_or_resuming() and wouldn't
> > change even after the IRQ related issues are fixed.
> 
> If there's other cases where this release_async_put_domains is called
> out of the suspend path, this warning here is showing that we do
> need an extra runtime_pm_get right at the beginning of the workqueue.
> And this patch here would only be masking this warning in this case
> here, while leaving the release_async_put_domains unprotected.

Fixing the IRQ handling doesn't change how pm_runtime_get_if_in_use()
works and hence how its return value is ignored in the outer protection
check during runtime and system s/r.

> > > Suggested-by: Imre Deak <imre.deak at intel.com>
> > > Signed-off-by: Rodrigo Vivi <rodrigo.vivi at intel.com>
> > 
> > With the above understanding:
> > Reviewed-by: Imre Deak <imre.deak at intel.com>
> > 
> > > ---
> > >  drivers/gpu/drm/xe/xe_pm.c | 4 +++-
> > >  1 file changed, 3 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
> > > index a6761cb769b2..c6e57af0144c 100644
> > > --- a/drivers/gpu/drm/xe/xe_pm.c
> > > +++ b/drivers/gpu/drm/xe/xe_pm.c
> > > @@ -7,6 +7,7 @@
> > >  
> > >  #include <linux/fault-inject.h>
> > >  #include <linux/pm_runtime.h>
> > > +#include <linux/suspend.h>
> > >  
> > >  #include <drm/drm_managed.h>
> > >  #include <drm/ttm/ttm_placement.h>
> > > @@ -607,7 +608,8 @@ static bool xe_pm_suspending_or_resuming(struct xe_device *xe)
> > >  	struct device *dev = xe->drm.dev;
> > >  
> > >  	return dev->power.runtime_status == RPM_SUSPENDING ||
> > > -		dev->power.runtime_status == RPM_RESUMING;
> > > +		dev->power.runtime_status == RPM_RESUMING ||
> > > +		pm_suspend_target_state != PM_SUSPEND_ON;
> > >  #else
> > >  	return false;
> > >  #endif
> > > -- 
> > > 2.47.1
> > > 


More information about the Intel-xe mailing list