[PATCH] drm/xe/pm: Also avoid missing outer rpm warning on system suspend
Rodrigo Vivi
rodrigo.vivi at intel.com
Fri Dec 20 19:21:17 UTC 2024
On Fri, Dec 20, 2024 at 09:15:17PM +0200, Imre Deak wrote:
> On Fri, Dec 20, 2024 at 01:34:01PM -0500, Rodrigo Vivi wrote:
> > On Fri, Dec 20, 2024 at 06:20:06PM +0200, Imre Deak wrote:
> > > On Fri, Dec 20, 2024 at 09:55:04AM -0500, Rodrigo Vivi wrote:
> > > > On Wed, Dec 18, 2024 at 04:33:09PM +0200, Imre Deak wrote:
> > > > > On Tue, Dec 17, 2024 at 06:05:47PM -0500, Rodrigo Vivi wrote:
> > > > > > We have some cases where the display code releases power domains in
> > > > > > release_async_put_domains(), which calls intel_runtime_pm_get_noresume()
> > > > > > without any outer protection. In Xe this triggers our usual
> > > > > > "Missing outer runtime PM protection" warning.
> > > > >
> > > > > I suppose by outer protection you mean an RPM reference that is
> > > > > guaranteed to be held right before release_async_put_domains() calls
> > > > > intel_runtime_pm_get_noresume(). Such a reference is guaranteed: it is
> > > > > held by definition, by the power domain reference that is being put.
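Imre's argument can be illustrated with a minimal userspace model. It assumes, as stated above, that each display power domain reference carries one runtime PM reference. Everything below (`model_dev`, `model_domain_get()`, `model_domain_put_async()`, `model_release_async_put_domains()`) is hypothetical. It only mirrors the shape of the i915/xe code, not its API. The async put defers dropping the domain, so the device is still held awake by that domain when the release step takes its `_noresume` reference.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: one device, one display power domain. */
struct model_dev {
	bool awake;
	int usage_count;   /* runtime PM references */
	int domain_refs;   /* power domain references */
	bool put_pending;  /* an async put has been queued */
};

static void model_domain_get(struct model_dev *d)
{
	d->usage_count++;  /* assumption: each domain ref holds an RPM ref */
	d->awake = true;
	d->domain_refs++;
}

/* Defer the put: the domain, and so its RPM reference, stays held. */
static void model_domain_put_async(struct model_dev *d)
{
	d->put_pending = true;
}

/* Returns whether the device was awake when the _noresume ref was taken. */
static bool model_release_async_put_domains(struct model_dev *d)
{
	bool awake_at_get;

	assert(d->put_pending && d->domain_refs > 0);
	d->usage_count++;          /* _noresume: no wakeup is needed */
	awake_at_get = d->awake;

	d->put_pending = false;
	d->domain_refs--;
	d->usage_count--;          /* the domain's own RPM reference */

	d->usage_count--;          /* drop the _noresume reference */
	if (d->usage_count == 0)
		d->awake = false;  /* model: an idle device suspends at once */
	return awake_at_get;
}
```

In this model the `_noresume` reference never has to wake anything, because the pending domain put is what keeps the count above zero.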
> > > >
> > > > Not actually.
> > > > The outer RPM reference needs to be taken at the outer bounds, where it
> > > > ensures that the device is awake. _noresume calls should only be used
> > > > in inner places where you know something is already keeping the device
> > > > awake, but you don't want to risk that reference being dropped in the
> > > > middle of your sequence. So you call the 'noresume' variant as an extra
> > > > guarantee that you can reach the end without the device getting
> > > > suspended because the other reference got dropped.
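The outer/inner distinction can be made concrete with a minimal userspace model. This is not the kernel runtime PM core. The struct and functions (`model_dev`, `model_get_sync()`, `model_get_noresume()`, `model_put()`, `model_try_suspend()`) are hypothetical stand-ins, loosely patterned on `pm_runtime_get_sync()`, `pm_runtime_get_noresume()` and `pm_runtime_put()`. Only the outer get wakes the device. The inner `_noresume` get merely pins whatever state the device is already in.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified model of a runtime-PM-managed device. */
struct model_dev {
	bool awake;       /* true ~ RPM_ACTIVE, false ~ RPM_SUSPENDED */
	int usage_count;  /* outstanding references */
};

/* Outer reference: resumes the device if needed, then pins it awake. */
static void model_get_sync(struct model_dev *d)
{
	d->usage_count++;
	d->awake = true;
}

/* Inner reference: only bumps the count, never wakes the device. */
static void model_get_noresume(struct model_dev *d)
{
	d->usage_count++;
}

static void model_put(struct model_dev *d)
{
	assert(d->usage_count > 0);
	d->usage_count--;
}

/* The device may only suspend once nobody holds a reference. */
static bool model_try_suspend(struct model_dev *d)
{
	if (d->usage_count > 0)
		return false;
	d->awake = false;
	return true;
}
```

Suppose the sequence is outer get, inner `_noresume` get, then the outer put. The count is still 1 at that point, so `model_try_suspend()` refuses until the inner sequence drops its own reference.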
> > >
> > > Yes, that is what I meant. In the case of release_async_put_domains(),
> > > the device is guaranteed to be awake, so no runtime resume is needed:
> > > the power domain reference being put holds a runtime PM reference. So
> > > the "no outer protection" reasoning in the commit log is not correct.
> > >
> > > The reason for the WARN that this patch fixes is simply that
> > > pm_runtime_get_if_in_use(), which xe uses to check for an outer RPM
> > > reference, fails if it is called during either runtime suspend/resume or
> > > system suspend/resume. The existing code already took this into account
> > > for the runtime suspend/resume case, but not for system suspend/resume.
> > > After this patch the outer protection check works the same way for both
> > > the runtime and system s/r cases, removing the WARN in the latter case.
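The failure mode described here can be reproduced with a small userspace model of the documented `pm_runtime_get_if_in_use()` contract. It returns `-EINVAL` if runtime PM is disabled for the device, and 1 (taking a reference) only if the status is `RPM_ACTIVE` and the usage count is already non-zero. Otherwise it returns 0. The names `model_dev`, `M_RPM_*`, `model_get_if_in_use()` and `model_xe_get_if_in_use()` are hypothetical. The wrapper assumes xe treats only a positive return as success. The thread does not say which of the two failure paths is hit during system suspend, so both are shown.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Simplified stand-ins for the kernel's enum rpm_status values. */
enum model_rpm_status {
	M_RPM_ACTIVE, M_RPM_RESUMING, M_RPM_SUSPENDED, M_RPM_SUSPENDING
};

struct model_dev {
	enum model_rpm_status status;
	int usage_count;
	int disable_depth;  /* > 0: runtime PM disabled */
};

/* Model of the documented pm_runtime_get_if_in_use() contract. */
static int model_get_if_in_use(struct model_dev *d)
{
	if (d->disable_depth > 0)
		return -EINVAL;
	if (d->status != M_RPM_ACTIVE || d->usage_count == 0)
		return 0;
	d->usage_count++;
	return 1;
}

/* Assumed xe-style wrapper: only a positive return counts as success. */
static bool model_xe_get_if_in_use(struct model_dev *d)
{
	return model_get_if_in_use(d) > 0;
}
```

A reference can be held while the status is `M_RPM_RESUMING`, or while runtime PM is disabled, and the probe still fails. That is why a failed probe alone cannot prove that outer protection is missing.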
> >
> > Great then, we are on the same page.
>
> I don't agree with the explanation in the commit log; it should be
> something like the following:
>
> """
> Fix the false-positive "Missing outer runtime PM protection" warning
> triggered by
> release_async_put_domains() -> intel_runtime_pm_get_noresume() ->
> xe_pm_runtime_get_noresume()
> during system suspend.
>
> xe_pm_runtime_get_noresume() is supposed to warn if the device is not in
> the runtime resumed state, using xe_pm_runtime_get_if_in_use() for this.
> However, the latter function fails if called during runtime or system
> suspend/resume, regardless of whether the device is runtime resumed or
> not.
>
> Based on the above, suppress the warning during system suspend/resume,
> similarly to how this is done during runtime suspend/resume.
> """
>
> If still possible, it would be better to amend the commit log based on
> the above.
Indeed, much better... Fixed, thank you.
>
> Thanks.
>
> > > > > Instead, IIUC, the actual reason the warning triggers is that
> > > > > intel_runtime_pm_get_if_in_use(), called from
> > > > > xe_pm_runtime_get_noresume() (probably precisely to check whether an
> > > > > outer RPM reference is held), fails if it is called while the system is
> > > > > suspending/resuming. This is the same scenario as
> > > > > intel_runtime_pm_get_if_in_use() failing when called during runtime
> > > > > suspend/resume, which, I assume, was worked around earlier by
> > > > > suppressing the warning in that case using xe_pm_suspending_or_resuming().
> > > >
> > > > get_if_in_use is used inside our _noresume only so that we can check
> > > > whether the device was really awake and warn when we hit an unprotected
> > > > case that needs to be handled properly. If we were sure that all the
> > > > outer protections were already in place, we could safely just use the
> > > > _noresume variant from the RPM core directly.
> > > >
> > > > > So in this fix the above workaround to suppress the warning is just
> > > > > extended to the system suspend/resume case.
> > > > >
> > > > > > However, this case should be safe because it is triggered from the
> > > > > > system suspend path, where we certainly won't be transitioning to rpm
> > > > > > suspend.
> > > > > >
> > > > > > This wouldn't happen if the display PM sequences, including all
> > > > > > the IRQ flows, were in sync between i915 and xe. So, until we get
> > > > > > there, let's not raise warnings when we are in this system suspend
> > > > > > path.
> > > > >
> > > > > I think the issue fixed in this patch is just a consequence of how the
> > > > > outer RPM check works using xe_pm_suspending_or_resuming() and wouldn't
> > > > > change even after the IRQ related issues are fixed.
> > > >
> > > > If there are other cases where release_async_put_domains() is called
> > > > outside the suspend path, this warning shows that we do need an extra
> > > > runtime_pm_get right at the beginning of the workqueue function. And
> > > > this patch would only be masking the warning in this case, while
> > > > leaving release_async_put_domains() unprotected.
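The alternative raised here, taking an explicit outer reference at the start of the work function, can be sketched with a userspace model. `model_dev`, `model_get_sync()`, `model_get_noresume()`, `model_put()` and `model_async_put_work()` are hypothetical. In this model the inner get warns whenever no outer reference is keeping the device awake. Elsewhere in this thread it is argued that, for this particular path, the power domain being put already provides that outer reference.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; names do not correspond to real kernel functions. */
struct model_dev {
	bool awake;
	int usage_count;
	int warnings;
};

static void model_get_sync(struct model_dev *d)
{
	d->usage_count++;
	d->awake = true;
}

/* Inner get: warns when no outer reference keeps the device awake. */
static void model_get_noresume(struct model_dev *d)
{
	if (!(d->awake && d->usage_count > 0))
		d->warnings++;
	d->usage_count++;
}

static void model_put(struct model_dev *d)
{
	assert(d->usage_count > 0);
	if (--d->usage_count == 0)
		d->awake = false;  /* model: an idle device suspends at once */
}

/* Work function that takes its own outer reference before the inner one. */
static void model_async_put_work(struct model_dev *d)
{
	model_get_sync(d);      /* hypothetical extra outer get at work start */
	model_get_noresume(d);
	/* ... release the power domains here ... */
	model_put(d);           /* inner */
	model_put(d);           /* outer */
}
```

With the extra outer get, the inner get never warns. The trade-off is a possible runtime resume from the work function, which is unnecessary if a reference is already guaranteed to be held.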
> > >
> > > Fixing the IRQ handling doesn't change how pm_runtime_get_if_in_use()
> > > works and hence how its return value is ignored in the outer protection
> > > check during runtime and system s/r.
> >
> > indeed!
> >
> > so, pushed to drm-xe-next.
> > Thank you so much for the suggestion, review and insights here
> >
> > >
> > > > > > Suggested-by: Imre Deak <imre.deak at intel.com>
> > > > > > Signed-off-by: Rodrigo Vivi <rodrigo.vivi at intel.com>
> > > > >
> > > > > With the above understanding:
> > > > > Reviewed-by: Imre Deak <imre.deak at intel.com>
> > > > >
> > > > > > ---
> > > > > > drivers/gpu/drm/xe/xe_pm.c | 4 +++-
> > > > > > 1 file changed, 3 insertions(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
> > > > > > index a6761cb769b2..c6e57af0144c 100644
> > > > > > --- a/drivers/gpu/drm/xe/xe_pm.c
> > > > > > +++ b/drivers/gpu/drm/xe/xe_pm.c
> > > > > > @@ -7,6 +7,7 @@
> > > > > >
> > > > > > #include <linux/fault-inject.h>
> > > > > > #include <linux/pm_runtime.h>
> > > > > > +#include <linux/suspend.h>
> > > > > >
> > > > > > #include <drm/drm_managed.h>
> > > > > > #include <drm/ttm/ttm_placement.h>
> > > > > > @@ -607,7 +608,8 @@ static bool xe_pm_suspending_or_resuming(struct xe_device *xe)
> > > > > > struct device *dev = xe->drm.dev;
> > > > > >
> > > > > > return dev->power.runtime_status == RPM_SUSPENDING ||
> > > > > > - dev->power.runtime_status == RPM_RESUMING;
> > > > > > + dev->power.runtime_status == RPM_RESUMING ||
> > > > > > + pm_suspend_target_state != PM_SUSPEND_ON;
> > > > > > #else
> > > > > > return false;
> > > > > > #endif
> > > > > > --
> > > > > > 2.47.1
> > > > > >
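The patched predicate can be checked in isolation with a small userspace truth table. This sketch uses local stand-ins (`M_RPM_*`, `model_suspend_state_t`, `M_PM_SUSPEND_ON`, `M_PM_SUSPEND_MEM`, `model_old_predicate()`, `model_new_predicate()`) rather than the kernel's `enum rpm_status` and `suspend_state_t`. The numeric values assume `PM_SUSPEND_ON` is 0 and `PM_SUSPEND_MEM` is 3, as in `include/linux/suspend.h`. As we understand it, the PM core sets `pm_suspend_target_state` to the target sleep state while a system suspend/resume cycle is in progress, and resets it to `PM_SUSPEND_ON` afterwards.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins for the kernel's rpm_status and suspend_state_t. */
enum model_rpm_status {
	M_RPM_ACTIVE, M_RPM_RESUMING, M_RPM_SUSPENDED, M_RPM_SUSPENDING
};
typedef int model_suspend_state_t;
#define M_PM_SUSPEND_ON  0
#define M_PM_SUSPEND_MEM 3

/* Before the patch: only runtime PM transitions count. */
static bool model_old_predicate(enum model_rpm_status s,
				model_suspend_state_t target)
{
	(void)target;
	return s == M_RPM_SUSPENDING || s == M_RPM_RESUMING;
}

/* After the patch: an in-flight system suspend/resume also counts. */
static bool model_new_predicate(enum model_rpm_status s,
				model_suspend_state_t target)
{
	return s == M_RPM_SUSPENDING || s == M_RPM_RESUMING ||
	       target != M_PM_SUSPEND_ON;
}
```

The only case whose result changes is a device whose runtime status is not transitioning while a system suspend/resume is in progress. The runtime-transition cases behave as before.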