[Intel-xe] [PATCH 1/2] drm/xe/pm: Wrap pm_runtime_suspended with power.lock

Gupta, Anshuman anshuman.gupta at intel.com
Wed Jun 21 07:53:47 UTC 2023



> -----Original Message-----
> From: Matthew Auld <matthew.william.auld at gmail.com>
> Sent: Monday, June 19, 2023 6:03 PM
> To: Gupta, Anshuman <anshuman.gupta at intel.com>
> Cc: Brost, Matthew <matthew.brost at intel.com>; Sundaresan, Sujaritha
> <sujaritha.sundaresan at intel.com>; intel-xe at lists.freedesktop.org; Vivi,
> Rodrigo <rodrigo.vivi at intel.com>
> Subject: Re: [Intel-xe] [PATCH 1/2] drm/xe/pm: Wrap
> pm_runtime_suspended with power.lock
> 
> On Mon, 19 Jun 2023 at 12:18, Gupta, Anshuman
> <anshuman.gupta at intel.com> wrote:
> >
> >
> >
> > > -----Original Message-----
> > > From: Brost, Matthew <matthew.brost at intel.com>
> > > Sent: Saturday, June 17, 2023 1:35 AM
> > > To: Gupta, Anshuman <anshuman.gupta at intel.com>
> > > Cc: intel-xe at lists.freedesktop.org; Nilawar, Badal
> > > <badal.nilawar at intel.com>; Vivi, Rodrigo <rodrigo.vivi at intel.com>;
> > > Tauro, Riana <riana.tauro at intel.com>; Sundaresan, Sujaritha
> > > <sujaritha.sundaresan at intel.com>
> > > Subject: Re: [PATCH 1/2] drm/xe/pm: Wrap pm_runtime_suspended
> with
> > > power.lock
> > >
> > > On Fri, Jun 16, 2023 at 07:41:14PM +0530, Anshuman Gupta wrote:
> > > > pm_runtime_suspended() doc has a requirement to use it with the PM lock
> > > > of @dev to get the trusted value.
> > > > Wrap pm_runtime_suspended within xe_pm_runtime_suspended() with proper
> > > > locking.
> > > >
> > > > Signed-off-by: Anshuman Gupta <anshuman.gupta at intel.com>
> > > > ---
> > > >  drivers/gpu/drm/xe/xe_pm.c | 16 ++++++++++++++--
> > > >  1 file changed, 14 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
> > > > index b7b57f10ba25..04c5353ba870 100644
> > > > --- a/drivers/gpu/drm/xe/xe_pm.c
> > > > +++ b/drivers/gpu/drm/xe/xe_pm.c
> > > > @@ -210,11 +210,23 @@ int xe_pm_runtime_put(struct xe_device *xe)
> > > >     return pm_runtime_put_autosuspend(xe->drm.dev);
> > > >  }
> > > >
> > > > +static bool xe_pm_runtime_suspended(struct xe_device *xe)
> > > > +{
> > > > +   unsigned long flags;
> > > > +   bool suspended;
> > > > +
> > > > +   spin_lock_irqsave(&xe->drm.dev->power.lock, flags);
> > > > +   suspended = pm_runtime_suspended(xe->drm.dev);
> > > > +   spin_unlock_irqrestore(&xe->drm.dev->power.lock, flags);
> > >
> > > A few things.
> > >
> > > First, you probably can make this spin_lock_irq / spin_unlock_irq, as
> > > IRQ contexts are not really used in Xe; it is doubtful this function
> > > would ever be called from an unknown context.
> > Thanks for the review, I will fix this.
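Something along these lines is what I have in mind for the next revision
(an untested sketch, assuming we never reach this from IRQ context):

static bool xe_pm_runtime_suspended(struct xe_device *xe)
{
	struct device *dev = xe->drm.dev;
	bool suspended;

	/* xe does not call this from IRQ context, so the non-irqsave variant is enough */
	spin_lock_irq(&dev->power.lock);
	suspended = pm_runtime_suspended(dev);
	spin_unlock_irq(&dev->power.lock);

	return suspended;
}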
> > >
> > > Second, I have questions about whether this patch is actually doing anything
> > > useful. Yes, I can see the doc which indicates that pm_runtime_suspended
> > > can only be trusted if the lock is held, but looking at the code
> > > paths where xe_pm_runtime_suspended is called, I am unsure how this lock
> > > helps.
> > >
> > > e.g. How is this better?
> > This is helpful in the case when the runtime status is RPM_SUSPENDING
> > and is about to change to RPM_SUSPENDED.
> > rpm_suspend() changes the runtime status from RPM_ACTIVE ->
> > RPM_SUSPENDING -> driver->runtime_suspend() callback -> RPM_SUSPENDED
> > https://elixir.bootlin.com/linux/latest/source/drivers/base/power/runtime.c#L558
> > The rpm_suspend() caller has already grabbed the lock and the RPM status has
> > changed to RPM_SUSPENDING, therefore Xe has to wait to grab the lock to
> > get the trusted value, which is RPM_SUSPENDED.
> > Without a proper lock, the xe driver may access the vram while the
> > device is runtime suspended, which is a violation of the PCIe spec.
> > > if (trusted value)
> > >       now the trusted value can change, do something
> > The other way around is if Xe has grabbed the lock first; then the
> > possible runtime status is either RPM_ACTIVE or RPM_SUSPENDED, and for
> > both Xe increases the usage refcount explicitly to make sure the device is
> > in an active state.
> 
> From my reading of the RPM core code, it eventually drops the power.lock
> before calling any of the driver pm callbacks (see
> __rpm_callback() for example), and then only re-acquires it after. I think it
> mostly just protects the RPM state. If something is transitioning from
> RPM_ACTIVE -> RPM_SUSPENDING or whatever, grabbing the power.lock
> won't synchronise the actual operation, since the lock is not even being held
> when our callback is being run. If we want to actually synchronise an ongoing
> SUSPENDING or RESUMING you have to call something like
> pm_runtime_get_sync() I think.
All the rpm_{idle,resume,suspend} PM core calls are made with power.lock held.
See below.
https://elixir.bootlin.com/linux/latest/source/drivers/base/power/runtime.c#L1140
https://elixir.bootlin.com/linux/latest/source/drivers/base/power/runtime.c#L1158
https://elixir.bootlin.com/linux/latest/source/drivers/base/power/runtime.c#L1102
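For reference, the shape of those call sites is roughly the following
(paraphrased from drivers/base/power/runtime.c, see the links above; not
verbatim kernel code):

int __pm_runtime_suspend(struct device *dev, int rpmflags)
{
	unsigned long flags;
	int retval;

	spin_lock_irqsave(&dev->power.lock, flags);
	retval = rpm_suspend(dev, rpmflags);	/* rpm_suspend() entered with power.lock held */
	spin_unlock_irqrestore(&dev->power.lock, flags);

	return retval;
}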

So during the driver callback as well, the power lock will be held.

We cannot use pm_runtime_get_sync() in the runtime suspend call path as it will cause a deadlock.
Note that xe_device_mem_access_get() even gets called from the runtime suspend call path as well.
This seems intentional, to reuse the code.
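Roughly, the problem with a synchronous get on that path looks like this
(a hypothetical sketch, not the actual xe code):

static int xe_runtime_suspend_sketch(struct xe_device *xe)
{
	/*
	 * The RPM core sets power.runtime_status = RPM_SUSPENDING before
	 * invoking the driver's runtime-suspend callback.  If a shared helper
	 * such as xe_device_mem_access_get() called pm_runtime_get_sync()
	 * from here, the core would wait for the in-flight suspend to reach
	 * RPM_SUSPENDED -- but that suspend is the one executing this very
	 * function, so it can never complete: a self-deadlock.
	 */
	pm_runtime_get_sync(xe->drm.dev);	/* would deadlock from this context */

	return 0;
}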
> 
> Also like Matt said, even if this somehow all worked there is still the actual
> race window between checking if it is not suspended and then proceeding as
> if that is still going to be the case.
> 
> FWIW there is an attempt to fix such issues here, although it's all quite nasty:
> https://patchwork.freedesktop.org/series/119213/
Thanks for pointing out the series, I will take a look into it.

Thanks,
Anshuman Gupta.

> 
> >
> > Thanks ,
> > Anshuman Gupta.
> > >
> > >
> > > Than this?
> > > if (untrusted value)
> > >       do something
> > > Also, if you look all over the kernel, pm_runtime_suspended is frequently
> > > called without dev->power.lock.
> > >
> > > Matt
> > >
> > > > +
> > > > +   return suspended;
> > > > +}
> > > > +
> > > >  /* Return true if resume operation happened and usage count was increased */
> > > >  bool xe_pm_runtime_resume_if_suspended(struct xe_device *xe)
> > > >  {
> > > >     /* In case we are suspended we need to immediately wake up */
> > > > -   if (pm_runtime_suspended(xe->drm.dev))
> > > > +   if (xe_pm_runtime_suspended(xe))
> > > >             return !pm_runtime_resume_and_get(xe->drm.dev);
> > > >
> > > >     return false;
> > > > @@ -222,6 +234,6 @@ bool xe_pm_runtime_resume_if_suspended(struct xe_device *xe)
> > > >
> > > >  int xe_pm_runtime_get_if_active(struct xe_device *xe)
> > > >  {
> > > > -   WARN_ON(pm_runtime_suspended(xe->drm.dev));
> > > > +   WARN_ON(xe_pm_runtime_suspended(xe));
> > > >     return pm_runtime_get_if_active(xe->drm.dev, true);
> > > >  }
> > > > --
> > > > 2.38.0
> > > >

