[Intel-xe] [PATCH 1/2] drm/xe/pm: Wrap pm_runtime_suspended with power.lock

Matthew Auld matthew.william.auld at gmail.com
Mon Jun 19 12:32:33 UTC 2023


On Mon, 19 Jun 2023 at 12:18, Gupta, Anshuman <anshuman.gupta at intel.com> wrote:
>
>
>
> > -----Original Message-----
> > From: Brost, Matthew <matthew.brost at intel.com>
> > Sent: Saturday, June 17, 2023 1:35 AM
> > To: Gupta, Anshuman <anshuman.gupta at intel.com>
> > Cc: intel-xe at lists.freedesktop.org; Nilawar, Badal
> > <badal.nilawar at intel.com>; Vivi, Rodrigo <rodrigo.vivi at intel.com>; Tauro,
> > Riana <riana.tauro at intel.com>; Sundaresan, Sujaritha
> > <sujaritha.sundaresan at intel.com>
> > Subject: Re: [PATCH 1/2] drm/xe/pm: Wrap pm_runtime_suspended with
> > power.lock
> >
> > On Fri, Jun 16, 2023 at 07:41:14PM +0530, Anshuman Gupta wrote:
> > > pm_runtime_suspended() doc has requirement to use it with PM lock of
> > > @dev to get the trusted value.
> > > Wrap pm_runtime_suspended within xe_pm_runtime_suspended() with
> > proper
> > > locking.
> > >
> > > Signed-off-by: Anshuman Gupta <anshuman.gupta at intel.com>
> > > ---
> > >  drivers/gpu/drm/xe/xe_pm.c | 16 ++++++++++++++--
> > >  1 file changed, 14 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
> > > index b7b57f10ba25..04c5353ba870 100644
> > > --- a/drivers/gpu/drm/xe/xe_pm.c
> > > +++ b/drivers/gpu/drm/xe/xe_pm.c
> > > @@ -210,11 +210,23 @@ int xe_pm_runtime_put(struct xe_device *xe)
> > >     return pm_runtime_put_autosuspend(xe->drm.dev);
> > >  }
> > >
> > > +static bool xe_pm_runtime_suspended(struct xe_device *xe) {
> > > +   unsigned long flags;
> > > +   bool suspended;
> > > +
> > > +   spin_lock_irqsave(&xe->drm.dev->power.lock, flags);
> > > +   suspended = pm_runtime_suspended(xe->drm.dev);
> > > +   spin_unlock_irqrestore(&xe->drm.dev->power.lock, flags);
> >
> > A few things.
> >
> > First, you probably can make this spin_lock_irq / spin_unlock_irq as IRQ
> > contexts are not really used in Xe it is doubtful this function would ever be
> > called from an unknown context.
> Thanks  for review , I will fix this.
> >
> > Second, I have questions if this patch is actually doing anything useful. Yes, I
> > can see doc which indicates that pm_runtime_suspended can only be
> > trusted if the lock is held but looking at the code paths where
> > xe_pm_runtime_suspended is called unsure how this lock helps.
> >
> > e.g. How is this better?
> This is helpful in case, when pm_runtime_suspended() returns status is RPM_SUSPENDING and it will be going to change to RPM_SUPENDED,
> rpm_suspend() changes the runtime status from RPM_ACTIVE -> RPM_SUSPENDING -> driver->runtime_suspend() callback -> RPM_SUSPENDED
> https://elixir.bootlin.com/linux/latest/source/drivers/base/power/runtime.c#L558
> rpm_suspend() caller has already grabbed the lock and RPM status changed to RPM_SUSPENDING, therefore Xe has to wait to grab the lock to get the trusted value, which is RPM_SUSPENDED.
> Without a proper lock, the xe driver may access the vram while the device was runtime suspended, which is a violation of  PCIe specs.> if (trusted value)
> >       now the trusted value can change, do something
> The other way around is if Xe has grabbed the lock first then the possible runtime status is either RPM_ACTIVE or RUNTIME_SUPENDED  for both Xe is increasing usage
> ref count explicitly to make sure the device is in an active state.

>From my reading of the RPM core code, it eventually drops the
power.lock before calling any of the driver pm callbacks (see
__rpm_callback() for example), and then only re-acquires it after. I
think it mostly just protects the RPM state. If something is
transitioning from RPM_ACTIVE -> RPM_SUSPENDING or whatever, grabbing
the power.lock won't synchronise the actual operation, since the lock
is not even being held when our callback is being run. If we want to
actually synchronise an ongoing SUSPENDING or RESUMING you have to
call something like pm_runtime_get_sync() I think.

Also like Matt said, even if this somehow all worked there is still
the actual race window between checking if it is not suspended and
then proceeding as if that is still going to be the case.

FWIW there is an attempt to fix such issues here, although it's all
quite nasty: https://patchwork.freedesktop.org/series/119213/

>
> Thanks ,
> Anshuman Gupta.
> >
> >
> > Than?
> > if (untrusted value)
> >       do something
> > Also if you look all over kernel pm_runtime_suspended is called frequently
> > dev->power.lock.
> >
> > Matt
> >
> > > +
> > > +   return suspended;
> > > +}
> > > +
> > >  /* Return true if resume operation happened and usage count was
> > > increased */  bool xe_pm_runtime_resume_if_suspended(struct
> > xe_device
> > > *xe)  {
> > >     /* In case we are suspended we need to immediately wake up */
> > > -   if (pm_runtime_suspended(xe->drm.dev))
> > > +   if (xe_pm_runtime_suspended(xe))
> > >             return !pm_runtime_resume_and_get(xe->drm.dev);
> > >
> > >     return false;
> > > @@ -222,6 +234,6 @@ bool xe_pm_runtime_resume_if_suspended(struct
> > > xe_device *xe)
> > >
> > >  int xe_pm_runtime_get_if_active(struct xe_device *xe)  {
> > > -   WARN_ON(pm_runtime_suspended(xe->drm.dev));
> > > +   WARN_ON(xe_pm_runtime_suspended(xe));
> > >     return pm_runtime_get_if_active(xe->drm.dev, true);  }
> > > --
> > > 2.38.0
> > >


More information about the Intel-xe mailing list