[PATCH v3] drm/xe/pm: Disable RPM for SR-IOV VFs with no PCIe PM capability]

Mon Aug 4 16:20:57 UTC 2025

On Mon, Aug 04, 2025 at 11:11:39AM -0400, Rodrigo Vivi wrote:
> On Mon, Aug 04, 2025 at 08:30:12PM +0530, K V P, Satyanarayana wrote:
> > 
> > 
> > On 04-08-2025 20:21, Rodrigo Vivi wrote:
> > > On Fri, Aug 01, 2025 at 10:00:17PM +0530, Satyanarayana K V P wrote:
> > > > Enable Runtime Power Management (RPM) for PCI Express devices by utilizing
> > > > their native Power Management (PM) capabilities. The specification (as per
> > > > section 5.10.1 in PCI Express® Base Specification Revision 7.0) mandates
> > > > that Virtual Functions (VFs) without Power Management capability inherit
> > > > their associated Physical Function's (PF) power state.
> > > > 
> > > > As per PCIe spec "If a VF does not implement the PCI Power Management
> > > > Capability, then the VF behaves as if it had been programmed into the
> > > > equivalent power state of its associated PF"
> > > > 
> > > > Since Intel GPU VFs lack PM capability implementations, VFs power behavior
> > > 
> > > If this is a true statement your patch below is wrong.
> > > It should simply be if (IS_VF) return;
> > > 
> > The current GPU VFs are lacking PM capability. But in future, if VFs (new
> > generation GPUs) support PM capability, then VF should be able to manage PM
> > on its own. As per spec, if VF does not implement PCI PM capability, it
> > should follow PF power states. If implemented, VF should manage PM directly.
> > 
> > So, checking for both pm_cap and IS_VF.
> 
> So the commit message should be written in a way to explain that possibility.
> 
> But anyway, please step back for a moment and think about the whole flow
> itself and the reasoning behind pm_cap.
> 
> pci subsystem is the one already checking the pm_caps before doing the
> pci pm transitions. If the check is not there, please go there and add it
> to the pci subsystem.
> 
> From our side, if we need to avoid that VF follows the PF and goes to sleep,
> then the PF needs to get an extra refcount for the whole duration of the VF,
> so PF won't be allowed to sleep then VF will also not sleep.

PF is already taking the refcount, from PCI perspective, VF device is
never going into low-power state.

Main thing to keep in mind here, is that runtime_pm is decoupled from
PCI subsystem. It's the drivers responsibility to actually trigger PCI
PM transition into low-power state as part of runtime suspend handling.
Usually that's one of the last steps, and before that, the driver needs
to make sure that the state of all the resources is not "lost" after
moving into low-power state, and the driver is able to recover that
state as part of runtime resume.

For VF that doesn't implement pm_caps - that last step, which is moving
into low-power state is a no-op.
In other words - for VF w/o PM caps, with the current driver codebase
we're going through all the driver-specific extra work in suspend/resume
handlers to end up doing nothing in the end.

Perhaps the commit message is not expressing that clearly enough, but
the reasoning behind the patch is simple - if PCI PM transition is a
no-op, don't do any extra work (by not using runtime PM).

Does that sound reasonable?

Thanks,
-Michał

> 
> > 
> > > > must mirror their PF's state. During VF creation, the PF remains active
> > > > from the PCI subsystem perspective. To maintain consistency, explicitly
> > > > disable RPM for VFs missing PM capability to ensure they follow their PF's
> > > > power management status rather than entering low-power states
> > > > independently.
> > > > 
> > > > Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p at intel.com>
> > > > Cc: Michal Wajdeczko <michal.wajdeczko at intel.com>
> > > > Cc: Matthew Brost <matthew.brost at intel.com>
> > > > Cc: Michał Winiarski <michal.winiarski at intel.com>
> > > > Cc: Anshuman Gupta <anshuman.gupta at intel.com>
> > > > Cc: Rodrigo Vivi <rodrigo.vivi at intel.com>
> > > > 
> > > > ---
> > > > V2 -> V3:
> > > > - Fixed review comments (Michal Wajdeczko).
> > > > 
> > > > V1 -> V2:
> > > > - Disable RPM only for VF devices when PM cap is not implemented.
> > > > ---
> > > >   drivers/gpu/drm/xe/xe_pm.c | 14 ++++++++++++++
> > > >   1 file changed, 14 insertions(+)
> > > > 
> > > > diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
> > > > index 8aae66660931..019de04fa7bb 100644
> > > > --- a/drivers/gpu/drm/xe/xe_pm.c
> > > > +++ b/drivers/gpu/drm/xe/xe_pm.c
> > > > @@ -246,8 +246,18 @@ static bool xe_pm_pci_d3cold_capable(struct xe_device *xe)
> > > >   static void xe_pm_runtime_init(struct xe_device *xe)
> > > >   {
> > > > +	struct pci_dev *pdev = to_pci_dev(xe->drm.dev);
> > > >   	struct device *dev = xe->drm.dev;
> > > > +	/*
> > > > +	 * Disable RPM for VFs if PM cap is not supported.
> > > > +	 * As per PCIe specification, "If a VF does not implement the PCI Power
> > > > +	 * Management Capability, then the VF behaves as if it had been programmed
> > > > +	 * into the equivalent power state of its associated PF"
> > > > +	 */
> > > > +	if (!(pdev->pm_cap || !IS_SRIOV_VF(xe)))
> > > > +		return;
> > > > +
> > > >   	/*
> > > >   	 * Disable the system suspend direct complete optimization.
> > > >   	 * We need to ensure that the regular device suspend/resume functions
> > > > @@ -366,8 +376,12 @@ int xe_pm_init(struct xe_device *xe)
> > > >   static void xe_pm_runtime_fini(struct xe_device *xe)
> > > >   {
> > > > +	struct pci_dev *pdev = to_pci_dev(xe->drm.dev);
> > > >   	struct device *dev = xe->drm.dev;
> > > > +	if (!(pdev->pm_cap || !IS_SRIOV_VF(xe)))
> > > > +		return;
> > > > +
> > > >   	pm_runtime_get_sync(dev);
> > > >   	pm_runtime_forbid(dev);
> > > >   }
> > > > -- 
> > > > 2.43.0
> > > > 
> >