[PATCH v3] drm/xe/pm: Disable RPM for SR-IOV VFs with no PCIe PM capability]

Mon Aug 4 16:54:22 UTC 2025

On Mon, Aug 04, 2025 at 06:20:57PM +0200, Michał Winiarski wrote:
> On Mon, Aug 04, 2025 at 11:11:39AM -0400, Rodrigo Vivi wrote:
> > On Mon, Aug 04, 2025 at 08:30:12PM +0530, K V P, Satyanarayana wrote:
> > > 
> > > 
> > > On 04-08-2025 20:21, Rodrigo Vivi wrote:
> > > > On Fri, Aug 01, 2025 at 10:00:17PM +0530, Satyanarayana K V P wrote:
> > > > > Enable Runtime Power Management (RPM) for PCI Express devices by utilizing
> > > > > their native Power Management (PM) capabilities. The specification (as per
> > > > > section 5.10.1 in PCI Express® Base Specification Revision 7.0) mandates
> > > > > that Virtual Functions (VFs) without Power Management capability inherit
> > > > > their associated Physical Function's (PF) power state.
> > > > > 
> > > > > As per PCIe spec "If a VF does not implement the PCI Power Management
> > > > > Capability, then the VF behaves as if it had been programmed into the
> > > > > equivalent power state of its associated PF"
> > > > > 
> > > > > Since Intel GPU VFs lack PM capability implementations, VFs power behavior
> > > > 
> > > > If this is a true statement your patch below is wrong.
> > > > It should simply be if (IS_VF) return;
> > > > 
> > > The current GPU VFs are lacking PM capability. But in future, if VFs (new
> > > generation GPUs) support PM capability, then VF should be able to manage PM
> > > on its own. As per spec, if VF does not implement PCI PM capability, it
> > > should follow PF power states. If implemented, VF should manage PM directly.
> > > 
> > > So, checking for both pm_cap and IS_VF.
> > 
> > So the commit message should be written in a way to explain that possibility.
> > 
> > But anyway, please step back for a moment and think about the whole flow
> > itself and the reasoning behind pm_cap.
> > 
> > pci subsystem is the one already checking the pm_caps before doing the
> > pci pm transitions. If the check is not there, please go there and add it
> > to the pci subsystem.
> > 
> > From our side, if we need to avoid that VF follows the PF and goes to sleep,
> > then the PF needs to get an extra refcount for the whole duration of the VF,
> > so PF won't be allowed to sleep then VF will also not sleep.
> 
> PF is already taking the refcount, from PCI perspective, VF device is
> never going into low-power state.
> 
> Main thing to keep in mind here, is that runtime_pm is decoupled from
> PCI subsystem. It's the drivers responsibility to actually trigger PCI
> PM transition into low-power state as part of runtime suspend handling.
> Usually that's one of the last steps, and before that, the driver needs
> to make sure that the state of all the resources is not "lost" after
> moving into low-power state, and the driver is able to recover that
> state as part of runtime resume.
> 
> For VF that doesn't implement pm_caps - that last step, which is moving
> into low-power state is a no-op.
> In other words - for VF w/o PM caps, with the current driver codebase
> we're going through all the driver-specific extra work in suspend/resume
> handlers to end up doing nothing in the end.
> 
> Perhaps the commit message is not expressing that clearly enough, but
> the reasoning behind the patch is simple - if PCI PM transition is a
> no-op, don't do any extra work (by not using runtime PM).
> 
> Does that sound reasonable?

Finally a reasonable explanation, thank you!

Satyanarayana, please make sure that this is pretty clear in the commit
message, and also use if (IS_VF() && !pm_cap) instead of the inverted negative or.

Also worth a comment in the code as well right before the if:

/* For VF that doesn't implement pm_caps, the rpm functions are no-op */

Thanks,
Rodrigo.

> 
> Thanks,
> -Michał
> 
> > 
> > > 
> > > > > must mirror their PF's state. During VF creation, the PF remains active
> > > > > from the PCI subsystem perspective. To maintain consistency, explicitly
> > > > > disable RPM for VFs missing PM capability to ensure they follow their PF's
> > > > > power management status rather than entering low-power states
> > > > > independently.
> > > > > 
> > > > > Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p at intel.com>
> > > > > Cc: Michal Wajdeczko <michal.wajdeczko at intel.com>
> > > > > Cc: Matthew Brost <matthew.brost at intel.com>
> > > > > Cc: Michał Winiarski <michal.winiarski at intel.com>
> > > > > Cc: Anshuman Gupta <anshuman.gupta at intel.com>
> > > > > Cc: Rodrigo Vivi <rodrigo.vivi at intel.com>
> > > > > 
> > > > > ---
> > > > > V2 -> V3:
> > > > > - Fixed review comments (Michal Wajdeczko).
> > > > > 
> > > > > V1 -> V2:
> > > > > - Disable RPM only for VF devices when PM cap is not implemented.
> > > > > ---
> > > > >   drivers/gpu/drm/xe/xe_pm.c | 14 ++++++++++++++
> > > > >   1 file changed, 14 insertions(+)
> > > > > 
> > > > > diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
> > > > > index 8aae66660931..019de04fa7bb 100644
> > > > > --- a/drivers/gpu/drm/xe/xe_pm.c
> > > > > +++ b/drivers/gpu/drm/xe/xe_pm.c
> > > > > @@ -246,8 +246,18 @@ static bool xe_pm_pci_d3cold_capable(struct xe_device *xe)
> > > > >   static void xe_pm_runtime_init(struct xe_device *xe)
> > > > >   {
> > > > > +	struct pci_dev *pdev = to_pci_dev(xe->drm.dev);
> > > > >   	struct device *dev = xe->drm.dev;
> > > > > +	/*
> > > > > +	 * Disable RPM for VFs if PM cap is not supported.
> > > > > +	 * As per PCIe specification, "If a VF does not implement the PCI Power
> > > > > +	 * Management Capability, then the VF behaves as if it had been programmed
> > > > > +	 * into the equivalent power state of its associated PF"
> > > > > +	 */
> > > > > +	if (!(pdev->pm_cap || !IS_SRIOV_VF(xe)))
> > > > > +		return;
> > > > > +
> > > > >   	/*
> > > > >   	 * Disable the system suspend direct complete optimization.
> > > > >   	 * We need to ensure that the regular device suspend/resume functions
> > > > > @@ -366,8 +376,12 @@ int xe_pm_init(struct xe_device *xe)
> > > > >   static void xe_pm_runtime_fini(struct xe_device *xe)
> > > > >   {
> > > > > +	struct pci_dev *pdev = to_pci_dev(xe->drm.dev);
> > > > >   	struct device *dev = xe->drm.dev;
> > > > > +	if (!(pdev->pm_cap || !IS_SRIOV_VF(xe)))
> > > > > +		return;
> > > > > +
> > > > >   	pm_runtime_get_sync(dev);
> > > > >   	pm_runtime_forbid(dev);
> > > > >   }
> > > > > -- 
> > > > > 2.43.0
> > > > > 
> > >