[PATCH] drm/xe/d3cold: Set power state to D3Cold during s2idle/s3
Raag Jadav
raag.jadav at intel.com
Wed Apr 2 07:31:11 UTC 2025
On Tue, Apr 01, 2025 at 09:35:49PM +0200, Rafael J. Wysocki wrote:
> On Tue, Apr 1, 2025 at 7:53 PM Raag Jadav <raag.jadav at intel.com> wrote:
...
> > That's not what I meant here. There are multiple issues at play.
> >
> > 1. An AER is reported[*] on root port during system suspend even before we
> > reach any of the driver PM callback. From initial investigation it seems
> > like a case of misbehaviour by some child device, but this is a different
> > issue entirely.
> >
> > [*] irq/120-aerdrv-145 [002] ..... 1264.981023: <stack trace>
> > => xe_pm_runtime_resume
> > => xe_pci_runtime_resume
> > => pci_pm_runtime_resume
> > => __rpm_callback
> > => rpm_callback
> > => rpm_resume
> > => __pm_runtime_resume
> > => pci_pm_runtime_get_sync
> > => __pci_walk_bus
> > => pci_walk_bus
> > => pcie_do_recovery
> > => aer_process_err_devices
> > => aer_isr
> >
> > 2. Setting explicit pci_set_power_state(pdev, PCI_D3cold) from xe_pm_suspend().
> > Although we see many drivers do it for their case, it's quite a questionable
> > choice (atleast IMHO) to hard suspend the device from driver PM callback
> > without any regard to runtime_usage counter. It can hide potential issues
> > like AER during system suspend (regardless of whether or not it is supported
> > by the driver, since it is supposed to keep the device active on such a
> > catastrophic failure anyway), but I'll leave it to the experts to decide.
>
> If the driver does not set DPM_FLAG_SMART_SUSPEND, and xe doesn't set
> it AFAICS (at least not in the mainline), pci_pm_suspend() will resume
> the device from runtime suspend before invoking its driver's callback.
>
> This guarantees that the device is always RPM_ACTIVE when
> xe_pci_suspend() runs and it cannot be runtime-suspended because the
> core is holding a runtime PM reference on it (and so the runtime PM
> usage counter is always greater than zero when xe_pci_suspend() runs).
>
> This means that neither xe_pci_runtime_suspend() nor
> xe_pci_runtime_resume() can run concurrently with respect to it, so
> xe_pci_suspend() can change the power state of the device etc. safely.
Ah, I failed to notice that __pm_runtime_resume() is taking a spin_lock_irqsave().
Thanks for clearing this up.
> Of course, it needs to call pci_save_state() before doing any of that,
> but it does so.
>
> This is expected to work, but resuming the device from runtime suspend
> during a system suspend transition can be avoided (see below).
>
> However, on the resume side, pci_pm_resume_noirq() runs first and
> because dev_pm_skip_resume() returns 'false' (since
> DPM_FLAG_SMART_SUSPEND is unset) and skip_bus_pm is 'false' (since the
> device is not in D0), it will call pci_pm_default_resume_early() which
> will (1) put the device into D0 unconditionally and (2) restore its
> state.
>
> This means that by the time when xe_pci_resume() runs, the power state
> of the device is already D0 and its state has already been restored,
> so the initial part of it is arguably redundant.
>
> There also appears to be a problem with calling pci_disable_device()
> after pci_save_state() in xe_pci_suspend() because restoring the
> device's state in pci_pm_default_resume_early() will inadvertently
> cause it to be enabled AFAICS. I would call pci_save_state() after
> pci_disable_device(), so the device would still be disabled after
> running pci_pm_default_resume_early() and xe_pci_resume() would enable
> it as appropriate.
>
> > 2. Since we're already setting D0 and D3 states in our runtime PM callbacks,
> > utilizing DPM_FLAG_SMART_* flags will allow us to skip unnecessary
> > runtime_resume during system suspend and let PCI PM take care of all
> > the PCI stuff in a smarter way (even skip D3hot/cold if it's already in it)
> > without us needing to duplicate code and do everything manually.
>
> That's true in principle, but care needs to be taken.
>
> First of all, DPM_FLAG_SMART_PREPARE and DPM_FLAG_SMART_SUSPEND are
> about different things.
>
> DPM_FLAG_SMART_PREPARE is about the direct-complete optimization that
> allows the suspend and resume of the device to be skipped completely
> if some additional conditions are met and it allows the driver to
> control the behavior related to this optimization (see "Entering
> System Suspend" in Documentation/driver-api/pm/devices.rst for
> details). The bottom line here is that because xe doesn't provide a
> ->prepare() callback in xe_pm_ops, this flag will have no effect. xe
> may want to set DPM_FLAG_NO_DIRECT_COMPLETE to disable the
> direct-complete optimization completely, though (i915 does this for
> some reason unclear to me).
>
> DPM_FLAG_SMART_SUSPEND is about leaving the device in runtime suspend
> across the entire system suspend transition if it is still
> runtime-suspended after running its driver's ->suspend() callback.
>
> This means in particular that if DPM_FLAG_SMART_SUSPEND is set, the
> ->suspend() callback may run in parallel with xe_pci_runtime_resume(),
> so it cannot be the current xe_pci_suspend() because it cannot
> manipulate the device or call pci_save_state() on it or similar. The
> most straightforward way to overcome this would be to point
> ->suspend_late() to xe_pci_suspend(), but this would cause the device
> to be suspended later unless it was runtime-suspended already. In
> turn, suspending it later may cause the total duration of the system
> suspend transition to increase (because the device will now be
> suspended in a different phase of the transition).
>
> The suspend time can be distributed somewhat differently if the core
> is allowed to take care of putting the device into a low-power state.
> Simply removing the pci_save_state() and pci_set_power_state() calls
> from xe_pci_suspend() should be sufficient to make this happen and
> that would be my first step (after fixing the device disable/enable
> issue described above).
>
> Next, you can point ->suspend_late(), instead of ->suspend(), to
> xe_pci_suspend() and that is still expected to work, but the total
> system suspend timing will change.
>
> If this works, you'll be ready to set DPM_FLAG_SMART_SUSPEND.
Excellent. While I did some experimentation of my own, this definitely makes
things more clear. Thank you for your detailed feedback.
I'm still trying to wrap my head around the total suspend duration argument.
Do you mean the time taken by xe_pci_suspend() itself can change with
->suspend_late()? Or are there any other side-effects to it?
Raag
More information about the Intel-xe
mailing list