[PATCH] drm/xe: Restore pci state upon resume
Rodrigo Vivi
rodrigo.vivi at intel.com
Thu Sep 19 22:10:35 UTC 2024
On Wed, Sep 18, 2024 at 12:09:40AM +0300, Ville Syrjälä wrote:
> On Tue, Sep 17, 2024 at 02:49:37PM -0400, Rodrigo Vivi wrote:
> > On Fri, Sep 13, 2024 at 07:54:34PM +0300, Ville Syrjälä wrote:
> > > On Fri, Sep 13, 2024 at 11:43:52AM -0400, Rodrigo Vivi wrote:
> > > > On Fri, Sep 13, 2024 at 02:01:49PM +0300, Ville Syrjälä wrote:
> > > > > On Thu, Sep 12, 2024 at 03:05:30PM -0400, Rodrigo Vivi wrote:
> > > > > > The pci state was saved, but not restored. Restore
> > > > > > right after the power state transition request like
> > > > > > every other driver.
> > > > > >
> > > > > > Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
> > > > > > Signed-off-by: Rodrigo Vivi <rodrigo.vivi at intel.com>
> > > > > > ---
> > > > > > drivers/gpu/drm/xe/xe_pci.c | 2 ++
> > > > > > 1 file changed, 2 insertions(+)
> > > > > >
> > > > > > diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
> > > > > > index 5ba4ec229494..6d29ef4b396f 100644
> > > > > > --- a/drivers/gpu/drm/xe/xe_pci.c
> > > > > > +++ b/drivers/gpu/drm/xe/xe_pci.c
> > > > > > @@ -949,6 +949,8 @@ static int xe_pci_resume(struct device *dev)
> > > > > > >  	if (err)
> > > > > > >  		return err;
> > > > > > >
> > > > > > > +	pci_restore_state(pdev);
> > > > >
> > > > > Why is xe even doing this stuff by hand instead of letting
> > > > > the pci core handle it?
> > > >
> > > > That's a fair question, given that there's not much documentation
> > > > around it.
> > > >
> > > > Looking at the pci code, it looks like the pci core is not itself
> > > > restoring the config space anywhere, and looking at
> > > > other drivers around, it looks like a safe thing to do.
> > > >
> > > > And the pci_restore_state is paired with the pci_save_state.
> > > > Both i915 and Xe are doing the pci_save_state and not restoring
> > > > it.
> > >
> > > i915 needs it because (as a side effect) it prevents the pci
> > > code from automagically sticking the device into D3, which
> > > apparently breaks hibernation on some old crappy laptops.
> > > But xe shouldn't need that.
> >
> > Hmm, doing some archaeology here, it looks like
> > both pci_save and pci_restore were added together on
> > regular system suspend-resume by Jesse from the very
> > beginning:
> >
> > ba8bbcf6ff46 ("i915: add suspend/resume support")
>
> Pretty sure it was initially just cargo culted. Or perhaps
> the pci code didn't do stuff back then. Shrug.
>
> > Then, later pci_restore was removed by Zhenyu on
> > b7e53aba2f0e ("drm/i915: remove restore in resume")
> > because it was hanging some platforms.
> >
> > The only reference to d3 related issues that I could find
> > was this one:
> > https://lore.kernel.org/intel-gfx/1497281047-25204-5-git-send-email-animesh.manna@intel.com/
> >
> > but that was trying to add the support to the save/restore
> > in the runtime pm side and not here in the regular system suspend/resume.
> >
> > Am I missing anything?
>
> commit ab3be73fa7b4 ("drm/i915: gen4: work around hang during
> hibernation")
but this is about the pci_set_power_state not the pci_save_state
or pci_restore_state.
For the set_power we are pairing them together.
My concern is that for the save restore we are not.
So we either remove the save or we add the restore.
Leaning more towards removing it after Anshuman showed the log.
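
To make the pairing concern concrete, here is a rough userspace model of the save/restore bookkeeping being discussed. It is only a sketch and not the kernel code: every mock_* name, the 16-dword config size and the two-state power enum are made up for illustration. The assumption it encodes is that a driver-side save sets a "state saved" flag, and that the core's noirq suspend step skips both its own save and the D3 transition when it finds that flag already set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MOCK_CFG_DWORDS 16	/* hypothetical size, not the real config space */

enum mock_power { MOCK_D0, MOCK_D3HOT };

/* Hypothetical stand-in for the bits of a PCI device that matter here. */
struct mock_pci_dev {
	uint32_t config[MOCK_CFG_DWORDS];	/* live config space */
	uint32_t saved[MOCK_CFG_DWORDS];	/* snapshot taken by save */
	bool state_saved;			/* set by save, cleared by restore */
	enum mock_power power;
};

/* Models the save half: snapshot the config space and mark it saved. */
void mock_save_state(struct mock_pci_dev *dev)
{
	assert(dev);
	memcpy(dev->saved, dev->config, sizeof(dev->saved));
	dev->state_saved = true;
}

/* Models the restore half: a no-op unless a save happened first. */
void mock_restore_state(struct mock_pci_dev *dev)
{
	assert(dev);
	if (!dev->state_saved)
		return;
	memcpy(dev->config, dev->saved, sizeof(dev->config));
	dev->state_saved = false;
}

/*
 * Assumed model of the core's noirq suspend step: if the driver already
 * saved the state, the core leaves the device alone (no save, no D3).
 */
void mock_core_suspend_noirq(struct mock_pci_dev *dev)
{
	assert(dev);
	if (dev->state_saved)
		return;
	mock_save_state(dev);
	dev->power = MOCK_D3HOT;
}

/* Assumed model of the core's early resume step: power up, then restore. */
void mock_core_resume_early(struct mock_pci_dev *dev)
{
	assert(dev);
	dev->power = MOCK_D0;
	mock_restore_state(dev);
}
```

In this model a driver that never calls mock_save_state() gets both the save and the restore from the core, while a driver that does call it only changes whether the core goes to D3.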
>
> > Empirically Anshuman showed us that PCI subsystem is indeed taking
> > care of the save/restore.
> >
> > Ville, my question to you now is: can I go ahead and simply remove
> > the pci_save_state() call from i915? Or you still believe some
> > hibernation somewhere could be broken?
>
> Unless someone can figure out a way to fix those cursed
> BIOSes (or they magically fixed themselves in the meantime)
> it needs to stay.
>
> > I believe we should either remove both save and restore for both
> > drivers or add both to both.
>
> I think we should try to get as close to the standard
> driver/pci behaviour as possible. AFAICS that would be
> achieved by moving pci_save_state()+pci_set_power()
> (and nothing else) into the .suspend_noirq() and
> .poweroff_noirq() hooks. And then xe wouldn't even
> need to hook those up.
yeap, but our state machinery was never good with that.
>
> But that does require some actual thought as it would
> change our current behaviour to not go to D3 in
> .freeze_late() (the pci code won't put the device into
> D3 in .freeze_noirq() either). I suppose this would
> also let us nuke the pci_set_power_state(D0) from
> i915_drm_resume_early()...
>
> And the switcheroo stuff would presumably need some
> changes. Just calling the noirq() stuff from the
> switcheroo suspend hook should hopefully suffice.
> Hmm, and I guess we'd need the pci_set_power_state(D0)
> for it still in the resume path.
>
> Another thing I realized is that we never restore the
> config space in the switcheroo resume path. I suppose
> for our integrated GPUs it doesn't get clobbered in
> D3 anyway so shouldn't really matter. So we could
> technically also skip the pci_save_state() in the
> switcheroo suspend path.
yeap, not only in the switcheroo, but we are saving
but never restoring...
I have this patch that removes the save in a refactor that I'm planning:
https://github.com/rodrigovivi/linux/tree/display-pm-reconcile
>
> We could also consider quirking the hibernate vs.
> D3 stuff in drivers/pci. Would just need a new flag
> on the pci_dev to skip the pci_set_power_state(),
> or something.
>
> --
> Ville Syrjälä
> Intel