[PATCH] drm/xe: Restore pci state upon resume

Ville Syrjälä ville.syrjala at linux.intel.com
Tue Sep 17 21:09:40 UTC 2024


On Tue, Sep 17, 2024 at 02:49:37PM -0400, Rodrigo Vivi wrote:
> On Fri, Sep 13, 2024 at 07:54:34PM +0300, Ville Syrjälä wrote:
> > On Fri, Sep 13, 2024 at 11:43:52AM -0400, Rodrigo Vivi wrote:
> > > On Fri, Sep 13, 2024 at 02:01:49PM +0300, Ville Syrjälä wrote:
> > > > On Thu, Sep 12, 2024 at 03:05:30PM -0400, Rodrigo Vivi wrote:
> > > > > The pci state was saved, but not restored. Restore
> > > > > right after the power state transition request like
> > > > > every other driver.
> > > > > 
> > > > > Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
> > > > > Signed-off-by: Rodrigo Vivi <rodrigo.vivi at intel.com>
> > > > > ---
> > > > >  drivers/gpu/drm/xe/xe_pci.c | 2 ++
> > > > >  1 file changed, 2 insertions(+)
> > > > > 
> > > > > diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
> > > > > index 5ba4ec229494..6d29ef4b396f 100644
> > > > > --- a/drivers/gpu/drm/xe/xe_pci.c
> > > > > +++ b/drivers/gpu/drm/xe/xe_pci.c
> > > > > @@ -949,6 +949,8 @@ static int xe_pci_resume(struct device *dev)
> > > > >  	if (err)
> > > > >  		return err;
> > > > >  
> > > > > +	pci_restore_state(pdev);
> > > > 
> > > > Why is xe even doing this stuff by hand instead of letting
> > > > the pci core handle it?
> > > 
> > > That's a fair question, given that there's not much documentation
> > > around it.
> > > 
> > > Looking at the pci code, it looks like the pci core is not itself
> > > restoring the config space anywhere, and looking at other
> > > drivers, it looks like a safe thing to do.
> > > 
> > > And the pci_restore_state is paired with the pci_save_state.
> > > Both i915 and Xe are doing the pci_save_state and not restoring
> > > it.
> > 
> > i915 needs it because (as a side effect) it prevents the pci
> > code from automagically sticking the device into D3, which
> > apparently breaks hibernation on some old crappy laptops.
> > But xe shouldn't need that.
> 
> Hmm, doing some archaeology here, it looks like
> both pci_save and pci_restore were added together on
> regular system suspend-resume by Jesse from the very
> beginning:
> 
> ba8bbcf6ff46 ("i915: add suspend/resume support")

Pretty sure it was initially just cargo culted. Or perhaps 
the pci code didn't do stuff back then. Shrug.

> Then, later pci_restore was removed by Zhenyu on
> b7e53aba2f0e ("drm/i915: remove restore in resume")
> because it was hanging some platforms.
> 
> The only reference to d3 related issues that I could find
> was this one:
> https://lore.kernel.org/intel-gfx/1497281047-25204-5-git-send-email-animesh.manna@intel.com/
> 
> but that was trying to add support for the save/restore
> in the runtime pm side and not here in the regular system suspend/resume.
> 
> Am I missing anything?

commit ab3be73fa7b4 ("drm/i915: gen4: work around hang during
hibernation")

> Empirically Anshuman showed us that PCI subsystem is indeed taking
> care of the save/restore.
> 
> Ville, my question to you now is: can I go ahead and simply remove
> the pci_save_state() call from i915? Or do you still believe some
> hibernation somewhere could be broken?

Unless someone can figure out a way to fix those cursed 
BIOSes (or they magically fixed themselves in the meantime)
it needs to stay.

> I believe we should either remove both save and restore for both
> drivers or add both to both.

I think we should try to get as close to the standard
driver/pci behaviour as possible. AFAICS that would be
achieved by moving pci_save_state()+pci_set_power_state()
(and nothing else) into the .suspend_noirq() and
.poweroff_noirq() hooks. And then xe wouldn't even
need to hook those up.
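
Roughly, the hook bodies would look something like the sketch
below. This is a userspace mock and not kernel code: every mock_*
name is a hypothetical stand-in (for struct pci_dev,
pci_save_state() and pci_set_power_state()), used only so the call
order can be compiled and checked outside the kernel.

```c
/* Userspace stand-ins for the kernel types; all names are hypothetical. */
enum mock_power_state { MOCK_D0 = 0, MOCK_D3HOT = 3 };

struct mock_pci_dev {
	int state_saved;                     /* set once config space was saved */
	enum mock_power_state current_state; /* power state last requested */
};

/* Stand-in for pci_save_state(): only records that a save happened. */
static int mock_pci_save_state(struct mock_pci_dev *pdev)
{
	pdev->state_saved = 1;
	return 0;
}

/* Stand-in for pci_set_power_state(): only records the requested state. */
static int mock_pci_set_power_state(struct mock_pci_dev *pdev,
				    enum mock_power_state state)
{
	pdev->current_state = state;
	return 0;
}

/*
 * Assumed shared body for .suspend_noirq() and .poweroff_noirq():
 * save the config space, request D3, and do nothing else.
 */
static int mock_suspend_noirq(struct mock_pci_dev *pdev)
{
	int err;

	err = mock_pci_save_state(pdev);
	if (err)
		return err;

	return mock_pci_set_power_state(pdev, MOCK_D3HOT);
}

/* Assumed .freeze_noirq(): the device is left in D0. */
static int mock_freeze_noirq(struct mock_pci_dev *pdev)
{
	(void)pdev;
	return 0;
}
```

The freeze hook is an empty stub because, under this layout, the
freeze path does not put the device into D3.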

But that does require some actual thought as it would
change our current behaviour to not go to D3 in
.freeze_late() (the pci code won't put the device into
D3 in .freeze_noirq() either). I suppose this would
also let us nuke the pci_set_power_state(D0) from
i915_drm_resume_early()...

And the switcheroo stuff would presumably need some
changes. Just calling the noirq() stuff from the
switcheroo suspend hook should hopefully suffice.
Hmm, and I guess we'd need the pci_set_power_state(D0)
for it still in the resume path.

Another thing I realized is that we never restore the
config space in the switcheroo resume path. I suppose
for our integrated GPUs it doesn't get clobbered in
D3 anyway so shouldn't really matter. So we could
technically also skip the pci_save_state() in the
switcheroo suspend path.

We could also consider quirking the hibernate vs. 
D3 stuff in drivers/pci. Would just need a new flag
on the pci_dev to skip the pci_set_power_state(),
or something.
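
As a rough illustration of that quirk idea, again a userspace mock
and not kernel code: the flag name skip_d3_on_poweroff and the
mock_* names are made up here, no such flag exists in struct
pci_dev today, and the real poweroff path in drivers/pci does a
lot more than this.

```c
/* Userspace stand-in for struct pci_dev; all names are hypothetical. */
struct mock_quirk_dev {
	unsigned int skip_d3_on_poweroff:1; /* assumed new quirk flag */
	int current_state;                  /* 0 = D0, 3 = D3hot */
};

/*
 * Assumed shape of the core's poweroff step: devices with the quirk
 * flag set keep their current power state, all others go to D3hot.
 */
static void mock_core_poweroff_noirq(struct mock_quirk_dev *pdev)
{
	if (pdev->skip_d3_on_poweroff)
		return;

	pdev->current_state = 3;
}
```

A quirk entry for the affected machines would then set the flag,
and the driver would not have to call pci_save_state() just for
its side effect.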

-- 
Ville Syrjälä
Intel