[Intel-gfx] [PATCH 2/4] drm/i915: Fix system resume if PCI device remained enabled
Imre Deak
imre.deak at intel.com
Mon Apr 18 08:37:05 UTC 2016
On ma, 2016-04-18 at 09:24 +0100, Chris Wilson wrote:
> On Mon, Apr 18, 2016 at 11:16:34AM +0300, Imre Deak wrote:
> > On ma, 2016-04-18 at 09:06 +0100, Chris Wilson wrote:
> > > On Mon, Apr 18, 2016 at 10:04:20AM +0300, Imre Deak wrote:
> > > > During system resume we depended on pci_enable_device() also
> > > > putting the
> > > > device into PCI D0 state. This won't work if the PCI device was
> > > > already
> > > > enabled but still in D3 state. This is because
> > > > pci_enable_device()
> > > > is
> > > > refcounted and will not change the HW state if called with a
> > > > non-
> > > > zero
> > > > refcount. Leaving the device in D3 will make all subsequent
> > > > device
> > > > accesses fail.
> > > >
> > > > This didn't cause a problem most of the time, since we resumed
> > > > with
> > > > an
> > > > enable refcount of 0. But it fails at least after module reload
> > > > because
> > > > after that we also happen to leak a PCI device enable
> > > > reference:
> > > > During
> > > > probing we call drm_get_pci_dev() which will enable the PCI
> > > > device,
> > > > but
> > > > during device removal drm_put_dev() won't disable it. This is a
> > > > bug
> > > > of
> > > > its own in DRM core, but without much harm as it only leaves
> > > > the
> > > > PCI
> > > > device enabled. Fixing it is also a bit more involved, due to
> > > > DRM
> > > > mid-layering and because it affects non-i915 drivers too. The
> > > > fix
> > > > in
> > > > this patch is valid regardless of the problem in DRM core.
> > > >
> > > > CC: Ville Syrjälä <ville.syrjala at linux.intel.com>
> > > > CC: stable at vger.kernel.org
> > > > Signed-off-by: Imre Deak <imre.deak at intel.com>
> > > > ---
> > > > drivers/gpu/drm/i915/i915_drv.c | 9 ++++++++-
> > > > 1 file changed, 8 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/i915/i915_drv.c
> > > > b/drivers/gpu/drm/i915/i915_drv.c
> > > > index d550ae2..7eaa93e 100644
> > > > --- a/drivers/gpu/drm/i915/i915_drv.c
> > > > +++ b/drivers/gpu/drm/i915/i915_drv.c
> > > > @@ -803,7 +803,7 @@ static int i915_drm_resume(struct
> > > > drm_device
> > > > *dev)
> > > > static int i915_drm_resume_early(struct drm_device *dev)
> > > > {
> > > > struct drm_i915_private *dev_priv = dev->dev_private;
> > > > - int ret = 0;
> > > > + int ret;
> > > >
> > > > /*
> > > > * We have a resume ordering issue with the snd-hda
> > > > driver
> > > > also
> > > > @@ -814,6 +814,13 @@ static int i915_drm_resume_early(struct
> > > > drm_device *dev)
> > > > * FIXME: This should be solved with a special hdmi
> > > > sink
> > > > device or
> > > > * similar so that power domains can be employed.
> > > > */
> > > > +
> > > > + ret = pci_set_power_state(dev->pdev, PCI_D0);
> > > > + if (ret) {
> > > > + DRM_ERROR("failed to set PCI D0 power state
> > > > (%d)\n", ret);
> > > > + goto out;
> > > > + }
> > >
> > > The device should be enabled first, otherwise we are not meant to
> > > be
> > > touching its IO space at all (such as twiddling power state). At
> > > least
> > > that is the order pci_enable_device() uses.
> >
> > It's not MMIO or (port) IO but only a PCI config space access
> > that pci_set_power_state() requires, so doesn't need the enabling
> > of PCI resources. AFAICS pci_enable_device() enables power as the
> > first
> > thing.
> >
> > > Either way, upon failure we should be unwinding.
> >
> > I'd rather wouldn't put back the device to D3 state, as further
> > device
> > access may still possible even though resume failed.
>
> On the other hand, if you order it thusly:
>
> if (!pci_enable_device())
> return -EIO;
>
> ret = pci_set_power_state()
> if (ret < 0) {
> pci_disable_device()
> return ret;
> }
>
> it doesn't raise as many eyebrows :)
I don't know. I guess you mean that enabling the device first seems
like the first logical step. To me enabling power is the first logical
step and enabling the device itself is the second.
In any case if we decide to change this, I would suggest doing it
separately since then we also need to change the disabling order during
suspend.
--Imre
More information about the Intel-gfx
mailing list