[Intel-gfx] [PATCH] drm/i915/display: Fix warning callstack for imbalance wakeref

Golani, Mitulkumar Ajitkumar mitulkumar.ajitkumar.golani at intel.com
Mon Aug 29 06:45:53 UTC 2022


Hi Imre,

> -----Original Message-----
> From: Deak, Imre <imre.deak at intel.com>
> Sent: 25 August 2022 16:22
> To: Golani, Mitulkumar Ajitkumar <mitulkumar.ajitkumar.golani at intel.com>
> Cc: intel-gfx at lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH] drm/i915/display: Fix warning callstack for
> imbalance wakeref
> 
> On Tue, Aug 23, 2022 at 03:56:56PM +0300, Golani, Mitulkumar Ajitkumar
> wrote:
> > > Hi Imre,
> > >
> > > > On Fri, Aug 12, 2022 at 10:17:24AM +0530, Mitul Golani wrote:
> > > > > While executing i915_selftest, wakeref imbalance warning is seen
> > > > > with i915_selftest failure.
> > > > >
> > > > > When device is already suspended, wakeref is acquired by
> > > > > disable_rpm_wakeref_asserts and rpm ownership is transferred
> > > > > back to core. During this case wakeref_count will not be zero.
> > > > > Once driver is unregistered, this wakeref is released with
> > > > > enable_rpm_wakeref_asserts and balancing wakeref_count acquired
> > > > > by driver.
> > > > >
> > > > > This patch will fix the warning callstack by adding check if
> > > > > device is already suspended and rpm ownership transfer is going on.
> > > > >
> > > > > Signed-off-by: Mitul Golani
> > > > > <mitulkumar.ajitkumar.golani at intel.com>
> > > > > ---
> > > > >  drivers/gpu/drm/i915/i915_driver.c | 8 +++++++-
> > > > >  1 file changed, 7 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/drivers/gpu/drm/i915/i915_driver.c
> > > > > b/drivers/gpu/drm/i915/i915_driver.c
> > > > > index deb8a8b76965..6530a8680cfd 100644
> > > > > --- a/drivers/gpu/drm/i915/i915_driver.c
> > > > > +++ b/drivers/gpu/drm/i915/i915_driver.c
> > > > > @@ -1670,7 +1670,13 @@ static int intel_runtime_resume(struct
> > > > > device
> > > > > *kdev)
> > > > >
> > > > >   drm_dbg(&dev_priv->drm, "Resuming device\n");
> > > > >
> > > > > - drm_WARN_ON_ONCE(&dev_priv->drm, atomic_read(&rpm-
> > > > >wakeref_count));
> > > > > + /*
> > > > > +  * When device is already suspended, Wakeref is acquired by
> > > > disable_rpm_wakeref_asserts
> > > > > +  * and rpm ownership is transferred back to core. During this
> > > > > + case
> > > > wakeref_count will
> > > > > +  * not be zero. Once driver is unregistered, this wakeref is
> > > > > +released
> > > > with
> > > > > +  * enable_rpm_wakeref_asserts and balancing wakeref_count
> > > > acquired by driver.
> > > > > +  */
> > > > > + drm_WARN_ON_ONCE(&dev_priv->drm, atomic_read(&rpm-
> > > > >wakeref_count) &&
> > > > > +!rpm->suspended);
> > > >
> > > > I can't see how disable/enable_rpm_wakeref_asserts() can lead to
> > > > this WARN. They are always called in pairs both in
> > > > intel_runtime_suspend() and intel_runtime_resume(), leaving rpm-
> >wakeref_count unchanged.
> > > >
> > > > The root cause is probably somewhere else, incrementing
> > > > rpm->wakeref_count without runtime resuming the device.
> > > >
> > > > The WARN() condition is corret, we shouldn't get here with a
> > > > non-zero wakeref_count. rpm->suspended - set in
> > > > intel_runtime_suspend() and cleared in intel_runtime_resume() -
> > > > should be always false here, so the above change would just disable the
> WARN in all cases.
> > > >
> > > Yes, in case of DG2, after device is suspended, i915_driver_remove
> > > is being called.  Here driver is taking wakeref with
> > > disable_rpm_wakeref_asserts when device was not resumed.
> 
> > >
> > > As per logs,
> > > [  395.872971] i915 0000:03:00.0: [drm:intel_runtime_suspend [i915]]
> > > Suspending device ...
> > > [  403.553235] i915_driver_remove: START wakeref=0 [  403.553288]
> > > i915_driver_remove: before unregister i915 wakeref=65537 (Wakeref
> > > Taken) [  403.566086] i915 0000:03:00.0: [drm:intel_runtime_resume
> > > [i915]] Resuming device (Later Resuming Device)
> > >
> > > Pushed new change with :
> > > https://patchwork.freedesktop.org/series/107211/#rev5
> > >
> > Also when compared DG2 logs with ADLP working logs, Already 1 wakeref
> > was taken by DMC firmware(i915/adlp_dmc_ver2_16.bin (v2.16)), in-case
> of DG2 looks to be missing.
> > To support other targets and to prevent consecutive resuming device
> > added following check, if (i915->runtime_pm.suspended &&
> > !atomic_read(&i915->runtime_pm.wakeref_count))
> >
> > ADLP Logs:
> > ---------------
> > [   99.502434] i915_driver_probe: START wakeref=0
> > [  713.979074] i915 0000:00:02.0: [drm] Finished loading DMC firmware
> > i915/adlp_dmc_ver2_16.bin (v2.16) [  102.455766] i915_driver_probe:
> > END wakeref=65538 ...
> > [  103.448570] i915_driver_remove: START wakeref=65537 [  103.448587]
> > i915_driver_remove: before unregister i915 wakeref=131074 ->
> > (disable_rpm_wakeref_assert) [  103.585886] i915_driver_remove: END
> > wakeref=0
> >
> > DG2 Logs:
> > -------------
> > [ 1271.704314] i915_driver_probe: START wakeref=0 [  383.050984] i915
> > 0000:03:00.0: [drm] Finished loading DMC firmware
> > i915/dg2_dmc_ver2_07.bin (v2.7) [ 1272.021133] i915_driver_probe: END
> > wakeref=1 ...
> > [  395.883531] i915 0000:03:00.0: [drm:intel_runtime_suspend [i915]]
> > Device suspended ...
> > [ 1291.450841] i915_driver_remove: START wakeref=0 [ 1291.450877]
> > i915_driver_remove: before unregister i915 wakeref=65537 ->
> > (disable_rpm_wakeref_assert) [ 1291.603281] i915_driver_remove: END
> > wakeref=0
> 
> Still not sure what's going. Both i915_pci_probe() and
> i915_pci_remove()->i915_driver_remove() is called with a runtime PM
> reference - taken at local_pci_probe() and pci_device_remove() - and so the
> device should be runtime resumed at those points.
> 

Yes reference is being taken at local_pci_probe() and pci_device_remove() but 
During i915_selftest at perf, it is loading and unloading i915_pci_probe() and
i915_pci_remove(), here pci_device_remove() is not being called, that's why
runtime PM reference is not present during i915_driver_remove().

> > > > >   disable_rpm_wakeref_asserts(rpm);
> > > > >
> > > > >   intel_opregion_notify_adapter(dev_priv, PCI_D0);
> > > > > --
> > > > > 2.25.1
> > > > >


More information about the Intel-gfx mailing list