[Intel-gfx] [PATCH 1/7] drm/i915: Disable preemption and sleeping while using the punit sideband

Ville Syrjälä ville.syrjala at linux.intel.com
Thu Jan 11 21:17:57 UTC 2018


On Thu, Jan 11, 2018 at 08:53:42PM +0000, Chris Wilson wrote:
> Quoting Ville Syrjälä (2018-01-11 20:10:45)
> > On Wed, Jan 10, 2018 at 12:55:05PM +0000, Chris Wilson wrote:
> > > While we talk to the punit over its sideband, we need to prevent the cpu
> > > from sleeping in order to prevent a potential machine hang.
> > > 
> > > Note that by itself, it appears that pm_qos_update_request (via
> > > intel_idle) doesn't provide a sufficient barrier to ensure that all core
> > > are indeed awake (out of Cstate) and that the package is awake. To do so,
> > > we need to supplement the pm_qos with a manual ping on_each_cpu.
> > > 
> > > Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=109051
> > > References: https://bugs.freedesktop.org/show_bug.cgi?id=102657
> > > References: https://bugzilla.kernel.org/show_bug.cgi?id=195255
> > > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > > Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
> > > Cc: Hans de Goede <hdegoede at redhat.com>
> > > ---
> > >  drivers/gpu/drm/i915/i915_drv.c       |  6 ++++
> > >  drivers/gpu/drm/i915/i915_drv.h       |  1 +
> > >  drivers/gpu/drm/i915/intel_sideband.c | 61 ++++++++++++++++++++++++-----------
> > >  3 files changed, 50 insertions(+), 18 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> > > index 6c8da9d20c33..d4b90cc0130b 100644
> > > --- a/drivers/gpu/drm/i915/i915_drv.c
> > > +++ b/drivers/gpu/drm/i915/i915_drv.c
> > > @@ -902,6 +902,9 @@ static int i915_driver_init_early(struct drm_i915_private *dev_priv,
> > >       spin_lock_init(&dev_priv->uncore.lock);
> > >  
> > >       mutex_init(&dev_priv->sb_lock);
> > > +     pm_qos_add_request(&dev_priv->sb_qos,
> > > +                        PM_QOS_CPU_DMA_LATENCY, PM_QOS_DEFAULT_VALUE);
> > > +
> > >       mutex_init(&dev_priv->modeset_restore_lock);
> > >       mutex_init(&dev_priv->av_mutex);
> > >       mutex_init(&dev_priv->wm.wm_mutex);
> > > @@ -953,6 +956,9 @@ static void i915_driver_cleanup_early(struct drm_i915_private *dev_priv)
> > >       intel_irq_fini(dev_priv);
> > >       i915_workqueues_cleanup(dev_priv);
> > >       i915_engines_cleanup(dev_priv);
> > > +
> > > +     pm_qos_remove_request(&dev_priv->sb_qos);
> > > +     mutex_destroy(&dev_priv->sb_lock);
> > >  }
> > >  
> > >  static int i915_mmio_setup(struct drm_i915_private *dev_priv)
> > > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > > index a689396d0ff6..ff3f9effc0bb 100644
> > > --- a/drivers/gpu/drm/i915/i915_drv.h
> > > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > > @@ -1887,6 +1887,7 @@ struct drm_i915_private {
> > >  
> > >       /* Sideband mailbox protection */
> > >       struct mutex sb_lock;
> > > +     struct pm_qos_request sb_qos;
> > >  
> > >       /** Cached value of IMR to avoid reads in updating the bitfield */
> > >       union {
> > > diff --git a/drivers/gpu/drm/i915/intel_sideband.c b/drivers/gpu/drm/i915/intel_sideband.c
> > > index 75c872bb8cc9..02bdd2e2cef6 100644
> > > --- a/drivers/gpu/drm/i915/intel_sideband.c
> > > +++ b/drivers/gpu/drm/i915/intel_sideband.c
> > > @@ -22,6 +22,8 @@
> > >   *
> > >   */
> > >  
> > > +#include <asm/iosf_mbi.h>
> > > +
> > >  #include "i915_drv.h"
> > >  #include "intel_drv.h"
> > >  
> > > @@ -39,18 +41,20 @@
> > >  /* Private register write, double-word addressing, non-posted */
> > >  #define SB_CRWRDA_NP 0x07
> > >  
> > > -static int vlv_sideband_rw(struct drm_i915_private *dev_priv, u32 devfn,
> > > -                        u32 port, u32 opcode, u32 addr, u32 *val)
> > > +static void ping(void *info)
> > >  {
> > > -     u32 cmd, be = 0xf, bar = 0;
> > > -     bool is_read = (opcode == SB_MRD_NP || opcode == SB_CRRDDA_NP);
> > > +}
> > >  
> > > -     cmd = (devfn << IOSF_DEVFN_SHIFT) | (opcode << IOSF_OPCODE_SHIFT) |
> > > -             (port << IOSF_PORT_SHIFT) | (be << IOSF_BYTE_ENABLES_SHIFT) |
> > > -             (bar << IOSF_BAR_SHIFT);
> > > +static int vlv_sideband_rw(struct drm_i915_private *dev_priv,
> > > +                        u32 devfn, u32 port, u32 opcode,
> > > +                        u32 addr, u32 *val)
> > > +{
> > > +     const bool is_read = (opcode == SB_MRD_NP || opcode == SB_CRRDDA_NP);
> > > +     int err;
> > >  
> > > -     WARN_ON(!mutex_is_locked(&dev_priv->sb_lock));
> > > +     lockdep_assert_held(&dev_priv->sb_lock);
> > >  
> > > +     /* Flush the previous comms, just in case it failed last time. */
> > >       if (intel_wait_for_register(dev_priv,
> > >                                   VLV_IOSF_DOORBELL_REQ, IOSF_SB_BUSY, 0,
> > >                                   5)) {
> > > @@ -59,22 +63,43 @@ static int vlv_sideband_rw(struct drm_i915_private *dev_priv, u32 devfn,
> > >               return -EAGAIN;
> > >       }
> > >  
> > > -     I915_WRITE(VLV_IOSF_ADDR, addr);
> > > -     I915_WRITE(VLV_IOSF_DATA, is_read ? 0 : *val);
> > > -     I915_WRITE(VLV_IOSF_DOORBELL_REQ, cmd);
> > > +     iosf_mbi_punit_acquire();
> > >  
> > > -     if (intel_wait_for_register(dev_priv,
> > > -                                 VLV_IOSF_DOORBELL_REQ, IOSF_SB_BUSY, 0,
> > > -                                 5)) {
> > > +     /*
> > > +      * Prevent the cpu from sleeping while we use this sideband, otherwise
> > > +      * the punit may cause a machine hang.
> > > +      */
> > > +     pm_qos_update_request(&dev_priv->sb_qos, 0);
> > > +     on_each_cpu(ping, NULL, 1);
> > 
> > pm_qos_update_request() doesn't wake up the cpus on its own? I wonder
> > kind of latency guarantees it can actually give without doing that.
> 
> intel_idle plugs into the update to wake up idle cpus, but it is not
> synchronous. If we don't have the on_each_cpu() here, the hangs reoccur;
> the easiest answer seems to be that the on_each_cpu() is causing a
> schedule of the RT migration thread which is enough to ensure that the
> wakeup has occurred.
>  
> > Shouldn't we check if we're actually be talking to the punit and not
> > some other unit before we do all this extra work?
> 
> Checking port? I am suspicious of the whole iosf mechanism...

I doubt iosf sb itself is busted. That would probably make the entire
soc pretty much dead. I suspect the problem is either in the punit
firmware, or something fishy is going on with power delivery and
reducing the rate at which we change the voltages and whatnot helps
keep things a bit more stable.

So yeah, I would limit this to punit only by checking the port.

> 
> Also debating whether to only apply it to j1900 or all. Is it just pure
> fluke that we can easily reproduce it on the quad core Baytrail as
> opposed to the dual core vlv or chv?

Not sure. I guess if the problem is around power delivery it might even
be board specific to some degree.

-- 
Ville Syrjälä
Intel OTC


More information about the Intel-gfx mailing list