[Intel-gfx] [PATCH v4] drm/i915/execlists: Read the context-status buffer from the HWSP

Chris Wilson chris at chris-wilson.co.uk
Tue Jul 18 14:56:00 UTC 2017


Quoting Mika Kuoppala (2017-07-18 15:36:46)
> Chris Wilson <chris at chris-wilson.co.uk> writes:
> 
> > The engine provides a mirror of the CSB in the HWSP. If we use the
> > cacheable reads from the HWSP, we can shave off a few mmio reads per
> > context-switch interrupt (which are quite frequent!). Just removing a
> > couple of mmio is not enough to actually reduce any latency, but a small
> > reduction in overall cpu usage.
> >
> > Much appreciation for Ben dropping the bombshell that the CSB was in the
> > HWSP and for Michel in digging out the details.
> >
> > v2: Don't be lazy, add the defines for the indices.
> > v3: Include the HWSP in debugfs/i915_engine_info
> > v4: Check for GVT-g, it currently depends on intercepting CSB mmio
> > v5: Fixup GVT-g mmio path
> >
> > Suggested-by: Ben Widawsky <benjamin.widawsky at intel.com>
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: Michel Thierry <michel.thierry at intel.com>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> > Cc: Mika Kuoppala <mika.kuoppala at intel.com>
> > Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
> > Cc: Zhenyu Wang <zhenyuw at linux.intel.com>
> > Cc: Zhi Wang <zhi.a.wang at intel.com>
> > Acked-by: Michel Thierry <michel.thierry at intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_debugfs.c     |  7 +++++--
> >  drivers/gpu/drm/i915/intel_lrc.c        | 16 +++++++++++-----
> >  drivers/gpu/drm/i915/intel_ringbuffer.h |  2 ++
> >  3 files changed, 18 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> > index 620c9218d1c1..5fd01c14a3ec 100644
> > --- a/drivers/gpu/drm/i915/i915_debugfs.c
> > +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> > @@ -3384,6 +3384,7 @@ static int i915_engine_info(struct seq_file *m, void *unused)
> >                          upper_32_bits(addr), lower_32_bits(addr));
> >  
> >               if (i915.enable_execlists) {
> > +                     const u32 *hws = &engine->status_page.page_addr[I915_HWS_CSB_BUF0_INDEX];
> >                       u32 ptr, read, write;
> >                       unsigned int idx;
> >  
> > @@ -3404,10 +3405,12 @@ static int i915_engine_info(struct seq_file *m, void *unused)
> >                               write += GEN8_CSB_ENTRIES;
> >                       while (read < write) {
> >                               idx = ++read % GEN8_CSB_ENTRIES;
> > -                             seq_printf(m, "    Execlist CSB[%d]: 0x%08x, context: %d\n",
> > +                             seq_printf(m, "    Execlist CSB[%d]: 0x%08x [0x%08x in hwsp], context: %d [%d in hwsp]\n",
> >                                          idx,
> >                                          I915_READ(RING_CONTEXT_STATUS_BUF_LO(engine, idx)),
> > -                                        I915_READ(RING_CONTEXT_STATUS_BUF_HI(engine, idx)));
> > +                                        hws[idx * 2],
> > +                                        I915_READ(RING_CONTEXT_STATUS_BUF_HI(engine, idx)),
> > +                                        hws[idx * 2 + 1]);
> >                       }
> >  
> >                       rcu_read_lock();
> > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> > index 3469badedbe0..41dc04eb6097 100644
> > --- a/drivers/gpu/drm/i915/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/intel_lrc.c
> > @@ -547,10 +547,17 @@ static void intel_lrc_irq_handler(unsigned long data)
> >       while (test_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted)) {
> >               u32 __iomem *csb_mmio =
> >                       dev_priv->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine));
> > -             u32 __iomem *buf =
> > -                     dev_priv->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_BUF_LO(engine, 0));
> > +             /* The HWSP contains a (cacheable) mirror of the CSB */
> > +             const u32 *buf =
> > +                     &engine->status_page.page_addr[I915_HWS_CSB_BUF0_INDEX];
> 
> Could be also const u32 * const buf = ...
> as in debugfs counterpart. Value added is quite thin tho vs clutter so
> not insisting.

> >               unsigned int head, tail;
> >  
> > +             /* However GVT emulation depends upon intercepting CSB mmio */
> > +             if (unlikely(intel_vgpu_active(dev_priv))) {
> > +                     buf = (u32 * __force)
> > +                             (dev_priv->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_BUF_LO(engine, 0)));
> > +             }

Hence why we can't use const u32 *const buf ;-)

> > +
> >               /* The write will be ordered by the uncached read (itself
> >                * a memory barrier), so we do not need another in the form
> >                * of a locked instruction. The race between the interrupt
> > @@ -590,13 +597,12 @@ static void intel_lrc_irq_handler(unsigned long data)
> >                        * status notifier.
> >                        */
> >  
> > -                     status = readl(buf + 2 * head);
> > +                     status = buf[2 * head];
> >                       if (!(status & GEN8_CTX_STATUS_COMPLETED_MASK))
> >                               continue;
> >  
> >                       /* Check the context/desc id for this event matches */
> > -                     GEM_DEBUG_BUG_ON(readl(buf + 2 * head + 1) !=
> > -                                      port->context_id);
> > +                     GEM_DEBUG_BUG_ON(buf[2 * head + 1] != port->context_id);
> 
> In here I wonder if GEM_BUG_ON check with the equivalence of the hswp value
> vs mmio valu would serve any purpose. Adding the mmio delay here tho
> would be harmful.

Hard here as the hwsp equivalence isn't guaranteed due to vgpu. Our
sanity checks are already pretty good for confirming that the CSB
sequence matches our input.
-Chris


More information about the Intel-gfx mailing list