[Intel-gfx] [PATCH v2] drm/i915/execlists: Skip forcewake for ELSP submission

Chris Wilson chris at chris-wilson.co.uk
Mon Jan 22 17:17:22 UTC 2018


Quoting Tvrtko Ursulin (2018-01-22 16:59:21)
> 
> On 22/01/2018 10:07, Chris Wilson wrote:
> > Now that we can read the CSB from the HWSP, we may avoid having to
> > perform mmio reads entirely and so forgo the rigmarole of the forcewake
> > dance.
> > 
> > v2: Include forcewake hint for GEM_TRACE readback of mmio. If we don't
> > hold fw ourselves, the reads may return garbage.
> > 
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> > ---
> >   drivers/gpu/drm/i915/intel_lrc.c | 18 ++++++++++++------
> >   1 file changed, 12 insertions(+), 6 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> > index ff25f209d0a5..075e7f56e9ba 100644
> > --- a/drivers/gpu/drm/i915/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/intel_lrc.c
> > @@ -778,6 +778,7 @@ static void execlists_submission_tasklet(unsigned long data)
> >       struct intel_engine_execlists * const execlists = &engine->execlists;
> >       struct execlist_port * const port = execlists->port;
> >       struct drm_i915_private *dev_priv = engine->i915;
> > +     bool fw = false;
> >   
> >       /* We can skip acquiring intel_runtime_pm_get() here as it was taken
> >        * on our behalf by the request (see i915_gem_mark_busy()) and it will
> > @@ -788,8 +789,6 @@ static void execlists_submission_tasklet(unsigned long data)
> >        */
> >       GEM_BUG_ON(!dev_priv->gt.awake);
> >   
> > -     intel_uncore_forcewake_get(dev_priv, execlists->fw_domains);
> > -
> >       /* Prefer doing test_and_clear_bit() as a two stage operation to avoid
> >        * imposing the cost of a locked atomic transaction when submitting a
> >        * new request (outside of the context-switch interrupt).
> > @@ -818,6 +817,12 @@ static void execlists_submission_tasklet(unsigned long data)
> >                */
> >               __clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
> >               if (unlikely(execlists->csb_head == -1)) { /* following a reset */
> > +                     if (!fw) {
> > +                             intel_uncore_forcewake_get(dev_priv,
> > +                                                        execlists->fw_domains);
> > +                             fw = true;
> > +                     }
> > +
> >                       head = readl(dev_priv->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)));
> >                       tail = GEN8_CSB_WRITE_PTR(head);
> >                       head = GEN8_CSB_READ_PTR(head);
> > @@ -830,10 +835,10 @@ static void execlists_submission_tasklet(unsigned long data)
> >                       head = execlists->csb_head;
> >                       tail = READ_ONCE(buf[write_idx]);
> >               }
> > -             GEM_TRACE("%s cs-irq head=%d [%d], tail=%d [%d]\n",
> > +             GEM_TRACE("%s cs-irq head=%d [%d%s], tail=%d [%d%s]\n",
> >                         engine->name,
> > -                       head, GEN8_CSB_READ_PTR(readl(dev_priv->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)))),
> > -                       tail, GEN8_CSB_WRITE_PTR(readl(dev_priv->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)))));
> > +                       head, GEN8_CSB_READ_PTR(readl(dev_priv->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)))), fw ? "" : "?",
> > +                       tail, GEN8_CSB_WRITE_PTR(readl(dev_priv->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)))), fw ? "" : "?");
> 
> Not useful to log the correct value in any case? Or in other words, this 
> trace is not useful for debugging potential problems with the HWSP CSB 
> copy, while with this change it stops being so?

It also completely invalidates CI's testing as otherwise we will always
be forcewaking on submission. I am not shedding a tear over the loss of
that information on the occasions it is garbage - it too is likely to
cause heisenbugs and mask the very information we are trying to present
(in this case latency between HWSP write and CS interrupt).

> >               while (head != tail) {
> >                       struct drm_i915_gem_request *rq;
> > @@ -943,7 +948,8 @@ static void execlists_submission_tasklet(unsigned long data)
> 
> There is a writel a bit above which now has no fw in HWSP mode.

And it doesn't need one, afaict. There's no imposed wakeup (no forcewake)
and no latency, ergo the HW is pulling the right value immediate.
Otherwise the mostly idle submission of new requests would go entirely
unnoticed. (I've always had a wonder about why they would add RING_TAIL
to the shadow regs but not RING_ELSP.)
-Chris


More information about the Intel-gfx mailing list