[Intel-gfx] [PATCH v2] drm/i915/execlists: Skip forcewake for ELSP submission

Tvrtko Ursulin tvrtko.ursulin at linux.intel.com
Mon Jan 22 17:29:53 UTC 2018


On 22/01/2018 17:17, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-01-22 16:59:21)
>>
>> On 22/01/2018 10:07, Chris Wilson wrote:
>>> Now that we can read the CSB from the HWSP, we may avoid having to
>>> perform mmio reads entirely and so forgo the rigmarole of the forcewake
>>> dance.
>>>
>>> v2: Include forcewake hint for GEM_TRACE readback of mmio. If we don't
>>> hold fw ourselves, the reads may return garbage.
>>>
>>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/intel_lrc.c | 18 ++++++++++++------
>>>    1 file changed, 12 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>>> index ff25f209d0a5..075e7f56e9ba 100644
>>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>>> @@ -778,6 +778,7 @@ static void execlists_submission_tasklet(unsigned long data)
>>>        struct intel_engine_execlists * const execlists = &engine->execlists;
>>>        struct execlist_port * const port = execlists->port;
>>>        struct drm_i915_private *dev_priv = engine->i915;
>>> +     bool fw = false;
>>>    
>>>        /* We can skip acquiring intel_runtime_pm_get() here as it was taken
>>>         * on our behalf by the request (see i915_gem_mark_busy()) and it will
>>> @@ -788,8 +789,6 @@ static void execlists_submission_tasklet(unsigned long data)
>>>         */
>>>        GEM_BUG_ON(!dev_priv->gt.awake);
>>>    
>>> -     intel_uncore_forcewake_get(dev_priv, execlists->fw_domains);
>>> -
>>>        /* Prefer doing test_and_clear_bit() as a two stage operation to avoid
>>>         * imposing the cost of a locked atomic transaction when submitting a
>>>         * new request (outside of the context-switch interrupt).
>>> @@ -818,6 +817,12 @@ static void execlists_submission_tasklet(unsigned long data)
>>>                 */
>>>                __clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
>>>                if (unlikely(execlists->csb_head == -1)) { /* following a reset */
>>> +                     if (!fw) {
>>> +                             intel_uncore_forcewake_get(dev_priv,
>>> +                                                        execlists->fw_domains);
>>> +                             fw = true;
>>> +                     }
>>> +
>>>                        head = readl(dev_priv->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)));
>>>                        tail = GEN8_CSB_WRITE_PTR(head);
>>>                        head = GEN8_CSB_READ_PTR(head);
>>> @@ -830,10 +835,10 @@ static void execlists_submission_tasklet(unsigned long data)
>>>                        head = execlists->csb_head;
>>>                        tail = READ_ONCE(buf[write_idx]);
>>>                }
>>> -             GEM_TRACE("%s cs-irq head=%d [%d], tail=%d [%d]\n",
>>> +             GEM_TRACE("%s cs-irq head=%d [%d%s], tail=%d [%d%s]\n",
>>>                          engine->name,
>>> -                       head, GEN8_CSB_READ_PTR(readl(dev_priv->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)))),
>>> -                       tail, GEN8_CSB_WRITE_PTR(readl(dev_priv->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)))));
>>> +                       head, GEN8_CSB_READ_PTR(readl(dev_priv->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)))), fw ? "" : "?",
>>> +                       tail, GEN8_CSB_WRITE_PTR(readl(dev_priv->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)))), fw ? "" : "?");
>>
>> Not useful to log the correct value in any case? Or in other words, this
>> trace is not useful for debugging potential problems with the HWSP CSB
>> copy, while with this change it stops being so?
> 
> It also completely invalidates CI's testing as otherwise we will always
> be forcewaking on submission. I am not shedding a tear over the loss of
> that information on the occasions it is garbage - it too is likely to
> cause heisenbugs and mask the very information we are trying to present
> (in this case latency between HWSP write and CS interrupt).

Ookay.

>>>                while (head != tail) {
>>>                        struct drm_i915_gem_request *rq;
>>> @@ -943,7 +948,8 @@ static void execlists_submission_tasklet(unsigned long data)
>>
>> There is a writel a bit above which now has no fw in HWSP mode.
> 
> And it doesn't need one, afaict. There's no imposed wakeup (no forcewake)
> and no latency, ergo the HW is pulling the right value immediate.
> Otherwise the mostly idle submission of new requests would go entirely
> unnoticed. (I've always had a wonder about why they would add RING_TAIL
> to the shadow regs but not RING_ELSP.)

Ha shadowed, ok.. GEM_BUG_ON(intel_reg_forcewake_for_read(...))? But not 
in the tasklet, perhaps somewhere else.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>

Regards,

Tvrtko


More information about the Intel-gfx mailing list